Batch Models
If your model runs periodically, say every X days, we refer to it as a batch model (as opposed to a real-time model).
Storing the predictions of a batch model is typically straightforward. The code examples that follow are naive illustrations of how to do so.
If you use Pandas, you can append any DataFrame to a Parquet file on S3 or other cloud storage by using the fastparquet library:

import fastparquet
# Preprocess & predict
X = preprocess(...)
y = model.predict(X)
# Concatenate features, predictions and any other metadata
df = ...
# Store predictions
fastparquet.write(
filename=f"s3://my-models/{MODEL_ID}/{MODEL_VERSION}/serving.parquet",
data=df,
append=True,
)
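Note that with fastparquet, append=True expects the target dataset to already exist with a matching schema, so a common pattern is to write without append on the first run. To inspect the accumulated predictions later, you can read the same file back with pandas. This is a minimal sketch, assuming s3fs is installed and reusing the hypothetical path from above:

import pandas as pd

# Read the accumulated predictions back for analysis or monitoring
# (assumes s3fs is installed so pandas can read directly from S3)
serving_df = pd.read_parquet(
    f"s3://my-models/{MODEL_ID}/{MODEL_VERSION}/serving.parquet",
    engine="fastparquet",
)
print(serving_df.tail())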
The following example is especially useful on Databricks, but you can also use it with Delta Lake and the Spark on Kubernetes operator, for example:
# Predict with SparkML (transform returns a DataFrame with prediction columns)
y = model.transform(X)
# Concatenate features, predictions and any other metadata
df = ...
# Append to a Delta table
df.write.format("delta").mode("append").saveAsTable("my_model_serving")
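To consume these predictions downstream, for example for monitoring or joining them with ground-truth labels, you can read the Delta table back with Spark. This is a minimal sketch assuming an active SparkSession and the table name used above; run_date is a hypothetical metadata column you might have added when building df:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the accumulated predictions back from the Delta table
serving_df = spark.read.table("my_model_serving")

# Example: count stored predictions per run
# (run_date is a hypothetical metadata column)
serving_df.groupBy("run_date").count().show()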