Batch Models

If your model runs on a schedule, say every X days, we refer to it as a batch model (as opposed to a real-time model).
Storing the predictions of batch models is typically straightforward. The code examples that follow are deliberately simplified illustrations of how to do so.

Example: Pandas to Parquet on S3

If you use Pandas, you can append a DataFrame to a Parquet file on S3 or other cloud storage by using the fastparquet library:
import fastparquet

# Preprocess & predict
X = preprocess(...)
y = model.predict(X)

# Concatenate features, predictions and any other metadata
df = ...

# Store predictions
fastparquet.write(
    filename=f"s3://my-models/{MODEL_ID}/{MODEL_VERSION}/serving.parquet",
    data=df,
    append=True,  # note: append=True assumes the file already exists
)
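As a sanity check of the append-and-read-back flow, here is a minimal local sketch that writes to a temporary file instead of S3 (the column names and path are made up for the example, and fastparquet's append=True assumes the file already exists, which is why the first write omits it):

import os
import tempfile

import fastparquet
import pandas as pd

# Two "batches" of predictions, as they might look on successive runs
batch1 = pd.DataFrame({"feature": [1.0, 2.0], "prediction": [0, 1]})
batch2 = pd.DataFrame({"feature": [3.0], "prediction": [1]})

path = os.path.join(tempfile.mkdtemp(), "serving.parquet")

# The first run creates the file; later runs append row groups to it
fastparquet.write(path, batch1)
fastparquet.write(path, batch2, append=True)

# Read everything back, e.g. for monitoring or debugging
combined = pd.read_parquet(path, engine="fastparquet")
print(len(combined))  # 3

The same pattern carries over to S3 paths, provided your environment has the s3fs credentials set up.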

Example: Pyspark to Delta Lake

This example is especially useful on Databricks, but you can also use it with Delta Lake plus the Spark on Kubernetes operator, for example:
# Predict with a SparkML model
y = model.transform(X)

# Concatenate features, predictions and any other metadata
df = ...
​
# Append to a Delta table
df.write.format("delta").mode("append").saveAsTable("my_model_serving")