Batch Models

If your model runs on a schedule, for example every X days, we refer to it as a batch model (as opposed to a real-time model).

Typically, storing the predictions of batch models is straightforward. The code examples that follow are deliberately simple illustrations of how to do so.

Example: Pandas to Parquet on S3

If you use Pandas, you can append any DataFrame to a Parquet file on S3 or other cloud storage by using the fastparquet library:

import fastparquet

# Preprocess & predict
X = preprocess(...)
y = model.predict(X)

# Concatenate features, predictions and any other metadata
df = ...

# Store predictions

Example: PySpark to Delta Lake

This example is especially useful on Databricks, but you can also use it elsewhere, for example with Delta Lake and Spark on the K8s operator:

# Predict on SparkML
y = model.transform(X)

# Concatenate features, predictions and any other metadata
df = ...

# Append to a Delta table
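
The append step could be sketched as follows. The helper name `append_to_delta` and its `path` argument are hypothetical, and the sketch assumes `df` is a Spark DataFrame in a session where Delta Lake is already configured (as it is on Databricks).

```python
def append_to_delta(df, path):
    """Append a Spark DataFrame to the Delta table stored at `path`.

    Hypothetical helper for illustration; `df` is expected to be a
    pyspark.sql.DataFrame and `path` a location the cluster can write to.
    """
    (
        df.write
        .format("delta")  # requires Delta Lake support in the Spark session
        .mode("append")   # add the new rows and keep the existing ones
        .save(path)
    )
```

Append mode leaves earlier batches untouched, so each scheduled run only adds its own predictions to the table.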
