Batch Models
If your model runs periodically (for example, every X days), we refer to it as a batch model (as opposed to a real-time model).
Typically, storing the predictions of batch models is straightforward. The code examples that follow are simple illustrations of how to do so.
Example: Pandas to Parquet on S3
If you use Pandas, you can append any DataFrame to a Parquet file on S3 (or other cloud storage) using the fastparquet library:
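A minimal sketch, assuming fastparquet is used together with s3fs; the bucket path and the `store_predictions` helper are hypothetical, and the sketch writes a directory-style ("hive") Parquet dataset, which is the append-friendly layout on object storage:

```python
import pandas as pd
import s3fs
from fastparquet import write

# Hypothetical location; replace with your own bucket and prefix.
S3_PATH = "my-bucket/predictions/batch_preds.parq"

s3 = s3fs.S3FileSystem()


def store_predictions(predictions: pd.DataFrame) -> None:
    """Append a batch of predictions to a Parquet dataset on S3."""
    dataset_exists = s3.exists(f"{S3_PATH}/_metadata")
    write(
        S3_PATH,
        predictions,
        file_scheme="hive",          # directory of files + _metadata
        append=dataset_exists,       # first run creates the dataset, later runs append
        open_with=s3.open,           # write through s3fs
        mkdirs=lambda path: None,    # S3 has no real directories to create
    )
```

Each run of the batch job adds a new row group to the dataset, so downstream readers (Pandas, Spark, Athena, etc.) see all historical predictions in one place.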
Example: Pyspark to Delta Lake
This example is especially useful on Databricks, but you can also use it with Delta Lake on the Spark on K8s operator, for example:
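A minimal sketch, assuming the Delta Lake connector is available on the cluster (it is preinstalled on Databricks; elsewhere you need the delta-spark package and the session configs shown below). The table path and the `store_predictions` helper are hypothetical:

```python
from pyspark.sql import SparkSession, DataFrame

# Hypothetical table location; on Databricks this could also be a table name.
DELTA_PATH = "s3://my-bucket/predictions/delta"

spark = (
    SparkSession.builder.appName("store-batch-predictions")
    # The two configs below are only needed outside Databricks.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)


def store_predictions(predictions: DataFrame) -> None:
    """Append a batch of predictions to a Delta Lake table."""
    (
        predictions.write
        .format("delta")
        .mode("append")   # add new rows without touching existing ones
        .save(DELTA_PATH)
    )
```

Because Delta Lake keeps a transaction log, appends from successive batch runs are atomic, and you can later time-travel to the predictions of a specific run if needed.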