Note that each prediction must have an ID. This ID can later be used to log the actual value of the prediction. If you don't care about actuals, you can simply pass str(uuid.uuid4()) as the prediction ID.
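For example, a unique prediction ID can be generated with Python's built-in uuid module (no Aporia-specific API involved):

```python
import uuid

# Generate a random, globally unique ID to use as the prediction ID.
prediction_id = str(uuid.uuid4())

print(prediction_id)  # e.g. "1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed"
```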
After logging your first prediction, you'll be able to access your model's page on the dashboard.
Raw Inputs
Raw inputs are the inputs of the model before preprocessing, and they're used to construct the features. Logging them is optional but can help you detect issues in your data pipeline.
For example, if your model predicted that a client will buy insurance, and a day later the client actually makes a purchase, then the actual value of that prediction is True.
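Conceptually, an actual is joined to a previously logged prediction by its ID. Here is a minimal sketch of that idea in plain Python (this is an illustration of the concept, not the Aporia SDK API):

```python
import uuid

# At prediction time: store the model's output under a unique prediction ID.
prediction_id = str(uuid.uuid4())
logged = {prediction_id: {"will_buy_insurance": True}}

# A day later: the client actually made a purchase, so attach the
# ground truth ("actual") to the same prediction ID.
logged[prediction_id]["actual"] = True

print(logged[prediction_id])  # {'will_buy_insurance': True, 'actual': True}
```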
All of the logging functions described above log the data asynchronously to avoid blocking your program. If you wish to wait for the data to be sent, you can use the flush method:
apr_model.flush()
Troubleshooting
By default, the Aporia SDK is silent: it doesn't raise exceptions and doesn't write debug logs. This is because we never want to interrupt your application!
However, when first integrating the Aporia SDK, we highly recommend using the verbose argument, e.g.:
aporia.init(..., verbose=True)
This will print errors in a convenient way to make the integration easier to debug. You can also pass throw_errors=True, which ensures you aren't missing any errors.
Important: Make sure to remove throw_errors=True before deploying to staging / production!
Prediction isn't sent?
If your application exits immediately after logging a prediction, the prediction might get discarded.
The reason for this is that predictions are added to a queue and are sent asynchronously.
In order to fix this, use the following API:
apr_model.flush()
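To see why a flush is needed, here is a simplified, hypothetical model of what the SDK does internally: predictions go into a queue, a background thread drains it, and flushing waits until the queue is empty. The names below are illustrative, not Aporia's actual internals:

```python
import queue
import threading

log_queue = queue.Queue()
sent = []

def sender():
    # Background worker: sends queued predictions one by one.
    while True:
        item = log_queue.get()
        sent.append(item)       # stand-in for the actual network call
        log_queue.task_done()

threading.Thread(target=sender, daemon=True).start()

# "Logging" a prediction only enqueues it; it is sent asynchronously.
log_queue.put({"id": "abc123", "score": True})

# Exiting now could discard the item, since the daemon thread dies with
# the process. Waiting for the queue to drain is what flush() does.
log_queue.join()
print(sent)  # [{'id': 'abc123', 'score': True}]
```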
PySpark
To log PySpark DataFrames directly, you can use the following methods:
apr_model.log_batch_pyspark_prediction for serving data
apr_model.log_pyspark_training_set for the training set
apr_model.log_pyspark_test_set for the test set
Example:
import aporia

aporia.init(
    host="<HOST>",
    token="<TOKEN>",
    environment="<ENVIRONMENT>",
    verbose=True,
    throw_errors=True,
)

# Create a new model + model version in Aporia
model_id = aporia.create_model("my-model", "My Model")
apr_model = aporia.create_model_version(
    model_id=model_id,
    model_version="v1",
    model_type="binary",
    features={
        "f1": "numeric",
        "f2": "numeric",
        "f3": "numeric",
    },
    predictions={
        "score": "boolean",
    },
)
# Log training set
# We'll assume that there is a column in the dataframe for each feature / prediction
df_train = spark.sql("SELECT * FROM ...")
apr_model.log_pyspark_training_set(df_train)
# Load & log production data to Aporia
# We'll assume that there is a column in the dataframe for each feature / prediction
df = spark.sql("SELECT * FROM <>")
apr_model.log_batch_pyspark_prediction(
    data=df,
    # Names of the "ID" and "occurred_at" columns
    id_column="id",
    timestamp_column="occurred_at",
    # Map a prediction (from the schema) to a label column
    labels={
        "<PREDICTION_NAME>": "<COLUMN_NAME>",
    },
)
Model type can be , , , , or . Please refer to the relevant documentation on each model type for more info.
To log multiple predictions in one call, check out .
In some cases, you will have access to the actual value of the prediction, based on real-world data.
If you have any further issues, please .
The API of these functions is similar to the connect_serving API (see ).