Logging to Aporia directly
This section will teach you how to integrate Aporia using the Python SDK, but you can also use our REST API or integrate directly with your own DB.
Get Started
To get started, install the Aporia SDK:
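```
pip install aporia
```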
And then initialize it in your code:
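For example, a minimal sketch (the token and environment values are placeholders for your own):

```python
import aporia

# Initialize the SDK once, at application startup.
# Token and environment below are placeholders - use your account's values.
aporia.init(
    token="YOUR_APORIA_TOKEN",
    environment="production",
)
```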
Create Model
To create a new model to be monitored in Aporia, you can call the aporia.create_model(...) API:
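For example (the model ID and name are illustrative):

```python
import aporia

aporia.create_model("fraud-detection", "Fraud Detection Model")
```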
This API will not recreate the model if the model ID already exists. You can also specify a color, icon, tags, and model owner:
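A sketch with the optional metadata; the exact keyword argument names are assumptions here, so verify them against the SDK reference:

```python
aporia.create_model(
    "fraud-detection",
    "Fraud Detection Model",
    color="blue",             # keyword names assumed - verify in the SDK reference
    icon="fraud-detection",
    tags={"team": "risk"},
    owner="jane@example.com",
)
```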
Create Model Version
Each model in Aporia contains different Model Versions. When you (re)train your model, you should create a new model version in Aporia.
Manual
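A minimal sketch of creating a version with an explicit schema (the create_model_version name and arguments are assumed; field names are illustrative - see Field Types below for the supported types):

```python
import aporia

apr_model = aporia.create_model_version(
    model_id="fraud-detection",
    model_version="v1",        # any string, e.g. a git commit hash
    model_type="binary",
    features={
        "amount": "numeric",
        "merchant": "string",
    },
    predictions={
        "will_fraud": "boolean",
        "proba": "numeric",
    },
)
```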
Inferring from Pandas DataFrame
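If your data already lives in a DataFrame, the schema can be inferred from its dtypes. A sketch, assuming an infer_schema_from_dataframe helper in aporia.pandas (verify the exact name in the SDK reference):

```python
import aporia
import pandas as pd
from aporia.pandas import infer_schema_from_dataframe  # helper name assumed

features_df = pd.DataFrame({"amount": [102.5, 31.0], "merchant": ["grocery", "travel"]})
predictions_df = pd.DataFrame({"will_fraud": [False, True]})

apr_model = aporia.create_model_version(
    model_id="fraud-detection",
    model_version="v2",
    model_type="binary",
    features=infer_schema_from_dataframe(features_df),
    predictions=infer_schema_from_dataframe(predictions_df),
)
```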
The model version parameter can be any string: you can use the model file's hash, a git commit hash, an experiment/run ID from MLflow, or anything else.
The model type can be regression, binary, multiclass, multi-label, or ranking. Please refer to the relevant documentation on each model type for more info.
Field Types
numeric - valid examples: 1, 2.87, 0.53, 300.13
boolean - valid examples: True, False
categorical - a categorical field with integer values
string - a categorical field with string values
datetime - contains either Python datetime objects, or an ISO-8601 timestamp string
text - freeform text
dict - dictionaries; at the moment, keys are strings and values are numeric
tensor - useful for unstructured data; must specify shape, e.g. {"type": "tensor", "dimensions": [768]}
vector - useful for arrays that can vary in size
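Putting these together, a features schema (as passed when creating a model version) might look like this; the field names are illustrative:

```python
features = {
    "age": "numeric",
    "is_new_customer": "boolean",
    "plan_id": "categorical",
    "country": "string",
    "signup_time": "datetime",
    "last_support_ticket": "text",
    "page_view_counts": "dict",
    "embedding": {"type": "tensor", "dimensions": [768]},
    "purchase_history": "vector",
}
```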
Get a reference to an existing version
If you already created a version, for example during training, and you want to use it again, you can retrieve a reference to it.
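A sketch, assuming an aporia.Model constructor is used to attach to an existing version (verify the exact name in the SDK reference):

```python
import aporia

aporia.init(token="YOUR_APORIA_TOKEN", environment="production")

# Attach to a model version created earlier, e.g. during training.
apr_model = aporia.Model("fraud-detection", "v1")  # constructor name assumed
```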
Logging Training / Test Sets
To log the training or test sets of your model, you can use the apr_model.log_training_set or apr_model.log_test_set functions, respectively.
For example, if we have the following training set:
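```python
import pandas as pd

# Illustrative training data: features and labels as separate DataFrames.
train_features = pd.DataFrame({
    "amount": [102.5, 31.0, 584.9],
    "merchant": ["grocery", "travel", "electronics"],
})
train_labels = pd.DataFrame({
    "will_fraud": [False, False, True],
})
```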
Then you can run:
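A sketch, assuming features and labels keyword arguments (verify the exact signature in the SDK reference):

```python
apr_model.log_training_set(
    features=train_features,
    labels=train_labels,
)
```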
Similarly, you can use apr_model.log_test_set to log your test set.
In both functions, you can pass raw_inputs to log the raw inputs of your training / test sets.
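For example (train_raw_inputs being another DataFrame, aligned row-by-row with the features):

```python
apr_model.log_training_set(
    features=train_features,
    labels=train_labels,
    raw_inputs=train_raw_inputs,  # DataFrame of pre-preprocessing inputs
)
```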
Logging Serving Data
Log Predictions
Use the apr_model.log_prediction API to log a new prediction.
Note that for each prediction you must specify an ID. This ID can later be used to log the actual value of the prediction. If you don't care about actuals, you can simply pass str(uuid.uuid4()) as the prediction ID.
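A sketch (the field names are illustrative and must match the model version's schema):

```python
import uuid

apr_model.log_prediction(
    id=str(uuid.uuid4()),
    features={"amount": 102.5, "merchant": "grocery"},
    predictions={"will_fraud": False, "proba": 0.13},
)
```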
After logging your first prediction, you'll be able to view your model's page on the dashboard.
To log multiple predictions in one call, check out Batching.
Raw Inputs
Raw inputs are the inputs of the model before preprocessing, and they're used to construct the features. Logging them is optional but can help you detect issues in your data pipeline.
Example: Log raw inputs separately
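A sketch, assuming a log_raw_inputs function keyed by the same prediction ID (verify the exact name in the SDK reference):

```python
prediction_id = "28c78a47-5896-4f51-9cda-92ffd0e232d0"  # the ID passed to log_prediction

apr_model.log_raw_inputs(
    id=prediction_id,
    raw_inputs={"raw_transaction": "102.5 USD @ grocery"},  # illustrative raw input
)
```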
Example: Log raw inputs in log_prediction
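Or inline, using the raw_inputs parameter:

```python
import uuid

apr_model.log_prediction(
    id=str(uuid.uuid4()),
    features={"amount": 102.5, "merchant": "grocery"},
    predictions={"will_fraud": False},
    raw_inputs={"raw_transaction": "102.5 USD @ grocery"},
)
```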
Actuals
In some cases, you will have access to the actual value of the prediction, based on real-world data.
For example, if your model predicted that a client will buy insurance, and a day later the client actually makes a purchase, then the actual value of that prediction is True.
Example: Log actuals separately
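A sketch, assuming a log_actuals function keyed by the prediction ID (verify the exact name in the SDK reference):

```python
prediction_id = "28c78a47-5896-4f51-9cda-92ffd0e232d0"  # the ID passed to log_prediction

apr_model.log_actuals(
    id=prediction_id,
    actuals={"will_fraud": True},  # schema matches the predictions section
)
```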
Example: Log actuals in log_prediction
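Or inline, if the actual value is already known at prediction time:

```python
import uuid

apr_model.log_prediction(
    id=str(uuid.uuid4()),
    features={"amount": 102.5, "merchant": "grocery"},
    predictions={"will_fraud": True},
    actuals={"will_fraud": True},  # parameter name assumed - verify in the SDK reference
)
```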
Batching
All of the functions above log a single prediction. If you wish to log multiple predictions in one large batch, you can use the log_batch_* functions.
Each of these functions receives a list of dictionaries, where each dict contains the parameters of the corresponding single-record function.
Example: Logging batch predictions
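A sketch, where each dict mirrors the arguments of log_prediction (the function name log_batch_prediction is assumed from the log_batch_* convention):

```python
import uuid

apr_model.log_batch_prediction([
    {
        "id": str(uuid.uuid4()),
        "features": {"amount": 102.5, "merchant": "grocery"},
        "predictions": {"will_fraud": False},
    },
    {
        "id": str(uuid.uuid4()),
        "features": {"amount": 584.9, "merchant": "electronics"},
        "predictions": {"will_fraud": True},
    },
])
```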
Example: Logging batch actuals
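Similarly for actuals, keyed by previously logged prediction IDs:

```python
apr_model.log_batch_actuals([
    {"id": "prediction-id-1", "actuals": {"will_fraud": True}},
    {"id": "prediction-id-2", "actuals": {"will_fraud": False}},
])
```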
Example: Logging batch raw inputs
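And for raw inputs:

```python
apr_model.log_batch_raw_inputs([
    {"id": "prediction-id-1", "raw_inputs": {"raw_transaction": "102.5 USD @ grocery"}},
    {"id": "prediction-id-2", "raw_inputs": {"raw_transaction": "584.9 USD @ electronics"}},
])
```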
Logging Pandas DataFrame / Series
If the data you wish to log is stored in a Pandas Series or DataFrame (with a single row), you can use the aporia.pandas utility API:
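A sketch, assuming a pandas_to_dict helper (verify the exact name in the SDK reference):

```python
import uuid
import pandas as pd
from aporia.pandas import pandas_to_dict  # helper name assumed

row = pd.Series({"amount": 102.5, "merchant": "grocery"})  # a single-row Series

apr_model.log_prediction(
    id=str(uuid.uuid4()),
    features=pandas_to_dict(row),
    predictions={"will_fraud": False},
)
```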
Asynchronous logging
All of the logging functions described above log the data asynchronously to avoid blocking your program. If you wish to wait for the data to be sent, you can use the flush method:
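```python
# Block until all queued data has been sent to Aporia.
apr_model.flush()
```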
Troubleshooting
By default, the Aporia SDK is very silent: it doesn't raise exceptions and doesn't write debug logs. This was done because we never want to interrupt your application!
However, when first playing with the Aporia SDK, we highly recommend using the verbose argument, e.g.:
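```python
import aporia

aporia.init(
    token="YOUR_APORIA_TOKEN",
    environment="local-dev",  # placeholder values
    verbose=True,             # print SDK errors for easier debugging
)
```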
This will print errors in a convenient way to make integration easier to debug. You can also pass throw_errors=True, which will make sure you aren't missing any errors.
If you have any further issues, please contact us.
Important: Make sure to remove throw_errors=True before uploading to staging / production!
Prediction isn't sent?
If your application exits immediately after logging a prediction, the prediction might get discarded.
The reason for this is that predictions are added to a queue and are sent asynchronously.
In order to fix this, use the following API:
apr_model.flush()
Pyspark
To log Pyspark DataFrames directly, you can use:
apr_model.log_batch_pyspark_prediction - for serving data
apr_model.log_pyspark_training_set - for the training set
apr_model.log_pyspark_test_set - for the test set
The API of these functions is similar to the connect_serving API (see Data Sources - Overview).
Example:
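A sketch for serving data; the data / id_column / timestamp_column argument names are assumptions borrowed from connect_serving (see Data Sources - Overview for the exact parameters):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative path - any Pyspark DataFrame of predictions works here.
predictions_df = spark.read.parquet("s3://my-bucket/predictions/")

# Argument names are assumptions, mirroring the connect_serving API.
apr_model.log_batch_pyspark_prediction(
    data=predictions_df,
    id_column="prediction_id",
    timestamp_column="occurred_at",
)
```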