
Integrate your ML model

With just a few lines of code, your ML model can be integrated with Aporia, and you'll be able to create monitors for it.

You might ask yourself: what's our definition of a model? Check the Concepts page for more info. Spoiler: it's a very broad definition :-)

This quickstart will teach you how to integrate Aporia using the Dashboard and the Python SDK, but you can also use our REST API and/or the Importer.

STEP 1: Add Model

First, you'll need to add your model to Aporia's console.

If this is your first time, you can do this as part of the onboarding wizard. Otherwise, click the Add Model button on the Models page.

Add Model

Enter the model name and optionally a description. Click Next.

STEP 2: Initialize the Aporia SDK

At minimum, the Aporia SDK should be integrated into your inference code - e.g. the Flask application that makes predictions with the model.

Optionally, you can also log training data, actuals, and raw inputs - this will give you better observability and monitoring capabilities. Logging your training data, for example, lets you create Data Drift monitors with training as a baseline.

In this tutorial, we'll keep it simple and integrate inference only.

You should now see the following screen:

SDK Installation

The Cloud Storage button will help you integrate with the Importer.

Install the Aporia SDK:

# Install with pip:
pip3 install "aporia[all]" --upgrade

# Or with poetry:
poetry add "aporia[all]"

# Or with pipenv:
pipenv install "aporia[all]"

# Or with pip, pinning dependencies to a requirements file:
pip3 install "aporia[all]" --upgrade
pip3 freeze > requirements.txt

Import and initialize the Aporia library by copy-pasting the snippet from the wizard:

import aporia

aporia.init(token="<>",
            environment="<>")

Click Mark as completed to continue.

STEP 3: Create Model Version

The next step is to define the schema of your model.

This schema tells the background story of the model: What's the model type? What features does it have? What do its predictions look like? This information is important for building good monitors for your model.

The model schema is versioned. We know that sometimes data scientists keep improving their models even after they've reached production - they might add or delete a feature, change hyperparameters, etc. Our monitoring infrastructure is aware of this, and allows you to compare the performance of multiple versions - even if they run in parallel.

The model schema should be inferred automatically. Even though we provide APIs for creating the schema manually, the best practice is to infer it automatically to avoid human errors and manual work. If you're using LightGBM, XGBoost, or Keras, we have integration examples in our docs.

You can create the model schema from anywhere: your training code, your inference code, or your CI/CD pipeline. It doesn't really matter, as long as the schema is accurate. If you report the same version multiple times with the same schema, the SDK will recognize this and will not raise an exception.

apr_model = aporia.create_model_version(
  model_id="my-model",
  model_version="v1",
  model_type="binary",
  features={
    "amount": "numeric",
    "owner": "string",
    "is_new": "boolean",
    "created_at": "datetime",
  },
  predictions={
    "approved": "boolean",
    "another_output_field": "numeric",
  },
)
import pandas as pd

# Example DataFrames, each with one row
features_df = pd.DataFrame([[12.3, "John", True, pd.Timestamp.now()]], 
  columns=["amount", "owner", "is_new", "created_at"])

predictions_df = pd.DataFrame([[True, 105.12]], 
  columns=["approved", "another_output_field"])


# Create a model version by inferring schemas from pandas DataFrames
apr_model = aporia.create_model_version(
  model_id="my-model",
  model_version="v1",
  model_type="binary",

  features=aporia.pandas.infer_schema_from_dataframe(features_df),
  predictions=aporia.pandas.infer_schema_from_dataframe(predictions_df),
)

In this example, we define a schema for a model that has 4 input features (amount, owner, is_new and created_at), and 2 output fields.

model_id is the auto-generated model ID. You can copy it from the dashboard.

model_version is free text - you can use the model file's hash, git commit hash, experiment ID from MLFlow or anything else.

Possible field types are:

  • numeric
  • boolean
  • categorical - the categories must be numbers
  • string - a categorical field with string values
  • datetime - this can contain either python datetime objects, or an ISO-8601 timestamp string
  • vector - currently not supported in predictions
  • text - not supported in features and predictions
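To illustrate the datetime field type, the same moment in time can be represented either as a Python datetime object or as its ISO-8601 string form (plain Python, independent of the SDK):

```python
from datetime import datetime, timezone

created_at = datetime(2023, 5, 1, 12, 30, tzinfo=timezone.utc)

# Both of these are valid values for a "datetime" field:
as_object = created_at              # python datetime object
as_string = created_at.isoformat()  # ISO-8601 timestamp string
```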

If you decided to create the version in your training code or CI/CD pipeline, you can use the following line to get a reference to the Aporia model object without specifying the schema:

apr_model = aporia.Model("my-model", "v1")

STEP 4: Log Predictions

Use the apr_model.log_prediction API to log a prediction.

You can also use the apr_model.log_batch_prediction API to log multiple predictions in one call.

apr_model.log_prediction(
  id=<PREDICTION_ID>,
  features={
    "amount": 15.3,
    "owner": "Joe",
    "is_new": True,
    "created_at": datetime.now(),
  },
  predictions={
    "approved": True,
    "another_output_field": 0.55,
  },
  confidence=0.84,
)
apr_model.log_batch_prediction([
  {
    "id": <PREDICTION_ID>,
    "features": {
      "amount": 15.3,
      "owner": "Joe",
      "is_new": True,
      "created_at": datetime.now(),
    },
    "predictions": {
      "approved": True,
      "another_output_field": 0.55,
    },
  },
  {
    "id": <ANOTHER_PREDICTION_ID>,
    "features": {
      "amount": 14.3,
      "owner": "John",
      "is_new": False,
      "created_at": datetime.now(),
    },
    "predictions": {
      "approved": False,
      "another_output_field": 0.3,
    },
  },
  ...
])

Note that for each prediction you must specify an ID. This ID can later be used to log the actual value of the prediction. If you don't care about actuals, you can simply pass str(uuid.uuid4()) as prediction ID.
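Generating such an ID is a one-liner with the standard library:

```python
import uuid

# A random UUID is a safe prediction ID when you won't report actuals:
prediction_id = str(uuid.uuid4())

# If you WILL report actuals later, persist this ID alongside the
# prediction so the actual value can be matched to it.
```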

Both of these APIs are completely asynchronous, so they won't block your application, which may be handling many predictions per second.

You can now go to Aporia's dashboard, see your model and create monitors for it! 👏

Troubleshooting

By default, the Aporia SDK is very quiet: it doesn't raise exceptions and doesn't write debug logs. This is because we never want to interrupt your application!

However, when first playing with the Aporia SDK, we highly recommend using the verbose argument, e.g:

aporia.init(..., verbose=True)

This will print errors in a convenient way, making the integration easier to debug. You can also pass throw_errors=True to make sure you aren't missing any errors.

If you have any further issues, please contact us.

Important: Make sure to remove throw_errors=True before uploading to staging / production!
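One simple way to avoid shipping throw_errors=True by accident is to derive both flags from an environment variable. This is an illustrative pattern, not an SDK feature, and APORIA_DEBUG is a hypothetical variable name:

```python
import os

# Enable verbose/strict behavior only when explicitly requested:
debug = os.environ.get("APORIA_DEBUG") == "1"

# aporia.init(token="<>", environment="<>",
#             verbose=debug, throw_errors=debug)
```

In staging and production, where the variable is unset, both flags fall back to the safe default.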

Predictions aren't being sent?

If your application exits immediately after logging a prediction, the prediction might get discarded.

The reason for this is that predictions are added to a queue and are sent asynchronously.
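Conceptually, this is the classic background-worker pattern: items are enqueued and a separate thread drains the queue. The sketch below is a simplified, hypothetical illustration of that pattern (not the SDK's actual implementation), showing why an immediate exit can drop the last items:

```python
import queue
import threading

sent = []          # stands in for "delivered to Aporia"
q = queue.Queue()

def worker():
    while True:
        item = q.get()
        if item is None:   # sentinel: stop the worker
            break
        sent.append(item)  # in the real SDK this would be a network call
        q.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# Logging a prediction just enqueues it and returns immediately:
q.put({"id": "abc", "features": {"amount": 15.3}})

# If the process exited here, the daemon thread would die and the
# item could be lost. Flushing waits for the queue to drain first:
q.join()
q.put(None)
t.join()
```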

To fix this, use the following API before your application exits:

apr_model.flush()