Overview
Aporia monitors your models by connecting directly to your data. If you don't store your predictions yet, see our guide on Storing Your Predictions (recommended), or just log them directly to Aporia.
Aporia currently supports the following data sources:
- Amazon S3
- BigQuery
- Redshift
- Athena
- Snowflake
- PostgreSQL
- Delta Lake
- Glue Data Catalog
If your storage system or database is not shown here, please contact your Aporia account manager for further assistance.
Connecting to a data source begins with configuring its connection details. For example, to connect to a Postgres database, we can create the following data source object:
```python
data_source = PostgresJDBCDataSource(
    url="jdbc:postgresql://<POSTGRES_HOSTNAME>/<DBNAME>",
    query="SELECT * FROM model_predictions",
    user="<DB_USER>",
    password="<DB_PASSWORD>",
)
```
Please refer to the documentation page of the relevant data source for a complete list of supported parameters and configuration options.
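The JDBC URL above is an ordinary connection string you can assemble from your host and database name. A minimal sketch of doing so, where the helper name and the host/database values are hypothetical and not part of Aporia's API:

```python
def build_postgres_jdbc_url(hostname: str, dbname: str) -> str:
    """Build a PostgreSQL JDBC URL of the form used above (hypothetical helper)."""
    return f"jdbc:postgresql://{hostname}/{dbname}"

url = build_postgres_jdbc_url("db.example.internal", "analytics")
print(url)  # jdbc:postgresql://db.example.internal/analytics
```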
After creating a data source, we can create a model version and connect it to the data source. For example:
```python
apr_model = aporia.create_model_version(
    model_id="<MODEL_ID>",
    model_version="v1",
    model_type="binary",
    raw_inputs={
        "raw_text": "text",
    },
    features={
        "amount": "numeric",
        "owner": "string",
        "is_new": "boolean",
        "embeddings": {"type": "tensor", "dimensions": [768]},
    },
    predictions={
        "will_buy_insurance": "boolean",
        "proba": "numeric",
    },
)
```
```python
apr_model.connect_serving(
    data_source=data_source,
    # Names of the prediction ID and prediction timestamp columns
    id_column="prediction_id",
    timestamp_column="prediction_timestamp",
)
```
By default, each raw input, feature, and prediction is mapped to the column with the same name in the PostgreSQL query.
As part of the `connect_serving` API, you must specify the following two additional columns:

- `id_column` - A unique ID to represent this prediction.
- `timestamp_column` - A column representing when this prediction occurred.
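To illustrate what those two columns are expected to provide, here is a pure-Python sketch that checks a batch of prediction rows for a unique ID and a timestamp. This is an illustration of the requirement, not Aporia's implementation:

```python
from datetime import datetime, timezone

def validate_serving_rows(rows, id_column, timestamp_column):
    """Check every row has a timestamp and a unique prediction ID (illustration only)."""
    seen_ids = set()
    for row in rows:
        pred_id = row[id_column]
        if pred_id in seen_ids:
            raise ValueError(f"duplicate prediction ID: {pred_id}")
        seen_ids.add(pred_id)
        if not isinstance(row[timestamp_column], datetime):
            raise TypeError(f"{timestamp_column} must hold a timestamp")
    return len(seen_ids)

rows = [
    {"prediction_id": "a1", "prediction_timestamp": datetime.now(timezone.utc)},
    {"prediction_id": "a2", "prediction_timestamp": datetime.now(timezone.utc)},
]
print(validate_serving_rows(rows, "prediction_id", "prediction_timestamp"))  # 2
```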
Integrating actuals can be done using the `labels` argument of the `connect_serving` API. To use it, map each Aporia prediction to a column representing its actual value.

For example, let's assume we have two columns: `will_buy_insurance` (the model prediction) and `did_buy_insurance` (the ground truth). To integrate them into Aporia:

```python
apr_model = aporia.create_model_version(
    ...
    predictions={
        "will_buy_insurance": "boolean",
    },
)

apr_model.connect_serving(
    data_source=data_source,
    id_column="prediction_id",
    timestamp_column="prediction_timestamp",
    labels={
        # Prediction name -> Column name representing its actual value
        "will_buy_insurance": "did_buy_insurance"
    },
)
```
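To make the `labels` mapping concrete, here is a pure-Python sketch of pairing each prediction with its actual value while skipping rows whose ground truth is still `NULL` (`None`). This is an illustration only, not Aporia internals:

```python
def paired_outcomes(rows, prediction_column, label_column):
    """Return (prediction, actual) pairs, skipping rows whose actual is still NULL."""
    return [
        (row[prediction_column], row[label_column])
        for row in rows
        if row[label_column] is not None
    ]

rows = [
    {"will_buy_insurance": True, "did_buy_insurance": True},
    {"will_buy_insurance": True, "did_buy_insurance": None},   # actual hasn't arrived yet
    {"will_buy_insurance": False, "did_buy_insurance": False},
]
pairs = paired_outcomes(rows, "will_buy_insurance", "did_buy_insurance")
print(pairs)  # [(True, True), (False, False)] -- the NULL row is simply ignored
```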
The ground truth can be `NULL` until it actually has a value, and that's okay.

To connect your model version to training or test sets, you can use the `connect_training` and `connect_testing` APIs. For example:
```python
# Training set
apr_model.connect_training(
    data_source=training_set_data_source,
    id_column="id",
    timestamp_column="timestamp",
)

# Test set
apr_model.connect_testing(
    data_source=test_set_data_source,
    id_column="id",
    timestamp_column="timestamp",
)
```
Any column that has the same name as a raw input, feature, or prediction in the model schema is mapped to the corresponding raw input, feature, or prediction.
However, you can override this mapping using the `raw_inputs`, `features`, `predictions`, and `labels` arguments of the `connect_serving` / `connect_training` / `connect_testing` APIs. Example:

```python
apr_model.connect_serving(
    data_source=aporia.GlueDataSource(
        database="datalake",
        query="""
            SELECT
                my_id,
                full_name,
                age,
                my_gender_col,
                decision,
                was_decision_correct,
                occurred_at
            FROM predictions
        """,
    ),
    id_column="my_id",
    timestamp_column="occurred_at",
    raw_inputs={
        "fullname": "full_name",
    },
    features={
        "age": "age",
        "gender": "my_gender_col",
    },
    predictions={
        "will_buy_insurance": "decision",
    },
    labels={
        "will_buy_insurance": "was_decision_correct"
    },
)
```
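The mapping rule described above (same-name columns by default, explicit overrides on top) can be sketched in plain Python. The helper below illustrates the behavior; it is not Aporia's actual code:

```python
def resolve_column_mapping(schema_fields, query_columns, overrides=None):
    """Map each schema field to a query column: explicit overrides win,
    otherwise a column with the same name as the field is used."""
    overrides = overrides or {}
    mapping = {}
    for field in schema_fields:
        if field in overrides:
            mapping[field] = overrides[field]
        elif field in query_columns:
            mapping[field] = field
    return mapping

mapping = resolve_column_mapping(
    schema_fields=["age", "gender", "will_buy_insurance"],
    query_columns=["my_id", "age", "my_gender_col", "decision", "occurred_at"],
    overrides={"gender": "my_gender_col", "will_buy_insurance": "decision"},
)
print(mapping)  # {'age': 'age', 'gender': 'my_gender_col', 'will_buy_insurance': 'decision'}
```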