Amazon S3
This guide describes how to connect Aporia to an S3 data source in order to monitor a new ML Model in production.
We will assume that your model inputs, outputs and optionally delayed actuals are stored in a file in S3. Currently, the following file formats are supported:
parquet
json
csv
delta
This data source may also be used to connect to your model's training/test set to be used as a baseline for model monitoring.
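Before connecting, it can help to sanity-check the S3 location and file format on your side. The following is a minimal sketch, not part of the Aporia SDK: the helper names parse_s3_path and check_format and the example path are hypothetical, and the format list simply mirrors the formats named above.

```python
from typing import Tuple
from urllib.parse import urlparse

# File formats listed in this guide.
SUPPORTED_FORMATS = {"parquet", "json", "csv", "delta"}


def parse_s3_path(path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_format(object_format: str) -> str:
    """Normalize a format name and confirm it is one of the supported formats."""
    fmt = object_format.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {object_format!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return fmt


# Hypothetical example location.
bucket, key = parse_s3_path("s3://my-bucket/data/predictions.parquet")
print(bucket, key, check_format("parquet"))
```

The bucket name returned here is the one the IAM access policy needs to cover.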
To provide access to S3, create an IAM role with the necessary API permissions.
Log into your AWS Console and go to the IAM console.
Click the Roles tab in the sidebar.
Click Create role.
In Select type of trusted entity, click the Web Identity tile.
Under Identity Provider, click on Create New.
Under Provider Type, click the OpenID Connect tile.
In the Provider URL field, enter the Aporia cluster OIDC URL.
In the Audience field, enter "sts.amazonaws.com".
Click the Add provider button.
Close the new tab.
Refresh the Identity Provider list.
Select the newly created identity provider.
In the Audience field, select “sts.amazonaws.com”.
Click the Next button.
Click the Next button.
In the Role name field, enter a role name.
In the list of roles, click the role you created.
Add an inline policy.
On the Permissions tab, click Add permissions then click Create inline policy.
In the policy editor, click the JSON tab.
Copy the following access policy, making sure to fill in the correct bucket name.
Click Review Policy.
In the Name field, enter a policy name.
Click Create policy.
If you use Service Control Policies to deny certain actions at the AWS account level, ensure that sts:AssumeRoleWithWebIdentity is allowlisted so Aporia can assume the cross-account role.
In the role summary, copy the Role ARN.
Next, please provide your Aporia account manager with the Role ARN for the role you've just created.
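For reference, the role created in the steps above carries two JSON documents: a trust policy (generated by the Web Identity wizard) and the inline access policy. The sketch below builds both in Python so you can compare them with what the console shows. It is an illustration under assumptions: the account ID, OIDC provider value, and bucket name are placeholders, and the read-only action list (s3:ListBucket, s3:GetObject) is a typical minimum rather than Aporia's confirmed policy, so prefer the policy supplied with this guide if it differs.

```python
import json


def build_trust_policy(account_id: str, oidc_provider: str) -> dict:
    """Trust policy for a web-identity role.

    `oidc_provider` is the provider URL without the 'https://' prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{oidc_provider}:aud": "sts.amazonaws.com"}
                },
            }
        ],
    }


def build_read_only_access_policy(bucket: str) -> dict:
    """Assumed read-only policy: list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Placeholder values; replace with your own account, provider, and bucket.
trust = build_trust_policy("123456789012", "oidc.example.com/id/EXAMPLE")
access = build_read_only_access_policy("my-bucket")
print(json.dumps(trust, indent=2))
print(json.dumps(access, indent=2))
```

Note that bucket-level actions such as s3:ListBucket apply to the bucket ARN, while object-level actions such as s3:GetObject apply to the bucket/* ARN, which is why the sketch uses two statements.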
To create a new model to be monitored in Aporia, you can call the aporia.create_model(...) API.
Each model in Aporia contains different Model Versions. When you (re)train your model, you should create a new model version in Aporia.
Each raw input, feature or prediction is mapped by default to the column of the same name in the S3 file.
For example, if you create a feature named amount or a prediction named proba, the S3 data source will expect a column in the file named amount or proba, respectively.
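Because the default mapping is by name, a quick local check of the file header against the model version's fields can catch mismatches early. This is a minimal sketch and not an Aporia API: missing_columns is a hypothetical helper, and the CSV header below is made-up sample data.

```python
import csv
import io
from typing import Iterable, List


def missing_columns(schema_fields: Iterable[str], file_columns: Iterable[str]) -> List[str]:
    """Return schema field names that have no same-named column in the file."""
    available = set(file_columns)
    return [name for name in schema_fields if name not in available]


# Made-up header standing in for the first line of a CSV file in S3.
sample_header = "prediction_id,prediction_ts,amount\n"
file_columns = next(csv.reader(io.StringIO(sample_header)))

# Field names taken from the example above: a feature and a prediction.
print(missing_columns(["amount", "proba"], file_columns))
```

An empty result means every field has a same-named column; anything listed would need either a renamed column or an explicit mapping.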
Next, create an instance of S3DataSource and pass it to apr_model.connect_serving(...) or apr_model.connect_training(...).
Note that as part of the connect_serving API, you are required to specify two additional columns:
id_column - A unique ID representing this prediction.
timestamp_column - A column representing when this prediction occurred.
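The two requirements above (unique IDs, a usable timestamp) can be checked on a sample of the data before connecting. The sketch below is only an illustration: check_serving_rows is a hypothetical helper, the column names prediction_id and prediction_ts are made up, and it assumes ISO 8601 timestamps, which may not be the only format Aporia accepts.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import Dict, List


def check_serving_rows(rows: List[Dict[str, str]], id_column: str, timestamp_column: str) -> List[str]:
    """Return human-readable problems with the ID and timestamp columns."""
    problems = []
    # IDs must be present and unique across rows.
    counts = Counter(row.get(id_column) for row in rows)
    for value, n in counts.items():
        if value in (None, ""):
            problems.append(f"{n} row(s) have no {id_column}")
        elif n > 1:
            problems.append(f"{id_column}={value!r} appears {n} times")
    # Timestamps must parse (ISO 8601 is an assumption of this sketch).
    for i, row in enumerate(rows):
        raw = row.get(timestamp_column)
        try:
            datetime.fromisoformat(raw)
        except (TypeError, ValueError):
            problems.append(f"row {i}: cannot parse {timestamp_column}={raw!r}")
    return problems


# Made-up sample data with one duplicate ID and one bad timestamp.
SAMPLE = """prediction_id,prediction_ts,amount,proba
a1,2023-01-01T10:00:00,12.5,0.91
a2,2023-01-01T10:05:00,7.0,0.12
a2,not-a-date,3.2,0.55
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = check_serving_rows(rows, "prediction_id", "prediction_ts")
for problem in problems:
    print(problem)
```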
For more information on:
Advanced feature / prediction <-> column mapping
How to integrate delayed actuals
How to integrate training / test sets
Please see the Data Sources Overview page.