Glue Data Catalog
This guide describes how to use the Glue Data Catalog data source to monitor your ML model in production.
We assume that your model inputs, outputs, and (optionally) delayed actuals exist as tables in Glue Data Catalog. This data source can also connect to your model's training set, which then serves as a baseline for model monitoring.
To provide access to Glue Data Catalog, you'll need to update your Aporia IAM role with the necessary API permissions.
Use the same role that was used for the Aporia deployment. If someone else on your team deployed Aporia, ask them for the role ARN, which should be in the following format: arn:aws:iam::<account>:role/<role-name-with-path>.
In the AWS IAM console, open the list of roles and click the role you obtained.
Add an inline policy.
On the Permissions tab, click Add permissions, then click Create inline policy.
In the policy editor, click the JSON tab.
Paste an access policy that grants read access to Glue Data Catalog. Make sure to fill in your region and account ID, and restrict access to specific databases and tables if necessary.
Click Review Policy.
In the Name field, enter a policy name.
Click Create policy.
Now Aporia has the read permission it needs to connect to the Glue Data Catalog databases and tables you have specified in the policy.
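As a minimal sketch of what a read-only policy for the steps above could look like, the Python snippet below builds the policy document and prints it as JSON for pasting into the policy editor. The helper name `build_glue_read_policy`, the placeholder region and account ID, and the exact list of actions are assumptions for illustration; the policy supplied by Aporia may differ, and access to the underlying data (for example on S3) may require separate permissions.

```python
import json


def build_glue_read_policy(region, account_id, database="*", table="*"):
    """Build a read-only IAM policy document for Glue Data Catalog.

    Hypothetical helper: the action list is an assumption about which
    read permissions are needed, not an official Aporia policy.
    """
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    # Glue requires the catalog itself plus the
                    # database and table resources being read.
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/{table}",
                ],
            }
        ],
    }


# Placeholder values: replace with your own region, account ID,
# and (optionally) a specific database and table.
policy = build_glue_read_policy("us-east-1", "123456789012", database="my_db")
print(json.dumps(policy, indent=2))
```

Passing a specific `database` and `table` instead of the `*` defaults narrows the policy to exactly the tables Aporia should read.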
Go to the Aporia platform and log in to your account.
Go to the Integrations page and click the Data Connectors tab.
Scroll to the Connect New Data Source section.
Click Connect on the Glue Data Catalog card and follow the instructions.
Bravo! 👏 Now you can use the data source you've created across all your models in Aporia.
A common use case is storing serving data as JSON files on S3.
The following is a sample query of how to extract the JSON data to Aporia features:
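As a rough illustration of the mapping such a query performs, the Python sketch below flattens one JSON serving record into flat columns, one per feature or prediction. The record layout and the field names (`id`, `occurred_at`, `features`, `predictions`) are hypothetical and are not Aporia's schema; in practice the extraction would be written in the SQL dialect the data source accepts.

```python
import json


def flatten_record(raw_line):
    """Flatten one JSON serving record into a flat dict of columns.

    Assumes a hypothetical layout with top-level "id" and "occurred_at"
    fields and nested "features" and "predictions" objects.
    """
    record = json.loads(raw_line)
    row = {"id": record["id"], "occurred_at": record["occurred_at"]}
    # Each nested key becomes its own prefixed column.
    for name, value in record.get("features", {}).items():
        row[f"feature_{name}"] = value
    for name, value in record.get("predictions", {}).items():
        row[f"prediction_{name}"] = value
    return row


# One line of a hypothetical JSON file stored on S3.
sample_line = (
    '{"id": "42", "occurred_at": "2023-01-01T00:00:00Z", '
    '"features": {"age": 31, "country": "US"}, '
    '"predictions": {"will_churn": true}}'
)
flat_row = flatten_record(sample_line)
print(flat_row)
```

Prefixing the column names keeps features and predictions distinguishable once they sit side by side in a single flat table.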