
Glue Data Catalog

This guide describes how to use the Glue Data Catalog data source to monitor your ML model in production.

We will assume that your model inputs, outputs, and optionally delayed actuals are stored as tables in Glue Data Catalog. This data source can also be used to connect to your model's training set, which can serve as a baseline for model monitoring.

Update the Aporia IAM role for Glue Data Catalog access

To provide access to Glue Data Catalog, you'll need to update your Aporia IAM role with the necessary API permissions.

Step 1: Obtain your Aporia IAM role

Use the same role used for the Aporia deployment. If someone else on your team has deployed Aporia, please reach out to them to obtain the role ARN (it should be in the following format: arn:aws:iam::<account>:role/<role-name-with-path>).
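If you have programmatic AWS access, you can also look the ARN up yourself. A minimal sketch using boto3 (the role name below is a placeholder):

import boto3

# Look up the ARN of the role used for the Aporia deployment
# (replace "aporia-deployment-role" with your actual role name):
iam = boto3.client("iam")
role_arn = iam.get_role(RoleName="aporia-deployment-role")["Role"]["Arn"]
print(role_arn)  # arn:aws:iam::<account>:role/<role-name-with-path>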

Step 2: Create an access policy

  1. In the list of IAM roles, click the role you obtained.

  2. On the Permissions tab, click Add permissions, then click Create inline policy.

  3. In the policy editor, click the JSON tab.

  4. Copy the following access policy. Make sure to fill in your region and account ID, and restrict access to specific databases and tables if necessary.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetConnections"
                ],
                "Resource": [
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:connection/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases"
                ],
                "Resource": [
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/default",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/global_temp",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                    "glue:GetPartition",
                    "glue:SearchTables"
                ],
                "Resource": [
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/*",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetUserDefinedFunctions"
                ],
                "Resource": [
                    "*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:CreateDatabase"
                ],
                "Resource": [
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/default",
                    "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/global_temp"
                ]
            }
        ]
    }
  5. Click Review policy.

  6. In the Name field, enter a policy name.

  7. Click Create policy.
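Alternatively, the same policy can be attached without the console. A sketch using boto3, assuming the JSON above is saved locally (the role name, policy name, and file path are examples):

import boto3

iam = boto3.client("iam")

# Read the access policy JSON from the previous step
# (file path, role name, and policy name are examples -- adjust to yours):
with open("aporia-glue-policy.json") as f:
    policy_document = f.read()

iam.put_role_policy(
    RoleName="aporia-deployment-role",
    PolicyName="aporia-glue-access",
    PolicyDocument=policy_document,
)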

Now Aporia has the read permissions it needs to connect to the Glue Data Catalog databases and tables you have specified in the policy.
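To confirm the permissions took effect, you can issue the same kind of read calls Aporia will make, using credentials for the Aporia role. A sketch with boto3 (the database name is a placeholder):

import boto3

# Run this with credentials for the Aporia role (e.g. via an assumed-role session).
glue = boto3.client("glue")

# Listing databases and tables should now succeed:
databases = glue.get_databases(MaxResults=5)["DatabaseList"]
print([db["Name"] for db in databases])

tables = glue.get_tables(DatabaseName="your_database", MaxResults=5)["TableList"]
print([t["Name"] for t in tables])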

Create a Glue Data Catalog data source in Aporia

  1. Go to the Aporia platform and log in to your account.

  2. Go to the Integrations page and click the Data Connectors tab.

  3. Scroll to the Connect New Data Source section.

  4. Click Connect on the Glue Data Catalog card and follow the instructions.

Extracting features from JSON

A common use case is storing serving data as JSON files on S3.

The following sample query shows how to extract the JSON data into Aporia features:

WITH model_data AS (
    SELECT
        prediction_id,
        prediction_timestamp,
        model_version,
        proba,
        actual,
        FROM_JSON(
            features_json,
            "features STRUCT<age FLOAT, state STRING, is_single BOOLEAN>"
        ) AS parsed_json
    FROM
        models_store.test_model
)
SELECT
    prediction_id,
    prediction_timestamp,
    model_version,
    proba,
    actual,
    parsed_json.features.age AS age,
    parsed_json.features.state AS state,
    parsed_json.features.is_single AS is_single
FROM
    model_data
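For reference, the schema string above matches records shaped like {"features": {"age": ..., "state": ..., "is_single": ...}}. You can sanity-check the parsing against a literal record (a Spark SQL sketch; the values are made up):

-- Test the schema string against a sample record (values are made up):
SELECT FROM_JSON(
    '{"features": {"age": 32.0, "state": "CA", "is_single": true}}',
    "features STRUCT<age FLOAT, state STRING, is_single BOOLEAN>"
) AS parsed_json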

Bravo! Now you can use the data source you've created across all your models in Aporia.