Athena

This guide describes how to connect Aporia to an Athena data source so that you can monitor your ML model in production.

This guide assumes that your model's inputs, outputs, and (optionally) delayed actuals can be queried with Athena SQL. You can also use this data source to connect your model's training set and use it as a baseline for model monitoring.

Create a workgroup for Aporia queries

Create a workgroup that Aporia will use to run its queries. For instructions, see the AWS documentation on creating Athena workgroups.

When you create the workgroup, you must designate an S3 location (bucket and folder) where query results will be written. We recommend placing this bucket in the same region as the data catalog that Athena uses.
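If you prefer to script this step instead of using the AWS console, the following is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and AWS credentials are configured. The helper names, workgroup name, bucket, and prefix are hypothetical. The request builder is kept separate from the API call so the request can be inspected without contacting AWS.

```python
from typing import Any, Dict


def build_workgroup_request(name: str, results_bucket: str, results_prefix: str) -> Dict[str, Any]:
    """Build keyword arguments for the Athena CreateWorkGroup call (hypothetical helper)."""
    prefix = results_prefix.strip("/")
    # Query results are written under this S3 location (bucket and folder).
    output_location = f"s3://{results_bucket}/{prefix}/" if prefix else f"s3://{results_bucket}/"
    return {
        "Name": name,
        "Description": "Workgroup used by Aporia for monitoring queries",
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": output_location},
            # Make queries in this workgroup always use the result location above.
            "EnforceWorkGroupConfiguration": True,
        },
    }


def create_workgroup(region: str, name: str, results_bucket: str, results_prefix: str) -> None:
    """Create the workgroup with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so build_workgroup_request works without boto3

    # Use the same region as the data catalog that Athena uses.
    athena = boto3.client("athena", region_name=region)
    athena.create_work_group(**build_workgroup_request(name, results_bucket, results_prefix))
```

For example, `create_workgroup("us-east-1", "aporia", "my-athena-results", "aporia/")` would create a workgroup named `aporia` whose results go to `s3://my-athena-results/aporia/`. Adjust every value to your environment.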

Update the Aporia IAM role for Athena access

To give Aporia access to Athena, update your Aporia IAM role with the necessary API permissions.

Step 1: Obtain your Aporia IAM role

Use the same role that was used for the Aporia deployment. If someone else on your team deployed Aporia, ask them for the role ARN. It should have the following format: arn:aws:iam::<account>:role/<role-name-with-path>.
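Some tools, such as the IAM API, identify a role by its name rather than its full ARN. The following minimal sketch splits a role ARN in the format above into its parts. It assumes the standard IAM role ARN layout; `parse_role_arn` and `RoleArn` are hypothetical helpers, not part of Aporia or AWS.

```python
import re
from typing import NamedTuple

# arn:<partition>:iam::<12-digit account>:role/<optional/path/><role-name>
_ROLE_ARN = re.compile(
    r"^arn:(?P<partition>aws[\w-]*):iam::(?P<account>\d{12}):role/(?P<path_and_name>.+)$"
)


class RoleArn(NamedTuple):
    partition: str
    account_id: str
    path: str
    role_name: str


def parse_role_arn(arn: str) -> RoleArn:
    """Split an IAM role ARN into partition, account ID, path, and role name."""
    match = _ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    # Everything before the last "/" is the (optional) role path.
    path, _, name = match.group("path_and_name").rpartition("/")
    return RoleArn(
        partition=match.group("partition"),
        account_id=match.group("account"),
        path=f"/{path}/" if path else "/",
        role_name=name,
    )
```

For a role with a path, such as `arn:aws:iam::123456789012:role/service/aporia-role`, the role name is `aporia-role` and the path is `/service/`.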

Step 2: Create an access policy

  1. Open the IAM console and click Roles.

  2. In the list of roles, click the role you obtained.

  3. On the Permissions tab, click Add permissions, then click Create inline policy.

  4. In the policy editor, click the JSON tab.

  5. Copy the following access policy. Fill in your region and account ID, and restrict access to specific databases and tables if necessary.

    Make sure to replace the following placeholders:

    • <region>: The AWS region where Athena runs, or * to match any region.

    • <account-id>: The ID of the AWS account in which Athena runs.

    • <data-bucket>: The S3 bucket that stores the data for your Athena tables. If the data spans more than one bucket, add each additional bucket to both S3 resource lists where <data-bucket> appears (as arn:aws:s3:::<bucket> and arn:aws:s3:::<bucket>/*).

    • <database-name>: You can specify one or more database names or use * to give Aporia access to all Athena databases.

    • <aporia-workgroup>: The workgroup you created in the previous section.

    • <results-bucket>: The S3 bucket configured as the workgroup's query result location.

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "s3:ListBucket",
                      "s3:GetBucketLocation"
                  ],
                  "Resource": [
                      "arn:aws:s3:::<data-bucket>",
                      "arn:aws:s3:::<results-bucket>"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": "s3:GetObject",
                  "Resource": [
                      "arn:aws:s3:::<data-bucket>/*",
                      "arn:aws:s3:::<results-bucket>/*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": "s3:PutObject",
                  "Resource": [
                      "arn:aws:s3:::<results-bucket>/*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "athena:StartQueryExecution",
                      "athena:StopQueryExecution",
                      "athena:GetQueryResults"
                  ],
                  "Resource": "arn:aws:athena:<region>:<account-id>:workgroup/<aporia-workgroup>"
              },
              {
                  "Effect": "Allow",
                  "Action": "athena:ListWorkGroups",
                  "Resource": "*"
              },
              {
                  "Effect": "Allow",
                  "Action": "athena:ListDatabases",
                  "Resource": [
                      "arn:aws:athena:<region>:<account-id>:datacatalog/*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": "glue:GetDatabases",
                  "Resource": [
                      "arn:aws:glue:<region>:<account-id>:catalog",
                      "arn:aws:glue:<region>:<account-id>:database/<database-name>"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "athena:GetQueryExecution",
                      "athena:BatchGetQueryExecution",
                      "athena:ListQueryExecutions",
                      "athena:GetWorkGroup"
                  ],
                  "Resource": [
                      "arn:aws:athena:<region>:<account-id>:workgroup/*",
                      "arn:aws:athena:<region>:<account-id>:datacatalog/*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "athena:CreatePreparedStatement",
                      "athena:DeletePreparedStatement",
                      "athena:ListPreparedStatements",
                      "athena:GetPreparedStatement",
                      "athena:GetQueryResultsStream"
                  ],
                  "Resource": [
                      "arn:aws:athena:<region>:<account-id>:workgroup/*",
                      "arn:aws:athena:<region>:<account-id>:datacatalog/*",
                      "arn:aws:athena:<region>:<account-id>:table/*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "glue:GetTables",
                      "glue:GetTable",
                      "glue:GetPartitions",
                      "glue:GetPartition"
                  ],
                  "Resource": [
                      "arn:aws:glue:<region>:<account-id>:catalog",
                      "arn:aws:glue:<region>:<account-id>:database/<database-name>",
                      "arn:aws:glue:<region>:<account-id>:table/<database-name>/*"
                  ]
              }
          ]
      }

  6. Click Review policy.

  7. In the Name field, enter a policy name.

  8. Click Create policy.
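As an alternative to the console steps above, you can fill in the placeholders and attach the inline policy programmatically. The following is a minimal sketch, assuming boto3 is installed and your credentials are allowed to call iam:PutRoleRolicy on the Aporia role; `render_policy` and `attach_inline_policy` are hypothetical helpers. `render_policy` replaces `<placeholder>` tokens and refuses to continue if any are left unfilled, so a half-edited policy is not attached by mistake.

```python
import json
import re
from typing import Any, Dict, Mapping

# Matches placeholders such as <region>, <account-id>, <data-bucket>.
_PLACEHOLDER = re.compile(r"<([a-z-]+)>")


def render_policy(template: str, values: Mapping[str, str]) -> Dict[str, Any]:
    """Fill <placeholder> tokens in the policy text and parse it as JSON."""
    missing = sorted(set(_PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"no value for placeholders: {', '.join(missing)}")
    # A replacement function avoids backslash-escape processing in the values.
    rendered = _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
    return json.loads(rendered)


def attach_inline_policy(role_name: str, policy_name: str, policy: Dict[str, Any]) -> None:
    """Attach the policy to the role as an inline policy (requires AWS credentials)."""
    import boto3  # imported lazily so render_policy works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,  # the role name, i.e. the last segment of the role ARN
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
```

For example, you could save the policy JSON to a file, read it, call `render_policy` with values for `region`, `account-id`, `data-bucket`, `database-name`, `aporia-workgroup`, and `results-bucket`, and pass the result to `attach_inline_policy`.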

Aporia now has the permissions it needs to connect to the Athena databases and tables specified in the policy.
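Before creating the data source in Aporia, you may want to confirm that the role can actually run queries in the workgroup. The following minimal sketch assumes boto3 is installed and that you run it with credentials for the Aporia role (for example, after assuming that role). `run_smoke_query` is a hypothetical helper, not part of Aporia. It only uses Athena calls that the policy above allows: StartQueryExecution, GetQueryExecution, and StopQueryExecution.

```python
import time
from typing import Any, Tuple

_TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_smoke_query(
    athena: Any,
    workgroup: str,
    database: str,
    sql: str = "SELECT 1",
    poll_seconds: float = 1.0,
    timeout_seconds: float = 60.0,
) -> Tuple[str, str]:
    """Run a query in the workgroup; return (final state, state change reason)."""
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state in _TERMINAL_STATES:
            # StateChangeReason explains failures, e.g. missing S3 or Glue permissions.
            return state, status.get("StateChangeReason", "")
        if time.monotonic() >= deadline:
            # Give up: cancel the query so it does not keep running.
            athena.stop_query_execution(QueryExecutionId=execution_id)
            raise TimeoutError(f"query {execution_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Usage might look like `run_smoke_query(boto3.client("athena", region_name="us-east-1"), "aporia", "my_database")`. Note that `SELECT 1` does not read any table; to check table access as well, pass a query such as `SELECT * FROM my_table LIMIT 1` (table and database names here are hypothetical).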

Create an Athena data source in Aporia

  1. Log in to your account on the Aporia platform.

  2. Go to the Integrations page and click the Data Connectors tab.

  3. Scroll to the Connect New Data Source section.

  4. Click Connect on the Athena card and follow the instructions.

Bravo! 👏 You can now use the data source you've created across all your models in Aporia.
