Google Cloud Storage


This guide describes how to connect Aporia to a Google Cloud Storage (GCS) data source in order to monitor your ML models in production.

We will assume that your model inputs, outputs, and optionally delayed actuals are stored in a file in GCS. Currently, the following file formats are supported:

  • parquet

  • json

This data source can also be used to connect your model's training dataset as a baseline for model monitoring.
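
For example, a batch model could log each scoring run's inputs and outputs to GCS as a Parquet file. The sketch below uses pandas; the bucket path and column names are hypothetical, and writing directly to a gs:// path assumes the gcsfs package is installed:

```python
import pandas as pd

# Hypothetical prediction log: model inputs alongside the model output.
predictions = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "age": [34, 51, 27],                # input feature
    "income": [72000, 98000, 41000],    # input feature
    "churn_score": [0.12, 0.87, 0.33],  # model output
})

# pandas delegates gs:// paths to gcsfs via fsspec.
predictions.to_parquet("gs://your-model-data/predictions/2024-06-01.parquet")
```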

Grant bucket access to Aporia Dataproc Worker Service Account

In order to provide access to GCS, you'll need to update your Aporia Dataproc worker service account with the necessary API permissions.

  1. Go to the Cloud Storage buckets page.

  2. Select the buckets where your data is stored.

  3. Click on the Permissions button.

On the Permissions tab, click on the Add Principal button.

On the Grant access page, do the following:

  1. Add the Aporia Dataproc Worker Service Account as a principal.

  2. Assign the Storage Object Viewer role.

  3. Click Save.

Aporia now has the read permission it needs to connect to the GCS buckets you have granted access to.
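
If you prefer to grant the role programmatically rather than through the console, the sketch below does the equivalent with the google-cloud-storage Python client. The bucket name and service account email are placeholders; replace them with your bucket and the Aporia Dataproc worker service account from your deployment:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-data-bucket")  # placeholder bucket name

# Fetch the bucket's current IAM policy and add a Storage Object Viewer
# binding for the Aporia Dataproc worker service account (placeholder email).
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:aporia-dataproc-worker@your-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```

Repeat this for each bucket that holds model data.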

Create a GCS data source in Aporia

  1. Go to the Aporia platform and log in to your account.

  2. Go to the Integrations page and click on the Data Connectors tab.

  3. Scroll to the Connect New Data Source section.

  4. Click Connect on the GCS card and follow the instructions.

Bravo! Now you can use the data source you've created across all your models in Aporia.
