Google Cloud Storage


This guide describes how to connect Aporia to a Google Cloud Storage (GCS) data source in order to monitor your ML models in production.

We will assume that your model inputs, outputs, and optionally delayed actuals are stored in a file in GCS. Currently, the following file formats are supported:

  • parquet

  • json

This data source can also be used to connect your model's training dataset as a baseline for model monitoring.
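
For example, a batch model could log each scoring run's inputs and outputs to GCS as a Parquet file. The sketch below uses pandas; the bucket path and column names are hypothetical, and writing directly to a gs:// path assumes the gcsfs package is installed:

```python
import pandas as pd

# Hypothetical prediction log: model inputs alongside the model output.
predictions = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "age": [34, 51, 27],                # input feature
    "income": [72000, 98000, 41000],    # input feature
    "churn_score": [0.12, 0.87, 0.33],  # model output
})

# pandas delegates gs:// paths to gcsfs via fsspec.
predictions.to_parquet("gs://your-model-data/predictions/2024-06-01.parquet")
```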

Grant bucket access to Aporia Dataproc Worker Service Account

In order to provide access to GCS, you'll need to update your Aporia Dataproc worker service account with the necessary API permissions.

  1. Go to the Cloud Storage buckets page.

  2. Select the buckets where your data is stored.

  3. Click on the Permissions button.

On the Permissions tab, click on the Add Principal button.

On the Grant access page, do the following:

  1. Add the Aporia Dataproc Worker Service Account as a principal.

  2. Assign the Storage Object Viewer role.

  3. Click Save.

Aporia now has the read permission it needs to connect to the GCS buckets you have granted access to.
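
If you prefer to grant the role programmatically rather than through the console, the sketch below does the equivalent with the google-cloud-storage Python client. The bucket name and service account email are placeholders; replace them with your bucket and the Aporia Dataproc worker service account from your deployment:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-data-bucket")  # placeholder bucket name

# Fetch the bucket's current IAM policy and add a Storage Object Viewer
# binding for the Aporia Dataproc worker service account (placeholder email).
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:aporia-dataproc-worker@your-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```

Repeat this for each bucket that holds model data.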

Create a GCS data source in Aporia

  1. Go to the Aporia platform and log in to your account.

  2. Go to the Integrations page and click on the Data Connectors tab.

  3. Scroll to the Connect New Data Source section.

  4. Click Connect on the GCS card and follow the instructions.

Bravo! Now you can use the data source you've created across all your models in Aporia.
