Aporia Documentation

Overview


Last updated 2 years ago

Monitoring your Machine Learning models begins with storing their inputs and outputs in production.

Oftentimes, this data is used not just for model monitoring, but also for retraining, auditing, and other purposes; therefore, it is crucial that you have complete control over it.

Aporia monitors your models by connecting directly to your data, in your format. This section discusses the fundamentals of storing model predictions.

If you are not storing your predictions today, you can also log your predictions directly to Aporia, although storing your predictions in your own database is highly recommended.

Storage

Depending on your existing enterprise data lake infrastructure, performance requirements, and cloud cost constraints, you can store your predictions in a variety of data stores.

Here are some common options:

  • BigQuery

  • Delta Lake

  • Databricks Lakehouse

  • Snowflake

  • Elasticsearch

  • OpenSearch

  • Parquet files on S3 / GCS / ABS

    • If you choose this option, a metastore such as Glue Data Catalog is recommended.

Directory Structure

When storing your predictions, it's highly recommended to adopt a standardized directory structure (or SQL table structure) across all of your organization's models.

With a standardized structure, you'll be able to get all models onboarded to the monitoring system automatically.

Here is a very basic example:

s3://myorg-models/
└── my-model/
    ├── v1/
    │   ├── train.parquet
    │   ├── test.parquet
    │   ├── serving.parquet
    │   └── artifact.onnx
    └── v2/
        ├── train.parquet
        ├── test.parquet
        ├── serving.parquet
        └── artifact.onnx

Even though this section focuses on the storage of predictions, you should also consider saving the training and test sets of your models. They can serve as a monitoring baseline.
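
To illustrate why a standardized layout makes automatic onboarding possible, here is a minimal sketch that groups object keys by model and version and reports missing datasets. It assumes keys are relative to the bucket root and follow the <model>/<version>/<file> layout from the example above. The function names and the EXPECTED_FILES set are hypothetical and are not part of any Aporia API.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: dataset file names taken from the example layout above.
EXPECTED_FILES = {"train.parquet", "test.parquet", "serving.parquet"}


def discover_models(keys):
    """Group keys shaped like <model>/<version>/<file> by model and version."""
    models = defaultdict(lambda: defaultdict(set))
    for key in keys:
        parts = PurePosixPath(key).parts
        if len(parts) != 3:
            continue  # skip anything that does not match the layout
        model, version, filename = parts
        models[model][version].add(filename)
    return {
        model: {version: sorted(files) for version, files in versions.items()}
        for model, versions in models.items()
    }


def missing_files(models):
    """Return (model, version, file) triples for expected datasets that are absent."""
    return sorted(
        (model, version, filename)
        for model, versions in models.items()
        for version, files in versions.items()
        for filename in EXPECTED_FILES - set(files)
    )


if __name__ == "__main__":
    # Hypothetical listing; in practice the keys would come from your object store.
    keys = [
        "my-model/v1/train.parquet",
        "my-model/v1/test.parquet",
        "my-model/v1/serving.parquet",
        "my-model/v1/artifact.onnx",
        "my-model/v2/train.parquet",
    ]
    print(missing_files(discover_models(keys)))
```

Because every model follows the same layout, the same two functions work for any model in the bucket without per-model configuration.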

Data Structure

Recommendations:

  • One row per prediction.

  • One column per feature, prediction or raw input.

  • Use a prefix for column names to identify their group (e.g., features., raw_inputs., predictions., actuals.).

  • For serving, add ID and prediction timestamp columns.

Example:

+-----+----------------------+-------------------+---------------+----------------+-------------------+-------------------------+--------------+----------------------+------------------------+
| id  |      timestamp       | predictions.score | actuals.score | raw_inputs.age | raw_inputs.gender | features.my_embeddings  | features.age | features.gender_male | features.gender_female |
+-----+----------------------+-------------------+---------------+----------------+-------------------+-------------------------+--------------+----------------------+------------------------+
|   1 | 2022-10-19T14:21:08Z |              0.58 |          0.59 |             64 | male              | [0.58, 0.19, 0.38, ...] |           64 |                    1 |                      0 |
|   2 | 2022-10-19T14:21:08Z |              0.64 |          0.66 |             62 | woman             | [0.48, 0.20, 0.42, ...] |           62 |                    0 |                      1 |
| ... | ...                  |               ... |           ... |            ... | ...               | ...                     |          ... |                  ... |                    ... |
+-----+----------------------+-------------------+---------------+----------------+-------------------+-------------------------+--------------+----------------------+------------------------+
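
The recommendations above can be sketched as a small helper that flattens one prediction into a single row with group-prefixed column names. This is only an illustration under stated assumptions: to_row and its parameters are hypothetical names, the timestamp is expected to be timezone-aware, and writing the row to your data store is left out.

```python
from datetime import datetime, timezone


def to_row(prediction_id, raw_inputs, features, predictions,
           actuals=None, timestamp=None):
    """Flatten one prediction into one row with group-prefixed columns."""
    # Assumption: timestamp is timezone-aware; default to the current UTC time.
    ts = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    row = {
        "id": prediction_id,
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    groups = {
        "predictions": predictions,
        "actuals": actuals or {},  # actuals often arrive later, so they are optional
        "raw_inputs": raw_inputs,
        "features": features,
    }
    for group, values in groups.items():
        for name, value in values.items():
            row[f"{group}.{name}"] = value  # e.g. "features.age"
    return row


if __name__ == "__main__":
    # Values copied from the first row of the example table.
    row = to_row(
        prediction_id=1,
        raw_inputs={"age": 64, "gender": "male"},
        features={"age": 64, "gender_male": 1, "gender_female": 0},
        predictions={"score": 0.58},
        actuals={"score": 0.59},
        timestamp=datetime(2022, 10, 19, 14, 21, 8, tzinfo=timezone.utc),
    )
    print(row)
```

Keeping the group in the column name means a monitoring tool or a query can select all features or all predictions by prefix, without a separate schema file.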