Aporia Documentation

Data Drift


Last updated 1 year ago

Why Monitor Data Drift?

Data drift is one of the top reasons why model accuracy degrades over time. It is a change in the model's input data that leads to degraded model performance. Monitoring data drift helps you detect these performance issues.

Causes of data drift include:

  • Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.

  • Data quality issues, such as a broken sensor always reading 0.

  • Natural drift in the data, such as mean temperature changing with the seasons.

  • A change in the relationship between features, or covariate shift.

Comparison methods

For this monitor, the following comparison methods are available:

  • Anomaly detection

  • Compared to segment

  • Compared to training

Customizing your monitor

Configuration may vary slightly depending on the baseline you choose.

STEP 1: choose the fields you would like to monitor

You may select as many fields as you want 😊

Note that the monitor will run on each selected field separately.

STEP 2: choose inspection period and baseline

For the fields you chose in the previous step, the monitor compares the distribution in the inspection period with the baseline distribution. An alert is raised if the monitor finds a drift between these two distributions.
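The comparison described above can be sketched as follows. This is a minimal illustration, not Aporia's implementation: the function names (`bin_counts`, `field_drift`, `find_drifted_fields`), the bin edges, the sample data, and the use of Hellinger distance as the drift score are all assumptions made for the example. Each field is scored separately, mirroring the note in STEP 1.

```python
import math
from bisect import bisect_right
from collections import Counter


def bin_counts(values, edges):
    """Count numeric values per bin; `edges` are inner cut points,
    so there is one underflow bin and one overflow bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def hellinger(p_counts, q_counts):
    """Hellinger distance between two histograms over the same bins (0 to 1)."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    squared = sum(
        (math.sqrt(a / p_total) - math.sqrt(b / q_total)) ** 2
        for a, b in zip(p_counts, q_counts)
    )
    return math.sqrt(squared / 2)


def field_drift(baseline, inspection, edges=None):
    """Drift score for one field; fields without edges are treated as categorical."""
    if edges is not None:
        p, q = bin_counts(baseline, edges), bin_counts(inspection, edges)
    else:
        categories = sorted(set(baseline) | set(inspection))
        b_count, i_count = Counter(baseline), Counter(inspection)
        p = [b_count[c] for c in categories]
        q = [i_count[c] for c in categories]
    return hellinger(p, q)


def find_drifted_fields(baseline, inspection, edges_by_field, threshold):
    """Score every field independently and keep those above the threshold."""
    scores = {
        field: field_drift(baseline[field], inspection[field], edges_by_field.get(field))
        for field in baseline
    }
    return {field: score for field, score in scores.items() if score > threshold}


# Hypothetical data: "height" switches from inches to centimeters, "color" is stable.
baseline = {"height": [60, 64, 66, 68, 70, 72], "color": ["red", "blue", "red", "blue"]}
inspection = {"height": [152, 163, 168, 173, 178, 183], "color": ["red", "blue", "blue", "red"]}
alerts = find_drifted_fields(baseline, inspection, {"height": [62, 66, 70]}, threshold=0.2)
print(alerts)
```

In this hypothetical run only `height` crosses the threshold, because every inspected value lands in the overflow bin, while the `color` proportions are unchanged.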

STEP 3: calibrate thresholds

Use the monitor preview to help you choose the right threshold and make sure you get the number of alerts that fits your needs.

The threshold for categorical fields is different from the one for numeric fields. Make sure to calibrate both if relevant.
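As a rough illustration of what calibration involves, the sketch below counts how many alerts a candidate threshold would have produced on past drift scores and picks the lowest candidate that stays within an alert budget. The scores, candidate thresholds, and helper names (`alerts_for_threshold`, `pick_threshold`) are hypothetical; in the product, the monitor preview plays this role.

```python
def alerts_for_threshold(scores, threshold):
    """Number of past runs whose drift score would have exceeded the threshold."""
    return sum(1 for score in scores if score > threshold)


def pick_threshold(scores, candidates, max_alerts):
    """Lowest candidate threshold that yields at most `max_alerts` alerts, or None."""
    for threshold in sorted(candidates):
        if alerts_for_threshold(scores, threshold) <= max_alerts:
            return threshold
    return None


# Hypothetical daily drift scores; numeric and categorical fields are calibrated separately.
numeric_scores = [0.05, 0.08, 0.31, 0.07, 0.12, 0.45, 0.06]
categorical_scores = [0.01, 0.02, 0.15, 0.02, 0.03, 0.02, 0.20]
candidates = [0.05, 0.1, 0.2, 0.3, 0.4]

numeric_threshold = pick_threshold(numeric_scores, candidates, max_alerts=2)
categorical_threshold = pick_threshold(categorical_scores, candidates, max_alerts=2)
print(numeric_threshold, categorical_threshold)
```

With these made-up scores the two field types end up with different thresholds for the same alert budget, which is why each one needs its own calibration.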

How are drifts calculated?

You can choose the drift metric that best fits your needs from a list of available metrics, including Jensen–Shannon, Hellinger distance, PSI, and Euclidean Distance (for embeddings).

If you need to use other metrics, please contact us.
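For reference, the textbook definitions of these metrics can be sketched in a few lines. This is an assumption-laden illustration, not the platform's code: the helper names, the `eps` floor used to avoid taking the logarithm of zero, the choice of the Jensen–Shannon distance (square root of the base-2 divergence), and the idea of comparing two summary vectors for embeddings are all choices made for this example.

```python
import math


def _normalize(counts, eps=1e-6):
    """Convert bin counts to probabilities, flooring at eps to avoid log(0)."""
    total = sum(counts)
    probs = [max(c / total, eps) for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]


def _kl(p, q):
    """Kullback-Leibler divergence in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q))


def jensen_shannon_distance(p_counts, q_counts):
    """Square root of the Jensen-Shannon divergence (base 2), between 0 and 1."""
    p, q = _normalize(p_counts), _normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def hellinger_distance(p_counts, q_counts):
    """Hellinger distance, between 0 and 1."""
    p, q = _normalize(p_counts), _normalize(q_counts)
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


def psi(baseline_counts, inspected_counts):
    """Population Stability Index over matching bins."""
    p, q = _normalize(baseline_counts), _normalize(inspected_counts)
    return sum((b - a) * math.log(b / a) for a, b in zip(p, q))


def euclidean_distance(u, v):
    """Euclidean distance between two vectors, e.g. two summary embeddings."""
    return math.dist(u, v)


# Hypothetical histograms over the same four bins.
baseline_hist = [40, 30, 20, 10]
inspected_hist = [10, 20, 30, 40]
print(jensen_shannon_distance(baseline_hist, inspected_hist))
print(hellinger_distance(baseline_hist, inspected_hist))
print(psi(baseline_hist, inspected_hist))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

All four functions return 0 for identical inputs and grow as the two inputs diverge. The first three operate on binned distributions, while the Euclidean distance operates directly on vectors.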