Data Drift

Why Monitor Data Drift?

Data drift is one of the top reasons model accuracy degrades over time. Data drift is a change in model input data that degrades model performance; monitoring for it helps you detect these performance issues.

Causes of data drift include:

  • Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.

  • Data quality issues, such as a broken sensor always reading 0.

  • Natural drift in the data, such as mean temperature changing with the seasons.

  • A change in the relationship between features, also known as covariate shift.

Comparison methods

For this monitor, the following comparison methods are available:

  • Anomaly detection
  • Compared to segment
  • Compared to training

Customizing your monitor

Configuration may vary slightly depending on the baseline you choose.

STEP 1: choose the fields you would like to monitor

You may select as many fields as you want 😊

Note that the monitor will run on each selected field separately.

STEP 2: choose inspection period and baseline

For the fields you chose in the previous step, the monitor will compare the distribution of the inspection period with the distribution of the baseline. An alert will be raised if the monitor finds a drift between these two distributions.
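
To make the comparison concrete, here is a minimal sketch of the idea in pandas: slice the prediction log into a baseline window and an inspection window for one monitored field. The file name and column names are hypothetical, and this illustrates the concept rather than Aporia's implementation.

```python
import pandas as pd

# Hypothetical prediction log with a timestamp column and a monitored feature.
df = pd.read_parquet("predictions.parquet")  # assumed columns: occurred_at, age, ...

# Baseline: a stable historical period. Inspection: the most recent period.
baseline = df.loc[df["occurred_at"].between("2023-01-01", "2023-03-31"), "age"]
inspection = df.loc[df["occurred_at"] >= "2023-06-01", "age"]

# Each slice is then reduced to a distribution and compared with a drift
# metric (see "How are drifts calculated?" below).
```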

STEP 3: calibrate thresholds

Use the monitor preview to help you choose the right thresholds and make sure you get a number of alerts that fits your needs.

The threshold for categorical fields is different from the one for numeric fields. Make sure to calibrate both if relevant.
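
As a rough way to reason about calibration, you can score a series of known-healthy historical windows against the baseline and place the threshold above the scores they produce. The sketch below assumes the drift score is a Jensen–Shannon distance (in the spirit of the metric described in the next section); it is an illustration, not Aporia's calibration logic.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)

# Baseline distribution: bin a stable historical sample once.
baseline = rng.normal(0, 1, 10_000)
edges = np.histogram_bin_edges(baseline, bins=20)
p, _ = np.histogram(baseline, bins=edges)

# Score 52 healthy historical weeks against the baseline.
scores = []
for _ in range(52):
    week = rng.normal(0, 1, 2_000)        # same distribution, so no real drift
    q, _ = np.histogram(week, bins=edges)
    scores.append(jensenshannon(p, q, base=2))

# Set the threshold above what healthy weeks produce, with some headroom.
print(f"max healthy score: {max(scores):.3f}")
print(f"candidate threshold: {1.5 * max(scores):.3f}")
```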

How are drifts calculated?

For numeric fields, Aporia detects drifts based on the Jensen–Shannon divergence metric. For categorical fields, drifts are detected using the Hellinger distance.

If you need to use other metrics, please contact us.
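
For illustration, both metrics can be computed in a few lines of NumPy once each window is reduced to a discrete distribution. This sketch follows the standard definitions of the two metrics rather than Aporia's internal code:

```python
import numpy as np

def jensen_shannon_divergence(p, q, eps=1e-12):
    """JS divergence between two discrete distributions (log base 2, so it lies in [0, 1])."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Numeric field: bin both windows with shared edges, then compare histograms.
baseline = np.random.default_rng(1).normal(0.0, 1.0, 5_000)
inspection = np.random.default_rng(2).normal(0.5, 1.0, 5_000)  # shifted mean, i.e. drift
edges = np.histogram_bin_edges(np.concatenate([baseline, inspection]), bins=20)
p_num, _ = np.histogram(baseline, bins=edges)
q_num, _ = np.histogram(inspection, bins=edges)
print(f"JS divergence (numeric): {jensen_shannon_divergence(p_num, q_num):.3f}")

# Categorical field: category frequencies over the same category set in both windows.
print(f"Hellinger (categorical): {hellinger_distance([0.5, 0.3, 0.2], [0.3, 0.3, 0.4]):.3f}")
```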
