Code-Based Metrics

Code-based metrics let you define PySpark-based metrics that compute over the raw data, support element-wise operations, and can use third-party libraries.

The following guide explains how to use code-based metrics in Aporia to gain more flexibility over a metric's calculation.

Building the metric code

A code-based metric in Aporia receives a PySpark DataFrame as input and should return a numeric value (or NaN) as output. Similar to custom metrics, code-based metrics are defined for a specific model and can be used with all versions, datasets, and segments of that model.

Let's take a look at the following example:

import numpy as np

def calc_metric(df):
    """
    This function simply returns the average age, but you can perform whatever
    calculation you wish with the data frame.
    """
    return np.average([row.age for row in df.select("age").collect()])

Supported libraries can be found below.

Code-based metrics are calculated at the same frequency as all other calculation jobs, as specified by your model's aggregation period. The code-based metric will be calculated on the following data frames:

  1. all data over your model's retention period (you can filter this data to a specific time period)

  2. all segments (separately) over your model's retention period (you can filter this data to a specific time period)

Performance-wise, it is best practice to perform the calculation on the PySpark DataFrame itself rather than collecting it first using df.collect().
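
For example, here is a minimal sketch of the same average-age metric written with pyspark.sql.functions (one of the supported libraries listed below) so that only the aggregated result is collected. The age column and the commented-out time filter are assumptions carried over from the example above, not part of Aporia's API:

from pyspark.sql import functions as F

def calc_metric(df):
    """
    Same average-age metric as above, computed with a Spark aggregation so the
    full data frame is never collected to the driver.
    """
    # You could first narrow the data, e.g. to a specific time window
    # (the column name below is hypothetical and depends on your schema):
    # df = df.filter(F.col("occurred_at") >= "2024-01-01")

    row = df.agg(F.avg("age").alias("avg_age")).collect()[0]
    return float(row["avg_age"]) if row["avg_age"] is not None else float("nan")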

Registering your metric

Once you have your metric ready, you can register it to the relevant Aporia model. Below you will find example code to help you get started:

import requests
from http import HTTPStatus

ACCOUNT = <<complete your account ID>>
WORKSPACE = <<complete your workspace ID>>
MODEL_ID = <<complete the model ID to which you want to register the metric>>

BASE_URL = f"https://platform.aporia.com/api/v1/{ACCOUNT}/{WORKSPACE}"
BASE_METRICS_URL = f"{BASE_URL}/metrics"

API_KEY = <<complete your API key>>
AUTH_HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# First, read the code we prepared for our metric
with open('my_metric.py') as f:
    METRIC_CODE = f.read()

# Then register it to the relevant Aporia model
metric_creation_body = {
    "model_id": MODEL_ID,
    "name": "my cool metric",
    "code": METRIC_CODE
}
CREATE_METRIC_EP = f"{BASE_METRICS_URL}/code-based-metrics"
response = requests.post(
    url=CREATE_METRIC_EP,
    json=metric_creation_body,
    headers=AUTH_HEADERS
)

# We'll use the metric ID later in order to test it
if response.status_code == HTTPStatus.OK:
    metric_id = response.json().get('id')
    print(f"Successfully created metric, id: {metric_id}")

Testing your metric

Once your metric is registered, it is time to test it. A code-based metric can be tested on a dataset of your choice. Below is example code for testing your metric on the latest version's serving dataset:

import time

MODEL_VERSIONS_EP = f'{BASE_URL}/model-versions'

# Select which version to use for the test
model_version_params = {"model_id": MODEL_ID}
response = requests.get(
    MODEL_VERSIONS_EP,
    params=model_version_params,
    headers=AUTH_HEADERS
)
if response.status_code != HTTPStatus.OK:
    raise Exception(f"Failed getting model versions, error: {response.status_code}")

# We will use the last version returned, but you can choose a different one
versions = response.json()
dataset_id = versions[-1].get('serving_dataset').get('id')

# Test the metric to make sure it works
validate_metric_ep = f"{BASE_METRICS_URL}/code-based-metrics/validate"
body = {"metric_id": metric_id, "dataset_id": dataset_id}
response = requests.post(
    url=validate_metric_ep,
    json=body,
    headers=AUTH_HEADERS
)

# Poll until the validation task is no longer pending
while response.status_code == HTTPStatus.OK and response.json().get('status') == "pending":
    print(f"{response.json().get('progress')}% of metric validation task is completed")
    time.sleep(5)  # wait a bit between polls

    response = requests.post(
        url=validate_metric_ep,
        json=body,
        headers=AUTH_HEADERS
    )

print(response.status_code)
print(response.json())

Supported 3rd party libraries

  • pyspark

  • pyspark.sql

  • pyspark.sql.functions

  • snowflake

  • snowflake.snowpark

  • snowflake.snowpark.functions

  • numpy

  • numpy.core._methods

  • pandas

  • math

  • scipy

  • scipy.stats

  • statsmodels

  • statsmodels.stats.proportion
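
As an illustration of combining several of these libraries, the sketch below computes the Pearson correlation between two numeric columns using pandas and scipy.stats. The column names prediction and actual are hypothetical placeholders, not fields Aporia provides:

from scipy import stats

def calc_metric(df):
    """
    Pearson correlation between two numeric columns, computed with scipy.
    Column names here are hypothetical placeholders.
    """
    pdf = df.select("prediction", "actual").dropna().toPandas()
    if len(pdf) < 2:
        return float("nan")
    corr, _ = stats.pearsonr(pdf["prediction"], pdf["actual"])
    return float(corr)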

You can further explore all available code-based metrics features via the REST API in our docs.
