Code-Based Metrics
Code-based metrics let you define PySpark-based metrics that compute directly on raw data, perform element-wise operations, and use supported third-party libraries.
This guide explains how to use code-based metrics in Aporia to gain more flexibility in how a metric is calculated.
Building the metric code
A code-based metric in Aporia receives a PySpark DataFrame as input and should return a numeric value or NaN as output. Like custom metrics, code-based metrics are defined for a specific model and can be used with all versions, datasets, and segments of that model.
Let's take a look at the following example:
Supported libraries can be found below.
Code-based metrics are calculated at the same frequency as all other calculation jobs, as specified by your model's aggregation period. The code-based metric will be calculated on the following data frames:
all data over your model's retention period (you can filter this data to a specific time period)
all segments (separately) over your model's retention period (you can filter this data to a specific time period)
For performance, it is best practice to run the calculation on the PySpark DataFrame itself rather than collecting it to the driver first with df.collect().
Registering your metric
Once you have your metric ready, you can register it to the relevant Aporia model. Below you will find example code to help you get started:
Testing your metric
Once you have your metric registered, it is time to test it. Testing a code-based metric can be performed on a dataset of your choice. Below you will find example code for testing your metric on the latest version's serving dataset:
Supported third-party libraries
pyspark
pyspark.sql
pyspark.sql.functions
snowflake
snowflake.snowpark
snowflake.snowpark.functions
numpy
numpy.core._methods
pandas
math
scipy
scipy.stats
statsmodels
statsmodels.stats.proportion
You can further explore all available code-based metric features via the REST API in our docs.