Custom Metric Syntax

In Aporia, custom metrics are defined using a syntax similar to Python's.

A custom metric expression is built from three building blocks:

  • Constants - a numeric value (e.g. 2, 0.5, ...)

  • Functions - any of the built-in functions listed below (e.g. sum, count, ...). All of these functions return a numeric value.

  • Binary operations - +, -, *, /, **. Operands can be either constants or function calls.
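Because the syntax resembles Python's, the three building blocks can be illustrated with Python's own `ast` module. This is only a sketch, not Aporia's parser: `is_valid_metric` is a hypothetical helper that checks the shape of an expression (constants, function calls, binary operators) and does not verify that a function name actually exists. Rejecting positional arguments is an assumption, based on the fact that every example on this page passes arguments by keyword.

```python
import ast

# The five binary operators listed above: +, -, *, /, **
ALLOWED_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow)


def is_valid_metric(expr: str) -> bool:
    """Return True if expr uses only numeric constants, keyword-only calls and + - * / **."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False

    def check(node: ast.AST) -> bool:
        if isinstance(node, ast.Constant):
            # Building block 1: a numeric constant (booleans and strings are rejected).
            return isinstance(node.value, (int, float)) and not isinstance(node.value, bool)
        if isinstance(node, ast.BinOp):
            # Building block 3: a binary operation over valid operands.
            return isinstance(node.op, ALLOWED_OPS) and check(node.left) and check(node.right)
        if isinstance(node, ast.Call):
            # Building block 2: a function call such as sum(column="x").
            # Assumption of this sketch: keyword arguments with literal values only.
            return (
                isinstance(node.func, ast.Name)
                and not node.args
                and all(isinstance(kw.value, ast.Constant) for kw in node.keywords)
            )
        return False

    return check(tree.body)


print(is_valid_metric('sum(column="annual_premium") / count()'))  # True
print(is_valid_metric('count() % 2'))  # False: % is not a listed operator
```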

Built-in Functions

Before we dive into each of the supported functions, let's take a look at a few examples of custom metric definitions.

// Average annual premium (of those with a driving license, when that segment is applied to the metric)
sum(column="annual_premium") / count()

// Mean predicted probability
mean(column="proba")

// Model revenue
5 * tp_count(column="will_buy_insurance") - 2 * fp_count(column="will_buy_insurance")

// nDCG@4 per step
ndcg_at_k(column="p_views", k=4)
ndcg_at_k(column="p_add_to_cart", k=4)
ndcg_at_k(column="p_purchases", k=4)

// Accuracy using a custom threshold
accuracy(column="proba", type="numeric", threshold=0.2)
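To make the "Model revenue" example concrete, here is a small worked calculation in Python. The data are made up, and reading the weights as a gain of 5 per true positive and a cost of 2 per false positive is an interpretation of the example, not something the product defines.

```python
# Toy data (invented): boolean predictions and actuals for "will_buy_insurance".
predictions = [True, True, False, True, False]
actuals = [True, False, False, True, True]

# True positives: predicted True and actually True.
tp = sum(1 for p, a in zip(predictions, actuals) if p and a)
# False positives: predicted True but actually False.
fp = sum(1 for p, a in zip(predictions, actuals) if p and not a)

# Same shape as: 5 * tp_count(...) - 2 * fp_count(...)
revenue = 5 * tp - 2 * fp
print(tp, fp, revenue)  # 2 1 8
```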

Filters within functions

In Aporia you can always apply a segment to a metric as a whole, but sometimes that is not enough: often a single function within the metric needs to operate on its own segment of the data.

Aporia supports these cases through an additional function argument called "filter".

The "filter" argument restricts the data passed in the "column" argument, using the custom segment syntax.

For example:

// Ratio of the annual premium of people above 70 out of the total premium
sum(column="annual_premium", filter="age > 70") / sum(column="annual_premium")

So that you can still apply any of your segments to the metric as a whole, a filter set within a function is intersected, behind the scenes, with all of your existing filters. The resulting segments are treated like any regular segment.
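The intersection behaviour can be sketched in plain Python. This is an illustration of the semantics described above, not Aporia's implementation: the rows, the "region" column and the helpers `sum_column` and `premium_ratio` are all hypothetical.

```python
# Invented rows; "region" is a hypothetical column used as a whole-metric segment.
rows = [
    {"age": 75, "region": "north", "annual_premium": 300.0},
    {"age": 40, "region": "north", "annual_premium": 100.0},
    {"age": 82, "region": "south", "annual_premium": 500.0},
    {"age": 30, "region": "south", "annual_premium": 100.0},
]


def sum_column(rows, column, row_filter=None):
    """Sum `column` over the rows that pass `row_filter` (all rows if None)."""
    return sum(r[column] for r in rows if row_filter is None or row_filter(r))


def premium_ratio(rows):
    # sum(column="annual_premium", filter="age > 70") / sum(column="annual_premium")
    over_70 = sum_column(rows, "annual_premium", lambda r: r["age"] > 70)
    return over_70 / sum_column(rows, "annual_premium")


# No segment on the metric: the filter is applied to all rows.
overall = premium_ratio(rows)  # 800 / 1000

# Segment region == "north" on the whole metric: the numerator sees the
# intersection (north AND age > 70), the denominator sees the segment only.
north = [r for r in rows if r["region"] == "north"]
north_ratio = premium_ratio(north)  # 300 / 400

print(overall, north_ratio)  # 0.8 0.75
```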

Supported functions

count

Parameters

No parameters are needed. The metric counts the total number of unique IDs.

missing_count

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual)

missing_ratio

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual)
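This page does not spell out what counts as missing or what the ratio is divided by. A minimal sketch, assuming that `None` and NaN are the missing values and that the ratio is the missing count divided by the total number of values:

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_count(values):
    return sum(1 for v in values if is_missing(v))


def missing_ratio(values):
    # Assumed definition: missing values divided by all values.
    return missing_count(values) / len(values) if values else 0.0


annual_premium = [120.0, None, 95.5, float("nan"), 310.0]
print(missing_count(annual_premium))  # 2
print(missing_ratio(annual_premium))  # 0.4
```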

max

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual)

min

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual)

mean

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual)

sum

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual)

absolute_error_sum

Parameters

  • column: the name of the prediction field on which we want to apply the function

absolute_sum

Parameters

  • column: the name of the prediction field on which we want to apply the function

mae

Parameters

  • column: the name of the prediction field on which we want to apply the function

mse

Parameters

  • column: the name of the prediction field on which we want to apply the function

rmse

Parameters

  • column: the name of the prediction field on which we want to apply the function
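The formulas behind mae, mse and rmse are not given on this page. The sketch below uses the standard textbook definitions and assumes each prediction is compared with the actual value of the same row; the data are invented.

```python
import math


def mae(predictions, actuals):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(predictions)


def mse(predictions, actuals):
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(predictions)


def rmse(predictions, actuals):
    """Root mean squared error: the square root of mse."""
    return math.sqrt(mse(predictions, actuals))


predictions = [3.0, 5.0, 2.0, 8.0]
actuals = [1.0, 5.0, 4.0, 8.0]

# Errors are 2, 0, -2, 0.
print(mae(predictions, actuals))  # 1.0
print(mse(predictions, actuals))  # 2.0
```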

tp_count

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric" or "boolean".

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

fp_count

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric" or "boolean".

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

tn_count

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric" or "boolean".

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

fn_count

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric" or "boolean".

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions
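How "type" and "threshold" interact for the four confusion counts can be sketched as follows. `confusion_counts` is a hypothetical helper, and treating a score at or above the threshold as positive is an assumption; this page does not say whether the comparison is inclusive.

```python
def confusion_counts(predictions, actuals, pred_type="boolean", threshold=None):
    """Return tp/fp/tn/fn counts for boolean actuals."""
    if pred_type == "numeric":
        if threshold is None:
            raise ValueError("threshold is required for numeric predictions")
        # Assumption: a score at or above the threshold counts as positive.
        predicted = [p >= threshold for p in predictions]
    elif pred_type == "boolean":
        predicted = [bool(p) for p in predictions]
    else:
        raise ValueError(f"unsupported type: {pred_type}")

    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actuals):
        if p and a:
            counts["tp"] += 1
        elif p and not a:
            counts["fp"] += 1
        elif not p and not a:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


proba = [0.9, 0.15, 0.4, 0.25, 0.05]
actuals = [True, False, True, False, True]

print(confusion_counts(proba, actuals, pred_type="numeric", threshold=0.2))
# {'tp': 2, 'fp': 1, 'tn': 1, 'fn': 1}
```

Raising the threshold to 0.5 in this toy data turns one true positive into a false negative and the false positive into a true negative, which is why the threshold is required for numeric predictions.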

accuracy

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric", "boolean" or "categorical"

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

  • method: defines the averaging strategy to use. Can be: "macro", "micro" or "weighted". Required for categorical predictions

precision

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric", "boolean" or "categorical"

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

  • method: defines the averaging strategy to use. Can be: "macro", "micro" or "weighted". Required for categorical predictions.

recall

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric", "boolean" or "categorical"

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

  • method: defines the averaging strategy to use. Can be: "macro", "micro" or "weighted". Required for categorical predictions.

f1

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "numeric", "boolean" or "categorical"

  • threshold: the probability threshold used to decide whether a class is positive. Required for numeric predictions

  • method: defines the averaging strategy to use. Can be: "macro", "micro" or "weighted". Required for categorical predictions.
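The three "method" values can give different results on the same data. The sketch below shows the commonly used definitions of macro, micro and weighted averaging, using precision as the example. It assumes Aporia follows these conventions, which this page does not confirm; `precision_by_method` is a hypothetical helper and the data are invented.

```python
from collections import Counter


def precision_by_method(predictions, actuals, method):
    classes = sorted(set(predictions) | set(actuals))
    tp = Counter(p for p, a in zip(predictions, actuals) if p == a)
    predicted = Counter(predictions)  # tp + fp per class
    support = Counter(actuals)  # number of actual rows per class

    if method == "micro":
        # Pool all classes: total true positives over total predictions.
        return sum(tp.values()) / len(predictions)

    # Per-class precision; a class that is never predicted scores 0 (assumption).
    per_class = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in classes}
    if method == "macro":
        # Unweighted mean over classes.
        return sum(per_class.values()) / len(classes)
    if method == "weighted":
        # Mean over classes, weighted by how often each class actually occurs.
        return sum(per_class[c] * support[c] for c in classes) / len(actuals)
    raise ValueError(f"unknown method: {method}")


predictions = ["cat", "cat", "cat", "cat", "dog", "bird"]
actuals = ["cat", "cat", "cat", "dog", "dog", "bird"]

for method in ("macro", "micro", "weighted"):
    print(method, round(precision_by_method(predictions, actuals, method), 4))
# macro 0.9167
# micro 0.8333
# weighted 0.875
```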

fp_count_per_class

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "categorical" or "array"

  • class_name: the class on which we want to calculate the function

fn_count_per_class

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "categorical" or "array"

  • class_name: the class on which we want to calculate the function

tp_count_per_class

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "categorical" or "array"

  • class_name: the class on which we want to calculate the function

tn_count_per_class

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • type: the data type of the prediction field we chose. Can be: "categorical" or "array"

  • class_name: the class on which we want to calculate the function
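For the per-class counts, a natural reading is one-vs-rest: the chosen class is treated as positive and every other class as negative. The sketch below assumes that reading and covers only the "categorical" type; how "array" predictions are handled is not described here, so it is left out. `per_class_counts` is a hypothetical helper.

```python
def per_class_counts(predictions, actuals, class_name):
    """One-vs-rest confusion counts for a single class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predictions, actuals):
        predicted_pos = p == class_name
        actual_pos = a == class_name
        if predicted_pos and actual_pos:
            counts["tp"] += 1
        elif predicted_pos:
            counts["fp"] += 1
        elif actual_pos:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


predictions = ["cat", "cat", "cat", "cat", "dog", "bird"]
actuals = ["cat", "cat", "cat", "dog", "dog", "bird"]

print(per_class_counts(predictions, actuals, "cat"))
# {'tp': 3, 'fp': 1, 'tn': 2, 'fn': 0}
print(per_class_counts(predictions, actuals, "dog"))
# {'tp': 1, 'fp': 0, 'tn': 4, 'fn': 1}
```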

accuracy_at_k

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • k: an integer between 1 and 12. Only the top-k items will be considered.

precision_at_k

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • k: an integer between 1 and 12. Only the top-k items will be considered.

recall_at_k

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • k: an integer between 1 and 12. Only the top-k items will be considered.
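To show what "only the top-k items" means for precision_at_k and recall_at_k, here is a sketch using the usual definitions for a single ranked list. The data layout (a ranked list of item IDs plus a set of relevant items) is an assumption; this page does not describe how ranking columns are stored.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


ranked = ["a", "b", "c", "d", "e"]  # model's ranking, best first
relevant = {"a", "c", "f"}  # ground truth

# Top 4 is a, b, c, d: two hits (a and c).
print(precision_at_k(ranked, relevant, k=4))  # 0.5
```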

ndcg_at_k

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • k: an integer between 1 and 12. Only the top-k items will be considered
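nDCG has several variants. The sketch below uses one common form (linear gain, log2 position discount) and takes the relevance scores in ranked order as input. Both the variant and the input layout are assumptions, not something this page specifies.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions.

    The item at 0-based position i is discounted by log2(i + 2).
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of each item, listed in the order the model ranked them.
ranked_relevances = [3, 2, 3, 0, 1, 2]

score = ndcg_at_k(ranked_relevances, k=4)  # below 1.0: the order is not ideal
perfect = ndcg_at_k([3, 3, 2, 2, 1, 0], k=4)  # already sorted, so exactly 1.0
```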

map_at_k

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • k: an integer between 1 and 12. Only the top-k items will be considered

mrr_at_k

Parameters

  • column: the name of the prediction field on which we want to apply the function

  • k: an integer between 1 and 12. Only the top-k items will be considered
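MAP@k and MRR@k are both averages of a per-query score. The sketch below uses common definitions, with two assumptions that this page does not confirm: average precision is normalised by min(k, number of relevant items), and a query with no relevant item in the top k contributes 0. The helper names and the data are hypothetical.

```python
def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant item in the top k, or 0 if there is none."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    # Normalisation choice (assumption): min(k, number of relevant items).
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0


def mean_over_queries(metric, queries, k):
    """Average a per-query metric over (ranked, relevant) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in queries) / len(queries)


queries = [
    (["a", "b", "c"], {"b"}),  # first hit at rank 2
    (["x", "y", "z"], {"x", "z"}),  # hits at ranks 1 and 3
]

mrr = mean_over_queries(reciprocal_rank_at_k, queries, k=3)  # (0.5 + 1.0) / 2
map_score = mean_over_queries(average_precision_at_k, queries, k=3)
print(mrr)  # 0.75
```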
