Custom Metric Definition Language

In Aporia, custom metrics are defined using a syntax similar to Python's.

There are three building blocks that can be combined to create a custom metric expression, as shown in the example that follows this list:

  • Constants - a numeric value (e.g. 2, 0.5, ...)

  • Functions - any of the builtin functions listed below (e.g. sum, count, ...). All of these functions return a numeric value.

  • Binary operations - +, -, *, /, **. Operands can be either constants or function calls.
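
For example, the following expression combines all three building blocks - two function calls, a division, and a constant multiplier (the field names reuse those from the examples further down this page):

// Ratio of total annual premium to total vintage, scaled by a constant
(sum(features.Annual_Premium) / sum(features.Vintage)) * 0.5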

Builtin Functions

Before we dive into each of the supported functions, there are two general concepts that apply to all of them - field expressions and data segment filters.

Field Expressions

A field expression can be described in the following format:

<field_category>.<field_name>[<segment filter>]

Field category is one of the following: features / raw_inputs / predictions / actuals. Note that you can only use categories that you defined in your schema when creating your model version. In addition, keep in mind that the predictions and actuals categories share the same field names.

The segment filter is optional; for further information about filters, read the section below.
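
For example, both of the following are valid field expressions for a schema that contains an Annual_Premium feature and an Age raw input (the second adds a segment filter, described in the next section):

features.Annual_Premium                        // all records
features.Annual_Premium[raw_inputs.Age <= 35]  // only records where Age <= 35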

Data Segment Filters

Data segment filters are boolean expressions, designed to restrict the field on which we perform the function to a specific data segment.

Each boolean condition in a segment filter is a comparison between a field and a constant value. For example:

[features.Driving_License == True] // will filter out records in which Driving_License != True
[raw_inputs.Age <= 35]             // will only include records in which Age <= 35

Conditions can be combined using and / or, and any field can be checked for missing values using is None / is not None.
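
For example (field names are the illustrative ones used elsewhere on this page):

[raw_inputs.Age <= 35 and features.Driving_License == True]
[features.Annual_Premium is not None or raw_inputs.Region_Code == 28]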

The following table describes the supported combinations:

| Type / Operation | == | != | < | > | >= | <= |
| --- | --- | --- | --- | --- | --- | --- |
| Boolean | True/False | True/False | ✖️ | ✖️ | ✖️ | ✖️ |
| Categorical | numeric constants | numeric constants | ✖️ | ✖️ | ✖️ | ✖️ |
| String | string constants | string constants | ✖️ | ✖️ | ✖️ | ✖️ |
| Numeric | numeric constants | numeric constants | numeric constants | numeric constants | numeric constants | numeric constants |

Each table cell indicates the type of constant the field can be compared to.

Examples

// Total annual premium of those with a driving license, divided by the overall prediction count
sum(features.Annual_Premium[features.Driving_License == True]) / prediction_count()

// Three times the number of predictions of those who are under 35 years old and live in CA
prediction_count(raw_inputs.Age <= 35 and raw_inputs.Region_Code == 28) * 3

prediction_count(features.Age > 27) / (sum(features.Annual_Premium) + sum(features.Vintage))

Supported Functions

accuracy

Parameters

  • prediction: prediction field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric
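
A minimal sketch, assuming a boolean prediction field named will_buy with a matching actual, and keyword-style arguments (the exact call syntax may differ):

// Accuracy for customers with a driving license
accuracy(prediction=predictions.will_buy, label=actuals.will_buy, threshold=0.5, filter=features.Driving_License == True)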

actuals_count

Parameters

No parameters are needed; filters cannot be applied to this metric.

actuals_ratio

Parameters

No parameters are needed; filters cannot be applied to this metric.

auc_roc

Parameters

  • prediction: prediction probability field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric
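
A minimal sketch, reusing the assumed will_buy prediction/actual fields and keyword-style arguments:

// AUC-ROC for customers under 35
auc_roc(prediction=predictions.will_buy, label=actuals.will_buy, filter=raw_inputs.Age <= 35)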

count

Parameters

No parameters are needed; filters cannot be applied to this metric.

f1_score

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • average: the averaging strategy (micro / macro / weighted)

  • top_k: consider only the top-k items

  • filter: the filter we want to apply to the records before calculating the metric
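
A minimal sketch, again assuming the will_buy fields, keyword-style arguments, and a string value for the average strategy:

f1_score(prediction=predictions.will_buy, label=actuals.will_buy, threshold=0.5, average="macro")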

fn_count

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

fp_count

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

fp_rate

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

logloss

Parameters

  • prediction: prediction field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric

mae

Parameters

  • prediction: prediction field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric
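
A minimal sketch for a regression model, assuming a numeric prediction field named premium_estimate with a matching actual and keyword-style arguments:

// Mean absolute error for customers under 35
mae(prediction=predictions.premium_estimate, label=actuals.premium_estimate, filter=raw_inputs.Age <= 35)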

mape

Parameters

  • prediction: prediction field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric

max

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.

mean

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.
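
For example, mean can be combined with a segment filter and a binary operation (the field names come from the examples above; the keys usage assumes a hypothetical dict field named scores and list syntax for keys):

// Mean annual premium of license holders relative to the overall mean
mean(features.Annual_Premium[features.Driving_License == True]) / mean(features.Annual_Premium)

// Mean over selected keys of a hypothetical dict field
mean(field=features.scores, keys=["math", "english"])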

median

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.

min

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.

miss_rate

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

missing_count

Parameters

  • field: the field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

missing_ratio

Parameters

  • field: the field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric
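
For example, a simple data-quality metric (Annual_Premium is one of the illustrative feature names used above):

// Percentage of records where Annual_Premium is missing
missing_ratio(features.Annual_Premium) * 100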

mse

Parameters

  • prediction: prediction field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric

ndcg

Parameters

  • prediction: prediction field

  • label: label field

  • rank: the rank position

  • filter: the filter we want to apply to the records before calculating the metric
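
A minimal sketch for a ranking model, assuming a prediction field named relevance with a matching actual and keyword-style arguments; the rank value is illustrative:

ndcg(prediction=predictions.relevance, label=actuals.relevance, rank=10)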

not_missing_count

Parameters

  • field: the field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

precision_score

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • average: the averaging strategy (micro / macro / weighted)

  • top_k: consider only the top-k items

  • filter: the filter we want to apply to the records before calculating the metric

recall_score

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • average: the averaging strategy (micro / macro / weighted)

  • top_k: consider only the top-k items

  • filter: the filter we want to apply to the records before calculating the metric

rmse

Parameters

  • prediction: prediction field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric

specificity

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

std

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict

sum

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict

tn_count

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

tp_count

Parameters

  • prediction: prediction probability field

  • label: label field

  • threshold: numeric. Probability threshold according to which we decide whether a class is positive

  • filter: the filter we want to apply to the records before calculating the metric

unique_count

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.

value_count

Parameters

  • field: the field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • value: the value we want to count

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.
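
A minimal sketch, reusing the Region_Code raw input from the examples above and keyword-style arguments:

// Share of predictions coming from region 28
value_count(field=raw_inputs.Region_Code, value=28) / prediction_count()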

value_percentage

Parameters

  • field: the field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • value: the value we want to count

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.

variance

Parameters

  • field: numeric or dict. The field for which the metric will be computed. Can be of any category (feature / raw_input / prediction / actual)

  • filter: the filter we want to apply to the records before calculating the metric

  • keys: keys to filter in when field type is dict.

wape

Parameters

  • prediction: prediction field

  • label: label field

  • filter: the filter we want to apply to the records before calculating the metric
