Metrics Glossary
Last updated
Here you can find information about all the performance metrics supported by Aporia.
Can't find what you are looking for? We are constantly expanding our metrics support; in the meantime, you can always define your own custom metric.
This metric counts the number of records that did not report a specific field when the data was logged. It is useful for surfacing data pipeline or infrastructure problems that may affect your model.
This metric calculates the average value of the given data. It can be applied to any numeric field.
This metric finds the minimum value in the given data. It can be applied to any numeric field.
This metric finds the maximum value in the given data. It can be applied to any numeric field.
This metric calculates the sum of all values in the given data. It can be applied to any numeric field.
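As a rough illustration of the five metrics above, the following sketch computes them over a single numeric field. It assumes a missing value is represented as `None`; the function name `basic_aggregates` is hypothetical and this is not Aporia's implementation.

```python
from typing import Dict, Optional, Sequence


def basic_aggregates(values: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Missing count, mean, min, max and sum of one field (None = not reported)."""
    present = [v for v in values if v is not None]
    total = sum(present)
    return {
        "missing_count": len(values) - len(present),
        # Mean, min and max are undefined when no value was reported.
        "mean": total / len(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "sum": total,
    }


stats = basic_aggregates([3.0, None, 1.0, 5.0, None])
print(stats)
```

A sudden rise in `missing_count` for a field that is normally always reported is the kind of signal that points to a pipeline problem rather than a model problem.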
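Variance is the expected squared deviation of a random variable from its mean.
For a sample, it is calculated using the following formula:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
where $x_1, \dots, x_n$ are the sample values and $\bar{x}$ is the sample mean.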
The standard deviation (STD) is a statistical metric that measures the amount of variation or dispersion in a set of values.
STD is the square root of the variance and is calculated using the following formula:
$$\mathrm{STD} = \sqrt{\mathrm{Variance}}$$
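A minimal sketch of both metrics using Python's standard `statistics` module. Whether the sample form (dividing by n - 1) or the population form (dividing by n) is the one you need is an assumption to check against your own data, so both are shown.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_var = statistics.variance(values)
# Sample standard deviation: square root of the sample variance.
sample_std = statistics.stdev(values)
# Population variance divides by n instead; shown for comparison.
population_var = statistics.pvariance(values)

print(sample_var, sample_std, population_var)
```

The two variants converge as the number of values grows, so the choice matters mostly for small samples.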
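Mean squared error (MSE) measures the average squared difference between the estimated value and the actual value. MSE is calculated using the following formula:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
where $y_i$ is the actual value and $\hat{y}_i$ is the estimated value of the $i$-th record.
Root mean squared error (RMSE) is the square root of MSE. RMSE is calculated using the following formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
Mean absolute error (MAE) measures the average absolute difference between the estimated value and the actual value. MAE is calculated using the following formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$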
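The three error metrics above can be sketched together, since they share the same per-record errors. The function name `regression_errors` is hypothetical and the example values are made up for illustration.

```python
import math
from typing import Dict, Sequence


def regression_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE and MAE for paired actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


result = regression_errors([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(result)
```

Because MSE squares each error, a single large miss moves MSE and RMSE much more than it moves MAE, which is worth keeping in mind when choosing which one to monitor.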
This metric counts the records that were correctly predicted as positive for a specific class (true positives, TP). It is commonly used in classification problems.
This metric counts the records that were correctly predicted as negative for a specific class (true negatives, TN). It is commonly used in classification problems.
This metric counts the records that were incorrectly predicted as positive for a specific class (false positives, FP). It is commonly used in classification problems.
This metric counts the records that were incorrectly predicted as negative for a specific class (false negatives, FN). It is commonly used in classification problems.
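The four counts above can be sketched as a single pass over paired labels. The function name `confusion_counts` and the `positive` parameter are hypothetical; the labels are made-up example data.

```python
from typing import Dict, Hashable, Sequence


def confusion_counts(actual: Sequence[Hashable], predicted: Sequence[Hashable],
                     positive: Hashable) -> Dict[str, int]:
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1  # predicted positive, actually positive
        elif p == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif a == positive:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1  # predicted negative, actually negative
    return counts


actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]
counts = confusion_counts(actual, predicted, positive="spam")
print(counts)
```

For a multi-class model, the same function can be called once per class, with every other class treated as negative.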
Precision measures the percentage of correct positive predictions for a specific class, out of all the positive predictions. The higher the score, the more precise the classification is.
Precision is useful when the cost of a false positive is high. For example, suppose your model predicts whether an email is spam (positive) or not (negative). Classifying a legitimate email as spam (FP) is costly, so we want to monitor that the model's precision stays high to avoid a negative business impact.
Precision is calculated using the following formula:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall measures the percentage of correct positive predictions for a specific class, out of all the actual positives. The higher the score, the fewer positives were missed.
Recall is useful when the cost of a false negative is high. For example, suppose your model predicts whether a certain seller is fraudulent (positive) or not (negative). Failing to detect a fraudulent seller (FN) is costly, so we want to monitor that the model's recall stays high to avoid a negative business impact.
Recall is calculated using the following formula:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Accuracy measures the percentage of correct predictions out of all the predictions. The higher the score, the "closer to reality" the classifications are.
Accuracy is useful when the class distribution is balanced and we want to give more weight to the business value of the TP and TN.
Accuracy is calculated using the following formula:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
F1 balances the precision and recall metrics. It is a good fit when the class distribution is uneven and we want to give more weight to the business cost of the FP and FN.
F1 is calculated using the following formula:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
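Precision, recall, accuracy and F1 can all be derived from the four confusion counts, as in the sketch below. Returning 0.0 when a denominator is zero is an assumption made here for convenience, not a documented convention, and `classification_scores` is a hypothetical name.

```python
import math
from typing import Dict


def classification_scores(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Derive precision, recall, accuracy and F1 from confusion counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


scores = classification_scores(tp=8, tn=5, fp=2, fn=5)
print(scores)
```

In this made-up example precision is high while recall is noticeably lower, and F1 lands between the two, which is the balancing behavior described above.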
Discounted cumulative gain (DCG) measures the quality of a ranking.
The DCG metric rests on two assumptions. First, an object with high relevance produces more gain when it is ranked higher. Second, at the same rank, an object with higher relevance produces more gain.
DCG is calculated using the following formula:
where $REL_i$ is the list of the top $i$ objects ordered by their rank.
The normalized version of the metric (nDCG) lets you compare rankings of different lengths.
nDCG is calculated using the following formula:
$$\mathrm{nDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$$
where IDCG is the ideal DCG, that is, the DCG of the same objects ordered by decreasing relevance.
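The sketch below illustrates DCG and nDCG under one common formulation, in which the object at rank i contributes its relevance divided by log2(i + 1). That choice of gain and discount is an assumption; other variants exist (for example, an exponential gain), and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """DCG with linear gain: relevance at 1-based rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG divided by the ideal DCG (same items sorted by decreasing relevance)."""
    ideal = dcg(sorted(relevances, reverse=True))
    # Assumption: a ranking with no relevant items scores 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranking = [3, 2, 3, 0, 1, 2]  # relevance of each returned object, in ranked order
print(dcg(ranking), ndcg(ranking))
```

A ranking that is already sorted by decreasing relevance gets an nDCG of 1, and any other ordering of the same objects scores lower.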
The Jensen-Shannon distance measures the similarity between two probability distributions. It is based on a symmetrized version of the Kullback-Leibler divergence, which is a measure of how different one distribution is from another. The Jensen-Shannon distance is given by:
$$JSD(P, Q) = \sqrt{\frac{D_{KL}(P \parallel M) + D_{KL}(Q \parallel M)}{2}}$$
where $P$ and $Q$ are the probability distributions being compared, $D_{KL}$ is the Kullback-Leibler divergence, and $M = \frac{1}{2}(P + Q)$ is the midpoint distribution.
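A minimal sketch of the Jensen-Shannon distance for two discrete distributions given as lists of probabilities over the same bins. Using the natural logarithm is an assumption (with base 2 the result would be bounded by 1), and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence; bins where p is 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the average KL divergence of p and q from their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


print(js_distance([0.1, 0.4, 0.5], [0.3, 0.3, 0.4]))
```

Comparing against the midpoint distribution is what keeps the result finite even when one distribution assigns zero probability to a bin that the other does not.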
Population Stability Index (PSI) is a measure of the stability of the distribution of a variable across two different populations. It is commonly used in credit risk modeling and fraud detection. PSI is calculated as follows:
$$PSI = \sum_{i=1}^{n} (O_i - E_i) \ln\frac{O_i}{E_i}$$
where $O_i$ is the observed proportion of the variable in category or bin $i$ of a given population, $E_i$ is the expected proportion of the same variable in that bin of a reference population, and $n$ is the number of categories or bins used to group the variable.
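A minimal sketch of PSI over pre-binned proportions. Clamping each proportion to a small `eps` so that empty bins do not cause a division by zero or a logarithm of zero is an assumption made here; the function name and the `eps` parameter are hypothetical.

```python
import math
from typing import Sequence


def psi(observed: Sequence[float], expected: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions (proportions)."""
    total = 0.0
    for o, e in zip(observed, expected):
        # Assumption: clamp empty bins to eps to keep the logarithm defined.
        o = max(o, eps)
        e = max(e, eps)
        total += (o - e) * math.log(o / e)
    return total


print(psi([0.5, 0.5], [0.25, 0.75]))
```

Each term is non-negative, because the difference and the logarithm always share the same sign, so PSI is 0 only when the two binned distributions match.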
The Hellinger distance is a measure of the similarity between two probability distributions. It is closely related to the Bhattacharyya distance, but has the advantage of being bounded between 0 and 1. The Hellinger distance is given by:
$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$$
where $P$ and $Q$ are the two probability distributions being compared, $p_i$ and $q_i$ are the probabilities of the $i$-th event under $P$ and $Q$ respectively, and $n$ is the number of possible events.
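A minimal sketch of the Hellinger distance for two discrete distributions defined over the same events. The function name is hypothetical and the inputs are assumed to be valid probabilities that sum to 1.

```python
import math
from typing import Sequence


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete probability distributions."""
    squared_diffs = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    # Dividing by 2 inside the root is the same as the 1/sqrt(2) normalization.
    return math.sqrt(squared_diffs / 2)


print(hellinger_distance([0.1, 0.4, 0.5], [0.3, 0.3, 0.4]))
```

The distance is 0 for identical distributions and reaches 1 only when the two distributions never assign probability to the same event.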
The Kolmogorov-Smirnov distance is a measure of the difference between two probability distributions. It is based on the Kolmogorov-Smirnov test, which tests whether two samples come from the same underlying distribution. The distance is given by:
$$D_{KS} = \sup_x \left| F_1(x) - F_2(x) \right|$$
where $F_1$ and $F_2$ are the cumulative distribution functions of the two distributions being compared, and $\sup_x$ denotes the supremum, or the least upper bound, of the absolute differences over all values of $x$.
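In practice the two cumulative distribution functions are usually estimated from samples. The sketch below assumes that setting and uses empirical CDFs; because these are step functions, the largest gap is always reached at one of the observed values. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_distance(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    # bisect_right(s, x) / len(s) is the fraction of s that is <= x,
    # i.e. the empirical CDF of s evaluated at x.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.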