Metrics Glossary
Here you can find information about all the performance metrics supported by Aporia.
Statistic metrics
Missing Count
This metric counts the number of records that did not report a specific field when the data was logged. It can be useful for surfacing data pipeline or infrastructure problems that may affect your model.
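As a minimal sketch of the idea, the count can be reproduced over a list of logged records. The record layout and the helper name `missing_count` are hypothetical, and treating both an absent key and a `None` value as "missing" is an assumption, not a statement about how Aporia stores data.

```python
def missing_count(records, field):
    """Count records where `field` is absent or explicitly set to None."""
    return sum(1 for record in records if record.get(field) is None)


# Hypothetical logged records: "age" was not reported for two of them.
records = [
    {"age": 31, "country": "US"},
    {"country": "DE"},
    {"age": None, "country": "FR"},
    {"age": 45, "country": "US"},
]

print(missing_count(records, "age"))
```

A sudden rise in this count for one field is often the first visible sign of an upstream schema or pipeline change.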
Average
This metric calculates the average value of the given data. It can be applied to any numeric field.
Minimum
This metric finds the minimum value in the given data. It can be applied to any numeric field.
Maximum
This metric finds the maximum value in the given data. It can be applied to any numeric field.
Sum
This metric calculates the sum of all values of the given data. It can be applied to any numeric field.
Variance
Variance is the expectation of the squared deviation of a random variable from its mean.
For a sample of values, it is calculated using the following formula:
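For reference, the standard unbiased sample variance can be written as below, where x_i are the n observed values and x̄ is their sample mean. Dividing by n − 1 rather than n is an assumption here; the source does not state which convention is used.

```latex
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```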
Standard Deviation (STD)
The standard deviation is a statistical metric that measures the amount of variation or dispersion of a set of values.
STD is calculated using the following formula:
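The sketch below computes both variance and STD with Python's standard `statistics` module and then repeats the calculation by hand. It assumes the sample convention (division by n − 1), with STD taken as the square root of the variance. The input values are made up for illustration.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Library versions: both use the sample convention (n - 1 denominator).
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

# The same quantities written out explicitly.
mean = sum(values) / len(values)
manual_variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
manual_std = math.sqrt(manual_variance)

print(sample_variance, sample_std)
```

If the population convention is wanted instead, `statistics.pvariance` and `statistics.pstdev` divide by n.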
Performance metrics
Mean Squared Error (MSE)
Mean squared error measures the average squared difference between the predicted value and the actual value. MSE is calculated using the following formula:
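In standard notation, with y_i the actual value, ŷ_i the predicted value, and n the number of records (the symbol names are chosen here for illustration):

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```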
Root Mean Squared Error (RMSE)
Root mean squared error is the square root of MSE, which expresses the error in the same units as the target. RMSE is calculated using the following formula:
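Using the same illustrative symbols (y_i actual, ŷ_i predicted, n records):

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
             = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```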
Mean Absolute Error (MAE)
Mean absolute error measures the average absolute difference between the predicted value and the actual value. MAE is calculated using the following formula:
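In the usual notation MAE = (1/n) Σ |y_i − ŷ_i|. The three regression metrics above can be sketched together as follows. The function names and sample values are illustrative assumptions, and the inputs are assumed to be non-empty lists of equal length.

```python
import math


def mse(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Because MSE squares each error, the single error of 2 in this sample contributes four times as much to MSE as each error of 1, while MAE weights all errors linearly.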
Confusion matrix
True Positive Count (TP)
This metric counts the records that were correctly predicted as positive for a specific class. It is commonly used in classification problems.
True Negative Count (TN)
This metric counts the records that were correctly predicted as negative for a specific class. It is commonly used in classification problems.
False Positive Count (FP)
This metric counts the records that were incorrectly predicted as positive for a specific class. It is commonly used in classification problems.
False Negative Count (FN)
This metric counts the records that were incorrectly predicted as negative for a specific class. It is commonly used in classification problems.
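The four counts can be sketched for a single positive class as follows. The helper name `confusion_counts` and the spam/ham labels are hypothetical; any label other than the chosen positive one is treated as negative.

```python
def confusion_counts(actual, predicted, positive_label):
    """Return TP, TN, FP and FN counts with respect to `positive_label`."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive_label:
            # Predicted positive: correct only if the actual label is positive too.
            counts["TP" if a == positive_label else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if a == positive_label else "TN"] += 1
    return counts


actual = ["spam", "ham", "spam", "ham", "spam", "ham"]
predicted = ["spam", "spam", "ham", "ham", "spam", "ham"]

print(confusion_counts(actual, predicted, positive_label="spam"))
```

For a multi-class model, the same function can be called once per class to obtain one-vs-rest counts.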
Precision
This metric measures the share of positive predictions for a specific class that are actually correct. The higher the score, the more precise the classification.
Precision is useful when the cost of a False Positive is high. For example, suppose your model predicts whether an email is spam (positive) or not (negative). The cost of classifying a legitimate email as spam (FP) is high, so we want to monitor that the model's precision score remains high to avoid a negative business impact.
Precision is calculated using the following formula:
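In terms of the confusion matrix counts defined above:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}
```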
Recall
This metric measures the share of actual positives for a specific class that were correctly predicted as positive. The higher the score, the fewer positives we missed.
Recall is useful when the cost of a False Negative is high. For example, suppose your model predicts whether a certain seller is fraudulent (positive) or not (negative). The cost of failing to detect a fraudulent seller (FN) is high, so we want to monitor that the model's recall score remains high to avoid a negative business impact.
Recall is calculated using the following formula:
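In terms of the confusion matrix counts defined above:

```latex
\mathrm{Recall} = \frac{TP}{TP + FN}
```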
Accuracy
This metric measures the percentage of our correct predictions out of all the predictions. The higher score we get, the "closer to reality" our classifications are.
Accuracy is useful when we have a balanced class distribution and we want to give more weight to the business value of the TP and TN.
Accuracy is calculated using the following formula:
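In terms of the confusion matrix counts defined above:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```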
F1
This metric balances precision and recall by taking their harmonic mean. It is a good fit when the class distribution is uneven and we want to give more weight to the business cost of FP and FN.
F1 is calculated using the following formula:
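F1 is commonly defined as 2 · Precision · Recall / (Precision + Recall). The sketch below derives precision, recall, accuracy and F1 from the four confusion matrix counts. The function name and the counts are illustrative, and returning 0.0 when a denominator is zero is an assumption made here, not documented behavior.

```python
def classification_scores(tp, tn, fp, fn):
    """Compute precision, recall, accuracy and F1 from confusion matrix counts."""

    def safe_divide(numerator, denominator):
        # Assumption: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    accuracy = safe_divide(tp + tn, tp + tn + fp + fn)
    f1 = safe_divide(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


scores = classification_scores(tp=8, tn=6, fp=2, fn=4)
print(scores)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is preferred over accuracy on imbalanced data.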
Normalized Discounted Cumulative Gain (nDCG)
This metric measures the quality of ranking.
The DCG metric relies on two assumptions. First, an object with high relevance produces more gain when it is ranked higher. Second, at the same rank, an object with higher relevance produces more gain than one with lower relevance.
DCG is calculated using the following formula:
where REL_i is the list of the top i objects ordered by their rank.
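One widely used form of DCG is shown below. It assumes rel_i denotes the graded relevance of the object at rank i and p the number of ranked positions considered; these symbols are introduced here for illustration. A variant with exponential gain, 2^{rel_i} − 1 in the numerator, is also common, and the source does not state which one is used.

```latex
\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)}
```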
The normalized version of the metric (nDCG) makes it possible to compare rankings of different lengths.
nDCG is calculated using the following formula:
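In the usual notation, with p the number of ranked positions considered:

```latex
\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}
```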
where IDCG is the ideal DCG calculated by:
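IDCG is the DCG of the best possible ordering, so that nDCG falls between 0 and 1. The sketch below assumes the linear-gain form of DCG (relevance divided by log2 of rank + 1) and computes the ideal ordering by sorting the same list of relevance scores in descending order. Both choices, and the function names, are assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranked = [3, 2, 3, 0, 1, 2]  # hypothetical relevance of results at ranks 1..6
print(dcg(ranked), ndcg(ranked))
```

A ranking that is already sorted by relevance scores exactly 1.0, and any other ordering of the same items scores lower.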
Statistical distances
Jensen-Shannon Distance (J-S Distance)
The Jensen-Shannon distance measures the dissimilarity between two probability distributions. It is the square root of the Jensen-Shannon divergence, a symmetric and smoothed version of the Kullback-Leibler divergence, which is a measure of how different one distribution is from another. The Jensen-Shannon distance is given by:
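In the standard definition, JSD(P, Q) = ½ · KL(P ‖ M) + ½ · KL(Q ‖ M) with M = ½ (P + Q), and the distance is the square root of JSD. The sketch below applies this to two discrete distributions given as lists of probabilities over the same bins. Using base-2 logarithms, which bounds the result between 0 and 1, is an assumption made here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; bins with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    # Guard against tiny negative values caused by floating point rounding.
    return math.sqrt(max(divergence, 0.0))


reference = [0.5, 0.3, 0.2]
current = [0.2, 0.3, 0.5]
print(js_distance(reference, current))
```

Unlike the raw Kullback-Leibler divergence, this distance stays finite when one distribution has empty bins, because each distribution is compared against the mixture rather than directly against the other.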
Population Stability Index (PSI)
Population Stability Index (PSI) is a measure of the stability of the distribution of a variable over two different populations. It is commonly used in credit risk modeling and fraud detection. PSI is calculated as follows:
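PSI is commonly written as Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the proportions of records falling into bin i for the reference (expected) and current (actual) populations. The sketch below follows that form. The binning is assumed to have been done already, and replacing empty bins with a small `epsilon` to avoid division by zero is an assumption of this example.

```python
import math


def psi(expected, actual, epsilon=1e-6):
    """Population Stability Index over pre-binned proportions."""
    total = 0.0
    for e, a in zip(expected, actual):
        # Assumption: clamp empty bins so the logarithm stays defined.
        e = max(e, epsilon)
        a = max(a, epsilon)
        total += (a - e) * math.log(a / e)
    return total


expected = [0.5, 0.5]
actual = [0.25, 0.75]
print(psi(expected, actual))
```

Every term is non-negative, so PSI is zero only when the two binned distributions match and grows as they drift apart.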
Hellinger Distance
The Hellinger distance is a measure of the similarity between two probability distributions. It is closely related to the Bhattacharyya distance, but has the advantage of being bounded between 0 and 1. The Hellinger distance is given by:
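For discrete distributions the usual definition is H(P, Q) = (1/√2) · √(Σ (√p_i − √q_i)²). The sketch below implements it for two lists of probabilities over the same bins; the function name and sample distributions are illustrative.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    squared_diffs = sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )
    return math.sqrt(squared_diffs) / math.sqrt(2)


reference = [0.5, 0.3, 0.2]
current = [0.2, 0.3, 0.5]
print(hellinger_distance(reference, current))
```

The result is 0 for identical distributions and 1 when the two distributions share no bins with non-zero probability.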
Kolmogorov-Smirnov Distance
The Kolmogorov-Smirnov distance is a measure of the difference between two probability distributions. It is based on the Kolmogorov-Smirnov test, which tests whether two samples come from the same underlying distribution. The distance is given by:
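The two-sample statistic is usually written D = sup_x |F_1(x) − F_2(x)|, where F_1 and F_2 are the empirical cumulative distribution functions of the two samples. Since both are step functions, the maximum gap is reached at one of the observed values, which the sketch below exploits. The function name and sample data are illustrative.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    for x in a + b:
        # Empirical CDF at x: fraction of the sample that is <= x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


reference = [1, 2, 3, 4]
current = [3, 4, 5, 6]
print(ks_distance(reference, current))
```

The distance works directly on raw numeric values, so no binning is required, and it ranges from 0 (identical empirical distributions) to 1 (no overlap).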