Metrics Glossary

Here you can find information about all the performance metrics supported by Aporia.

Can't find what you are looking for? 😯 We are constantly expanding our metrics support, but in the meantime you can always define your own custom metric 🙌.

Statistical metrics

Missing Count

This metric counts the number of records that did not report a value for a specific field when the data was logged. It can be useful for surfacing data pipeline or infrastructure problems that may affect your model.
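As a rough illustration of the idea, the sketch below counts missing values in a list of logged records. The dictionary-based record layout and the `missing_count` helper are assumptions made for this example, as is the choice to treat both an absent key and a `None` value as missing; it is not Aporia's implementation.

```python
def missing_count(records, field):
    """Count records that did not report `field`.

    Assumption for this sketch: a field is "missing" when the key is
    absent from the record or its value is None.
    """
    return sum(1 for record in records if record.get(field) is None)


# Hypothetical logged records.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},  # age reported as None
    {"income": 48000},               # age not reported at all
]

print(missing_count(records, "age"))     # 2
print(missing_count(records, "income"))  # 0
```

A sudden rise in this count for a field that is normally always present is the kind of signal that points to a pipeline problem.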

Average

This metric calculates the average value of the given data. It can be applied on any numeric field.

Minimum

This metric finds the minimum value in the given data. It can be applied on any numeric field.

Maximum

This metric finds the maximum value in the given data. It can be applied on any numeric field.

Sum

This metric calculates the sum of all values of the given data. It can be applied on any numeric field.

Variance

Variance is the expected squared deviation of a value from the mean.

For a sample of $n$ values with sample mean $\mu$, it is calculated using the following formula:

Var(x) = \frac{\sum{(x_i-\mu)^2}}{n-1}

Standard Deviation (STD)

The standard deviation is a statistical metric that measures the amount of variation or dispersion of a set of values.

STD is calculated using the following formula, where $\mu$ is the mean of the values and $N$ is the number of values:

\sigma = \sqrt{\frac{\sum{(x_i-\mu)^2}}{N}}
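The two formulas above can be sketched in plain Python as follows. The function names are hypothetical. Note that, as written on this page, the variance formula divides by $n-1$ while the STD formula divides by $N$, so the second function is not simply the square root of the first; the sketch follows each formula literally.

```python
import math


def sample_variance(values):
    """Variance with the n-1 denominator, as in the Var(x) formula."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_std(values):
    """Standard deviation with the N denominator, as in the sigma formula."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

print(sample_variance(values))  # 32 / 7, roughly 4.571
print(population_std(values))   # sqrt(32 / 8) = 2.0
```

The standard library offers the same calculations as `statistics.variance` and `statistics.pstdev`.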

Performance metrics

Mean Squared Error (MSE)

Mean squared error measures the average squared difference between the estimated values and the actual values. MSE is calculated using the following formula:

MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i-x_i)^2

Root Mean Squared Error (RMSE)

Root mean squared error is the square root of MSE, which expresses the error in the same units as the predicted value. RMSE is calculated using the following formula:

RMSE = \sqrt{\sum_{i=1}^n\frac{(y_i - x_i)^2}{n}}

Mean Absolute Error (MAE)

Mean absolute error measures the average absolute difference between the estimated values and the actual values. MAE is calculated using the following formula:

MAE = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n}
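To make the three regression metrics concrete, here is a minimal sketch that computes MSE, RMSE, and MAE for two equally long lists. The function and parameter names are illustrative only; since all three formulas are symmetric in $y_i$ and $x_i$, it does not matter which list holds the predictions.

```python
import math


def mse(predicted, actual):
    """Average of the squared differences."""
    n = len(actual)
    return sum((y - x) ** 2 for y, x in zip(predicted, actual)) / n


def rmse(predicted, actual):
    """Square root of the MSE."""
    return math.sqrt(mse(predicted, actual))


def mae(predicted, actual):
    """Average of the absolute differences."""
    n = len(actual)
    return sum(abs(y - x) for y, x in zip(predicted, actual)) / n


predicted = [2.5, 0.0, 2.0, 8.0]
actual = [3.0, -0.5, 2.0, 7.0]

print(mse(predicted, actual))   # 0.375
print(rmse(predicted, actual))  # roughly 0.612
print(mae(predicted, actual))   # 0.5
```

Because MSE and RMSE square each difference, the single error of 1.0 in the last record weighs more heavily in them than it does in MAE.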

Confusion matrix

True Positive Count (TP)

This metric counts the records that were correctly predicted as positive for a specific class. It is commonly used in classification problems.

True Negative Count (TN)

This metric counts the records that were correctly predicted as negative for a specific class. It is commonly used in classification problems.

False Positive Count (FP)

This metric counts the records that were incorrectly predicted as positive for a specific class. It is commonly used in classification problems.

False Negative Count (FN)

This metric counts the records that were incorrectly predicted as negative for a specific class. It is commonly used in classification problems.
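The four counts above can be derived from paired actual and predicted labels. The sketch below is one way to do it for a chosen positive class; the `confusion_counts` helper and the spam/ham labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Return TP, TN, FP, FN counts, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

counts = confusion_counts(actual, predicted, positive="spam")
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

For a multi-class model, the same function can be called once per class, treating every other class as negative.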

Precision

This metric measures the percentage of correct positive predictions for a specific class, out of all the positive predictions. The higher the score, the more precise our classification is.

Precision is useful to measure when the cost of a False Positive is high. For example, let's say that your model predicts whether an email is spam (positive) or not (negative). The cost of classifying an email as spam when it's not (FP) is high so we would like to monitor that our model's precision score remains high to avoid bad business impact.

Precision is calculated using the following formula:

Precision = \frac{TP}{TP + FP}

Recall

This metric measures the percentage of correct positive predictions for a specific class, out of all the actual positives. The higher the score, the fewer positives we missed.

Recall is useful to measure when the cost of a False Negative is high. For example, let's say that your model predicts whether a certain seller is fraudulent (positive) or not (negative). The cost of failing to detect a fraudulent seller (FN) is high, so we would like to monitor that our model's recall score remains high to avoid bad business impact.

Recall is calculated using the following formula:

Recall = \frac{TP}{TP + FN}

Accuracy

This metric measures the percentage of our correct predictions out of all the predictions. The higher score we get, the "closer to reality" our classifications are.

Accuracy is useful when we have a balanced class distribution and we want to give more weight to the business value of the TP and TN.

Accuracy is calculated using the following formula:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

F1

This metric is the harmonic mean of precision and recall, so it balances the two. It fits when we have an uneven class distribution and we want to give more weight to the business cost of the FP and FN.

F1 is calculated using the following formula:

F1 = 2\cdot\frac{Precision\times Recall}{Precision + Recall}
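The precision, recall, accuracy, and F1 formulas can all be computed from the four confusion matrix counts. Here is a minimal sketch with hypothetical function names; returning 0.0 when a denominator is zero is a convention chosen for this example, not something this page specifies.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 when there are no positive predictions (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 when there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """(TP + TN) / all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for an imbalanced problem: 12 actual positives out of 100.
tp, tn, fp, fn = 8, 86, 2, 4

print(precision(tp, fp))         # 0.8
print(recall(tp, fn))            # 8 / 12, roughly 0.667
print(accuracy(tp, tn, fp, fn))  # 0.94
print(f1(tp, fp, fn))            # 16 / 22, roughly 0.727
```

In this example accuracy looks high mostly because negatives dominate, while F1 reflects the four positives that were missed.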

Normalized Discounted Cumulative Gain (nDCG)

This metric measures the quality of ranking.

The DCG metric rests on two assumptions. First, a highly relevant object produces more gain when it appears higher in the ranking. Second, at the same rank, an object with higher relevance produces more gain than an object with lower relevance.

DCG is calculated using the following formula:

DCG_p = \sum_{i=1}^{p}\frac{2^{rel_i}-1}{\log_2(i+1)}

where $rel_i$ is the graded relevance of the object at position $i$ and $p$ is the number of ranked positions taken into account.

The normalized version of the metric (nDCG) gives you the ability to compare between two rankings of different lengths.

nDCG is calculated using the following formula:

nDCG = \frac{DCG_p}{IDCG_p}

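where $IDCG_p$ is the ideal DCG, calculated over $REL_p$, the list of relevant objects ordered by decreasing relevance up to position $p$:

IDCG_p = \sum_{i=1}^{|REL_p|}\frac{2^{rel_i}-1}{\log_2(i+1)}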
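A small sketch may help tie DCG and nDCG together. It uses the standard discount $\log_2(i+1)$, which stays defined at $i=1$, and it builds the ideal ranking by sorting the same relevance scores in decreasing order; that choice, the function names, and the 0.0 result for a list with no relevant objects are assumptions of this example.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranking = [3, 2, 3, 0, 1, 2]  # hypothetical graded relevance, best rank first

print(dcg(ranking))
print(ndcg(ranking))                        # strictly between 0 and 1 here
print(ndcg(sorted(ranking, reverse=True)))  # 1.0 for an ideal ordering
```

Because the result is divided by the ideal DCG, a perfectly ordered list scores 1.0 regardless of its length, which is what makes rankings of different lengths comparable.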

Statistical distances

Jensen-Shannon Distance (J-S Distance)

The Jensen-Shannon distance measures how different two probability distributions are. It is based on the Jensen-Shannon divergence, a symmetric and always finite variant of the Kullback-Leibler divergence, which measures how one distribution differs from another. The Jensen-Shannon divergence is given by:

JSD(P,Q) = \frac{1}{2}(D_{KL}(P || M) + D_{KL}(Q || M))

where $P$ and $Q$ are the probability distributions being compared, $D_{KL}$ is the Kullback-Leibler divergence, and $M = \frac{1}{2}(P+Q)$ is the midpoint distribution. The Jensen-Shannon distance is conventionally defined as the square root of this divergence.
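For discrete distributions given as lists of probabilities over the same bins, the formula can be sketched as below. Several choices here are assumptions of the example rather than statements about Aporia: the base-2 logarithm (which keeps the result between 0 and 1), the function names, and the final square root that `js_distance` applies to the value produced by the formula.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Average KL divergence of P and Q from their midpoint distribution M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the divergence; max() guards against tiny negative rounding errors."""
    return math.sqrt(max(0.0, js_divergence(p, q)))


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

print(js_divergence(p, q))
print(js_distance(p, q))
print(js_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 for fully disjoint distributions
```

Unlike the plain Kullback-Leibler divergence, the value is the same in both directions and stays finite even when one distribution assigns zero probability to a bin.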

Population Stability Index (PSI)

Population Stability Index (PSI) is a measure of the stability of the distribution of a variable over two different populations. It is commonly used in credit risk modeling and fraud detection. PSI is calculated as follows:

PSI = \sum_{i=1}^k(O_i - E_i)\ln \frac{O_i}{E_i}

where $O_i$ is the observed proportion of records that fall in bin $i$ in a given population, $E_i$ is the expected proportion for the same bin in a reference population, and $k$ is the number of categories or bins used to group the variable.
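Here is a minimal sketch of the PSI formula for two lists of bin proportions. The `psi` function name and the small `eps` floor, which avoids taking the logarithm of zero for empty bins, are assumptions added for the example; the formula itself does not define what happens when a bin is empty.

```python
import math


def psi(observed, expected, eps=1e-6):
    """Population Stability Index over matching bins.

    `eps` is an assumed floor that keeps empty bins from producing log(0).
    """
    total = 0.0
    for o, e in zip(observed, expected):
        o = max(o, eps)
        e = max(e, eps)
        total += (o - e) * math.log(o / e)
    return total


expected = [0.25, 0.75]  # hypothetical reference proportions per bin
observed = [0.50, 0.50]  # hypothetical current proportions per bin

print(psi(observed, expected))  # roughly 0.275
print(psi(expected, expected))  # 0.0 when nothing has shifted
```

Each term is non-negative because $(O_i - E_i)$ and $\ln \frac{O_i}{E_i}$ always share the same sign, so PSI only grows as the two populations drift apart.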

Hellinger Distance

The Hellinger distance is a measure of the similarity between two probability distributions. It is closely related to the Bhattacharyya distance, but has the advantage of being bounded between 0 and 1. The Hellinger distance is given by:

H(P,Q)=\sqrt{\frac{1}{2}\sum_{i=1}^{n}(\sqrt{p_i}-\sqrt{q_i})^2}

where $P$ and $Q$ are the two probability distributions being compared, $p_i$ and $q_i$ are the probabilities of the $i$-th event under $P$ and $Q$ respectively, and $n$ is the number of possible events.
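The formula translates almost directly into code. The sketch below assumes both distributions are given as lists of probabilities over the same $n$ events; the `hellinger` function name is illustrative.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same events."""
    squared_diffs = sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )
    return math.sqrt(0.5 * squared_diffs)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

print(hellinger(p, q))
print(hellinger(p, p))                    # 0.0 for identical distributions
print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0 for distributions with no overlap
```

The two extreme cases illustrate the bounds: 0 when the distributions match exactly and 1 when they share no events.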

Kolmogorov-Smirnov Distance

The Kolmogorov-Smirnov distance is a measure of the difference between two probability distributions. It is based on the Kolmogorov-Smirnov test, which tests whether two samples come from the same underlying distribution. The distance is given by:

F_{KS}=\sup_x |F(x)-G(x)|

where $F$ and $G$ are the cumulative distribution functions of the two distributions being compared, and $\sup_x$ denotes the supremum, or least upper bound, of $|F(x)-G(x)|$ over all values of $x$.
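When only samples are available, $F$ and $G$ can be replaced by empirical cumulative distribution functions. The sketch below takes that approach, which is an assumption of this example; the `ks_distance` name is hypothetical. Since empirical CDFs are step functions that only change at observed values, checking every observed value is enough to find the largest gap.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        # bisect_right counts how many sorted values are <= x
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_distance([1, 2, 3], [1, 2, 3]))        # 0.0 for identical samples
print(ks_distance([1, 2, 3], [4, 5, 6]))        # 1.0 for non-overlapping samples
```

SciPy's `scipy.stats.ks_2samp` computes the same statistic and additionally reports a p-value for the associated test.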

