Continuous Ranked Probability Score (CRPS)


By Joannes Vermorel, June 2016

Probabilistic forecasts assign a probability to every possible future. Yet, not all probabilistic forecasts are equally accurate, and metrics are needed to compare the accuracy of distinct probabilistic forecasts. Simple accuracy metrics such as MAE (Mean Absolute Error) or MAPE (Mean Absolute Percentage Error) are not directly applicable to probabilistic forecasts. The Continuous Ranked Probability Score (CRPS) generalizes the MAE to the case of probabilistic forecasts. Along with the cross entropy, the CRPS is one of the most widely used accuracy metrics where probabilistic forecasts are involved.

Overview

The CRPS is frequently used to compare the accuracy of two probabilistic forecasting models. In particular, this metric can be combined with a backtesting process to stabilize the accuracy assessment by leveraging multiple measurements over the same dataset.

This metric notably differs from simpler metrics such as MAE because of its asymmetric expression: while the forecasts are probabilistic, the observations are deterministic. Unlike the pinball loss function, the CRPS does not focus on any specific point of the probability distribution, but considers the distribution of the forecasts as a whole.

Formal definition

Let $X$ be a random variable.

Let $F$ be the cumulative distribution function (CDF) of $X$, such that $F(y)=\mathbf{P}\left[X \leq y\right]$.

Let $x$ be the observation, and $F$ the CDF associated with an empirical probabilistic forecast.

The CRPS between $x$ and $F$ is defined as: $$CRPS(F, x) = \int_{-\infty}^{\infty}\Big(F(y)- 𝟙(y - x)\Big)^2dy$$ where $𝟙$ is the Heaviside step function and denotes a step function along the real line that attains:

  • the value of 1 if the real argument is positive or zero,
  • the value of 0 otherwise.
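
The definition can be checked numerically. The sketch below is only an illustration: it approximates the integral with the trapezoidal rule for a Gaussian forecast. The function names `normal_cdf` and `crps_numeric`, the integration bounds and the grid size are assumptions chosen for this example, not part of the definition.

```python
import math
import numpy as np

def normal_cdf(y, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example forecast F."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def crps_numeric(cdf, x, lo, hi, n=20_001):
    """Approximate the CRPS integral on [lo, hi] with the trapezoidal rule.

    Assumes the bounds are wide enough that the integrand is negligible
    outside [lo, hi].
    """
    ys = np.linspace(lo, hi, n)
    F = np.array([cdf(y) for y in ys])
    # Heaviside step: 1 if y - x >= 0, 0 otherwise.
    H = (ys >= x).astype(float)
    integrand = (F - H) ** 2
    dy = ys[1] - ys[0]
    return float(np.sum((integrand[:-1] + integrand[1:]) * 0.5) * dy)

score = crps_numeric(normal_cdf, x=0.0, lo=-10.0, hi=10.0)
print(round(score, 4))
```

For a standard normal forecast and an observation at $x = 0$, the result should be close to $2/\sqrt{2\pi} - 1/\sqrt{\pi} \approx 0.2337$, the value given by the known closed form of the CRPS for Gaussian forecasts.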

The CRPS is expressed in the same unit as the observed variable. The CRPS generalizes the mean absolute error: if the forecast is deterministic, the CRPS of a single observation reduces to the absolute error, and its average over a set of observations reduces to the mean absolute error (MAE).
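
To see why, here is a short derivation, under the assumption that a deterministic forecast of value $\hat{x}$ (a symbol introduced here for illustration) is represented by the step CDF $F(y) = 𝟙(y - \hat{x})$: $$CRPS(F, x) = \int_{-\infty}^{\infty}\Big(𝟙(y - \hat{x}) - 𝟙(y - x)\Big)^2dy = \int_{\min(\hat{x}, x)}^{\max(\hat{x}, x)} 1 \, dy = |\hat{x} - x|$$ The two step functions differ only between $\hat{x}$ and $x$, where the squared difference equals 1. Averaging this quantity over several observations gives the MAE.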

Known properties

Gneiting and Raftery (2004) show that the continuous ranked probability score can be equivalently written as: $$CRPS(F,x) = \mathbf{E}\Big[|X-x|\Big]-\frac{1}{2}\mathbf{E}\Big[|X-X^*|\Big]$$ where
  • $X$ and $X^*$ are independent copies of a real-valued random variable with finite first moment,
  • $X$ is the random variable associated with the cumulative distribution function $F$,
  • $\mathbf{E}[\cdot]$ denotes the expected value.
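
As an illustration, the expectation form can be evaluated exactly when the forecast is a discrete distribution. The sketch below assumes the forecast is given as two arrays, `values` and `probs` (hypothetical names), holding the support points and their probabilities; `crps_energy` is likewise a name chosen for this example.

```python
import numpy as np

def crps_energy(values, probs, x):
    """CRPS of a discrete forecast via E|X - x| - 0.5 * E|X - X*|.

    values: support points of the forecast distribution
    probs:  probabilities of those points (assumed to sum to 1)
    x:      the observation
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # First term: expected absolute distance to the observation.
    term1 = np.sum(probs * np.abs(values - x))
    # Second term: expected absolute distance between two independent draws.
    pairwise = np.abs(values[:, None] - values[None, :])
    term2 = 0.5 * np.sum(probs[:, None] * probs[None, :] * pairwise)
    return float(term1 - term2)

print(crps_energy([0.0, 2.0], [0.5, 0.5], 1.0))  # 0.5
```

The pairwise matrix makes the cost quadratic in the number of support points, which is acceptable for small supports but not a choice suited to very large ones.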

Numerical evaluation

From a numerical perspective, a simple way of computing the CRPS consists of breaking down the original integral into two integrals on well-chosen boundaries to simplify the Heaviside step function, which gives: $$CRPS(F, x) = \int_{-\infty}^x F(y)^2dy + \int_x^{\infty}\Big(F(y)- 1\Big)^2dy$$ In practice, since $F$ is an empirical distribution obtained through a forecasting model, the corresponding random variable $X$ has a finite support, meaning that there is only a finite number of points where $\mathbf{P}[X = x] \gt 0$. The CDF $F$ is then piecewise constant, and the integrals can be turned into discrete finite sums.
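
A minimal sketch of this finite sum follows, assuming the forecast is supplied as support points `values` with probabilities `probs` (hypothetical names, as is `crps_discrete`). Since $F$ is constant between consecutive breakpoints, each interval contributes its width times the squared difference between $F$ and the step function.

```python
import numpy as np

def crps_discrete(values, probs, x):
    """CRPS of a discrete forecast, computed as a finite sum over intervals.

    values: support points of the forecast distribution
    probs:  probabilities of those points (assumed to sum to 1)
    x:      the observation
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)
    values, probs = values[order], probs[order]
    cdf = np.cumsum(probs)

    # Breakpoints: the support points plus the observation, sorted and unique.
    # Outside this range the integrand is zero (F and the step are both 0 or 1).
    grid = np.union1d(values, [x])
    left = grid[:-1]
    widths = np.diff(grid)

    # F is constant on each interval; evaluate it at the left endpoint.
    idx = np.searchsorted(values, left, side="right")  # support points <= left
    F = np.where(idx > 0, cdf[np.maximum(idx - 1, 0)], 0.0)
    # Step function: 0 on intervals left of x, 1 on intervals starting at x or later.
    H = (left >= x).astype(float)
    return float(np.sum((F - H) ** 2 * widths))

print(crps_discrete([0.0, 2.0], [0.5, 0.5], 1.0))  # 0.5
print(crps_discrete([3.0], [1.0], 5.0))            # 2.0, the absolute error
```

The second call uses a deterministic forecast (a single point with probability 1) and returns the absolute error $|3 - 5| = 2$.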

References

  • Gneiting, T. and Raftery, A. E. (2004). Strictly proper scoring rules, prediction, and estimation. Technical Report no. 463, Department of Statistics, University of Washington, Seattle, Washington, USA.