Moments and "mesokurtosis"
Tags: statistics
Many of you may be wondering what “mesokurtosis” is. A greater number of you may be wondering why I’d name a personal website after it. Still another group of readers may have perfectly adequate understandings of the naming that don’t rely on what it denotes. This note is intended for those first two groups. If the existence of that last group is surprising, I’ll eventually write something about exosemantics to clear that up.
Much, but not everything, about a probability distribution is captured in its moments.
Definition. Let \(X\) be some random variable. For \(i\) a non-negative integer, the \(i\)-th moment of \(X\) is \(\mu_i := {\mathbf{E}}[X^i]\).
The first moment, conventionally called the “mean”, indicates location. The second moment, once centered at the mean, is the “variance” and indicates scale. The third moment is related to “skewness” and indicates asymmetry about the mean. The fourth moment is related to what we’re interested in, “kurtosis”. It’s not immediately obvious what it indicates, but a little reflection reveals that the fourth moment is contaminated by the mean and the variance. Shift the location away from the origin or stretch out the scale, and \(\mu_4\) will typically increase, even if the distribution “looks the same”. We proceed to fix this problem.
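To see the contamination concretely, here is a short worked expansion. The shift \(c\) and the scale factor \(a\) are symbols introduced just for this illustration. By the binomial theorem and linearity of expectation,
\[ {\mathbf{E}}[(X + c)^4] = \mu_4 + 4c\,\mu_3 + 6c^2\mu_2 + 4c^3\mu_1 + c^4, \qquad {\mathbf{E}}[(aX)^4] = a^4\mu_4. \]
So a shift drags every lower moment into the fourth, and a stretch by \(a\) multiplies \(\mu_4\) by \(a^4\), even though the shape of the distribution has not changed.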
Definition. The \(i\)-th standardized moment of \(X\) is \(\beta_{i-2} := {\mathbf{E}}[(X - \mu_1)^i] / \sigma^i\), where \(\sigma := (\mu_2 - \mu_1^2)^{1/2}\) is the standard deviation of \(X\).
The first and second standardized moments are boring and tell us nothing about \(X\), except maybe whether it is degenerate (\(\sigma = 0\)) or not, depending on technical details I fudged in the definitions. But the third and fourth (and higher) standardized moments have the advantage that simply adjusting scale and location leaves them unchanged. We can go right ahead and call \(\beta_1\) the skewness. But we make things more complicated for the fourth moment.
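The invariance claim is easy to check numerically. What follows is a minimal sketch, assuming the standardized moments are taken about the mean (the usual convention). The function name `standardized_moment`, the toy distribution, and the constants `a` and `b` are all made up for illustration.

```python
import math

def standardized_moment(values, probs, i):
    """i-th standardized moment of a discrete distribution:
    E[(X - mean)^i] / sigma^i, with moments taken about the mean."""
    mean = sum(p * x for x, p in zip(values, probs))

    def central(k):
        # k-th moment about the mean
        return sum(p * (x - mean) ** k for x, p in zip(values, probs))

    sigma = math.sqrt(central(2))
    return central(i) / sigma ** i

# Hypothetical toy distribution, chosen to be asymmetric.
values = [0.0, 1.0, 5.0]
probs = [0.5, 0.3, 0.2]

# Stretch by a > 0 and shift by b; the probabilities are untouched.
a, b = 2.5, -7.0
moved = [a * x + b for x in values]

for i in (3, 4):
    before = standardized_moment(values, probs, i)
    after = standardized_moment(moved, probs, i)
    print(f"i = {i}: before = {before:.12f}, after = {after:.12f}")
```

The two columns should agree up to floating-point rounding. Note the restriction to `a > 0`: a negative scale factor reflects the distribution and flips the sign of the odd standardized moments, skewness included.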
Definition. The kurtosis or excess kurtosis of \(X\) is \(\gamma_2 := \beta_2 - 3\).
One can imagine the terminological confusion that ensued as “excess kurtosis” was worn away by use back into “kurtosis”. And how did “excess” find its way into the name in the first place? A(ny) normal distribution has \(\beta_2 = 3\).
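For the curious, the value 3 can be checked with one integration by parts. This is a sketch for the standard normal \(Z\) with density \(\varphi\) (both symbols are introduced here for illustration); the general case follows because \(\beta_2\) ignores location and scale. Using \(\varphi'(z) = -z\,\varphi(z)\),
\[ {\mathbf{E}}[Z^4] = \int_{-\infty}^{\infty} z^3 \cdot z\,\varphi(z)\,dz = \left[-z^3\varphi(z)\right]_{-\infty}^{\infty} + 3\int_{-\infty}^{\infty} z^2\varphi(z)\,dz = 0 + 3\,{\mathbf{E}}[Z^2] = 3. \]
Since \(Z\) has mean 0 and \(\sigma = 1\), this gives \(\beta_2 = 3 / 1^4 = 3\).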
Reckoned against the normal, some distributions have positive kurtosis and are called “leptokurtic”, e.g., the Laplace and Poisson distributions. (The Cauchy distribution, heavy-tailed as it is, has no finite moments, so its kurtosis is undefined.) Others have negative kurtosis and are called “platykurtic”, e.g., the uniform distribution. The leftovers that fall into neither of these two classes are the “mesokurtic” distributions. (Quick exercise for the reader: name one such distribution!) If your answer to the exercise was a Bernoulli with success probability \(p = 1/2 \pm (1/12)^{1/2}\), or something even more exotic, congratulations on showing that mesokurtosis is a property bestowed not just on normal distributions. From these examples, kurtosis might informally be read as how much (lepto-) or little (platy-) “outliers” contribute to variance compared with “usual” values.
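These classifications can be checked numerically. The sketch below computes excess kurtosis directly from a probability mass function, with moments taken about the mean. The helper names `excess_kurtosis` and `label`, the Poisson rate, the truncation point, and the tolerance are all illustrative choices. A fair die stands in as a discrete cousin of the uniform distribution.

```python
import math

def excess_kurtosis(pmf):
    """Excess kurtosis E[(X - mean)^4] / variance^2 - 3
    of a discrete pmf given as {value: probability}."""
    mean = sum(p * x for x, p in pmf.items())
    var = sum(p * (x - mean) ** 2 for x, p in pmf.items())
    fourth = sum(p * (x - mean) ** 4 for x, p in pmf.items())
    return fourth / var ** 2 - 3

def label(g2, tol=1e-9):
    """Name the class; tol absorbs floating-point rounding around zero."""
    if g2 > tol:
        return "leptokurtic"
    if g2 < -tol:
        return "platykurtic"
    return "mesokurtic"

lam = 4.0                            # hypothetical Poisson rate
p_star = 0.5 + math.sqrt(1 / 12)     # the Bernoulli parameter from the exercise

examples = {
    # Truncated at k = 100; the neglected tail mass is negligible for lam = 4.
    "poisson": {k: math.exp(-lam) * lam ** k / math.factorial(k) for k in range(101)},
    "die": {k: 1 / 6 for k in range(1, 7)},
    "bernoulli": {0: 1 - p_star, 1: p_star},
}

for name, pmf in examples.items():
    g2 = excess_kurtosis(pmf)
    print(f"{name}: excess kurtosis {g2:.6f} ({label(g2)})")
```

For the Bernoulli case the closed form is \((1 - 6pq)/(pq)\) with \(q = 1 - p\), which vanishes exactly when \(pq = 1/6\). That is where \(p = 1/2 \pm (1/12)^{1/2}\) comes from.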
Explaining why mesokurtosis appeals to me to the point that I used it in an act of semi-self-definition (picking it as the domain name for this website) is difficult. Maybe it’s the compelling similarity to Norbert Wiener’s claim that
The best material model of a cat is another, or preferably the same, cat.
Kurtosis is a generally useful notion, yet it is exactly in its most central case that it reduces to tautology.
P.S. I promised connections to moment-generating functions and some cool, new, non-standard notation. To be continued\(\ldots\).