Mesokurtosis - Time is a circle (1/4)

Posted on 2 Apr 2015
Tags: statistics, meta, r

A lot of my analytics data is straightforward to analyze. It’s not that I can go on autopilot, but most of it can be understood as counting hits in predetermined bins (e.g., for regions, 114 from California, 10 from Ontario, 0 from Donets’ka oblast, etc.). There are two big exceptions. Traffic by date can be understood as a time series and analyzed by standard tools. But traffic by time is a tricky blend of these two situations. It can be sorted into some number of bins (say, one for each hour of a day) but it still has the ordered character of a time series. Except this is not even quite right. It is natural and correct to say that 05:00 UTC comes after 04:00 UTC, but if this argument is repeated twenty-four times, it compels us to accept that 05:00 UTC comes after itself.
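To make the wrap-around concrete, here is a minimal R sketch. The helpers `next_hour` and `circ_dist` are hypothetical names introduced only for illustration. It shows that stepping forward one hour twenty-four times returns to the starting hour, and that 23:00 and 00:00 are one hour apart on the clock even though their labels differ by 23.

```r
# Hypothetical helper: the hour that follows `h` on a 24-hour clock.
next_hour <- function(h) (h + 1) %% 24

# Apply "the next hour comes after this one" 24 times, starting at 05:00 UTC.
h <- 5
for (i in 1:24) h <- next_hour(h)
h  # back at 5: 05:00 UTC "comes after itself"

# Hypothetical helper: shortest distance between two hours around the clock.
circ_dist <- function(a, b) {
  d <- abs(a - b) %% 24
  pmin(d, 24 - d)
}

abs(23 - 0)       # 23: distance if the hours are treated as a line
circ_dist(23, 0)  # 1: distance if the hours are treated as a circle
```

Any method that treats the hour as an ordinary number inherits the first, misleading notion of distance.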

It’s like in this universe, we process time linearly forward. But outside […] our spacetime would look flattened, like a single sculpture with matter in a superposition of every place it ever occupied, our sentience just cycling through our lives like carts on a track. […] It’s a circle.¹

So something is weird about time. What is to be done about this?

Before going further, here’s the traffic by time data binned to the hour, current as of today:²

[("00+0000'", 24), ("01+0000'", 40), ("02+0000'", 97),
("03+0000'", 79), ("04+0000'", 71), ("05+0000'", 48),
("06+0000'", 22), ("07+0000'", 19), ("08+0000'", 24),
("09+0000'", 20), ("10+0000'", 11), ("11+0000'", 23),
("12+0000'", 23), ("13+0000'", 46), ("14+0000'", 31),
("15+0000'", 46), ("16+0000'", 60), ("17+0000'", 24),
("18+0000'", 33), ("19+0000'", 66), ("20+0000'", 41),
("21+0000'", 22), ("22+0000'", 39), ("23+0000'", 31)]

It’s uncontroversial to say that the most hits occur during the hour of 02:00 UTC (10:00 PM in my local time) and that the fewest hits occur during the hour of 10:00 UTC (6:00 AM locally), when there are about 11% (≈11/97) as many hits as at the peak. Crudely, this seems like enough information to determine a location and a scale, if only there were a family of distributions appropriate to these weird circular data.
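These summary figures can be checked directly in R. The sketch below rebuilds the hourly counts from the listing above. The variable names are my own, and the local offset of UTC−4 is an assumption inferred from the two local times quoted in the text.

```r
# Hourly hit counts, indexed by hour of day in UTC (0 through 23).
times <- 0:23
hits <- c(24, 40, 97, 79, 71, 48, 22, 19, 24, 20, 11, 23,
          23, 46, 31, 46, 60, 24, 33, 66, 41, 22, 39, 31)

peak   <- times[which.max(hits)]  # hour with the most hits
trough <- times[which.min(hits)]  # hour with the fewest hits
ratio  <- min(hits) / max(hits)   # fewest hits relative to most hits

# Assumed local offset (UTC-4), inferred from the post; wrap around the clock.
utc_offset <- -4
local_peak   <- (peak + utc_offset) %% 24
local_trough <- (trough + utc_offset) %% 24

c(peak = peak, trough = trough, local_peak = local_peak,
  local_trough = local_trough)
round(ratio, 3)
```

Note that even the time-zone conversion needs `%% 24`, which is another small reminder that the hours live on a circle.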

It turns out that there are a number of such families, under the name of circular distributions in the subfield of directional statistics. Tomorrow, I’ll describe how one of these, the von Mises (–Fisher) circular distribution, can be applied to the problem at hand.

¹ True Detective, season 1, episode 5.

² If you’re following this in R, you can get this in a data frame with:

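# Hour of day in UTC (0 = 00:00 through 23 = 23:00)
times <- 0:23
# Number of hits recorded during each hour, in the same order as `times`
hits <- c(24, 40, 97, 79, 71, 48, 22, 19, 24, 20, 11, 23,
          23, 46, 31, 46, 60, 24, 33, 66, 41, 22, 39, 31)
dat <- data.frame(times = times, hits = hits)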
