Stats miscellany (2)

Posted on 14 Apr 2015
Tags: statistics, probabilistic-programming

Here are some lukewarm takes on recent events in statistics and associated domains. At risk of jinxing the endeavor, I’ll try to make this a semi-regular feature.

Probabilistic programming

Probabilistic programming has been in the zeitgeist recently, with Office 365’s pessimistically named Clutter feature arguably being “the first large-scale commercial use of this innovative programming paradigm”. In fact, you might as well check out @beaucronin’s tweets from roughly 2015-04-09 through 2015-04-13 for the flavor of the field (also, h/t for the preceding link).

I’m not very familiar with Infer.NET, Microsoft’s framework on which Clutter is built. I’ve paid a bit more attention to the various sub-projects under the aegis of The MIT Probabilistic Programming Project [1] and to Stan, from Andrew Gelman et al.

The common idea explored by these languages is the combination of the declarative paradigm with randomness. If you were trying to understand the Boy or Girl paradox, you might specify random variables for each child’s gender and also the observation that at least one of them is a boy. Then, when queried, the language should be able to correctly report the probability that both children are boys [2].
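To make the "declare, observe, query" pattern concrete, here is a minimal sketch in plain Python rather than in any actual probabilistic programming language. It enumerates the possible families, keeps the ones consistent with the observation, and answers the query by counting. The function name and the "B"/"G" encoding are my own illustrative choices, and the uniform, independent prior is the usual textbook assumption.

```python
from fractions import Fraction
from itertools import product


def boy_or_girl():
    """Probability that both children are boys, given at least one is a boy."""
    # Declare: two children, each independently "B" or "G" with probability 1/2,
    # so all four ordered pairs are equally likely (assumed prior).
    worlds = list(product("BG", repeat=2))
    weight = Fraction(1, len(worlds))

    # Observe: at least one of the children is a boy.
    consistent = [w for w in worlds if "B" in w]
    evidence = weight * len(consistent)

    # Query: both children are boys, renormalized by the evidence.
    both_boys = weight * sum(1 for w in consistent if w == ("B", "B"))
    return both_boys / evidence


posterior = boy_or_girl()
print(posterior)
```

Exhaustive enumeration only works because there are four worlds to check; the appeal of a real probabilistic programming language is that the same three steps are written once and the inference strategy is the language's problem.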

Following my initial feeling of elation after encountering probabilistic programming for the first time (“why should I bother writing functions to sample random variables, when I can just write objects that are random variables and have the language do all the work‽”), I’ve mostly just been left confused. In vague terms, the whole point of these languages is to compute conditional distributions: in particular, the distribution of a model parameter of interest conditioned on the observed data, or in other words, the posterior distribution. The biggest challenge in doing this is that it’s impossible, in general.
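For reference, with my own notation of θ for the parameter of interest and y for the observed data (neither symbol is used elsewhere in this post), the posterior is given by Bayes' rule:

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}
```

The integral in the denominator ranges over every possible parameter value, which is one place where the general difficulty shows up in practice.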

As is often the case in computing, this is not as bad a stumbling block as it sounds. In situations where we know inference is tractable, specifying our problem in probabilistic programming terms shouldn’t necessarily make things worse. But it does mean—in the absence of a sufficiently smart compiler or interpreter—that the user will need to tell the language how to proceed in the form of some imperative hints. Still, even if this seems theoretically inelegant or disappointing, it seems like it might work well enough. In settings where computation is cheap but data scientists are expensive, the prospect of coding serious computer vision tasks in “less than 50 lines” is compelling.

How to think

I’m trying to figure out whether to dive into David Chapman’s Meaningness. The metablog post How To Think Real Good is an engaging mix of philosophical professional autobiography, aphorisms, and musings like:

My answer to “If not Bayesianism, then what?” is: all of human intellectual effort. Figuring out how things work, what’s true or false, what’s effective or useless, is “human complete.” In other words, it’s unboundedly difficult, and every human intellectual faculty must be brought to bear.

But a (randomly chosen) page in the book itself has material that looks more like:

It is very difficult to say anything about what instrumental music means—even when you are sure it is highly meaningful.


From the abstract of “Micro-randomized trials in mHealth” (Liao et al., 2015):

In “just-in-time” mobile interventions, treatments are provided via a mobile device that are intended to help an individual make healthy decisions “in the moment,” and thus have a proximal, near future impact. […] In this paper, we propose a […] micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized at the 100s or 1000s of occasions at which a treatment might be provided.

I’m not particularly impressed by one of their key modeling choices—defining the effect of treatment as an expectation conditioned on “in the moment” availability for treatment rather than as an expectation restricted to availability—but it’s still cool to see how the analysis plays out.
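To get a feel for the design, here is a toy simulation of a single participant's randomization history. It is a sketch under my own assumptions, not the paper's protocol: the availability probability, the treatment probability, and the rule that randomization happens only at occasions where the participant is available are all illustrative choices, as are the function and variable names.

```python
import random


def simulate_participant(n_occasions, p_treat, p_available, rng):
    """Return one (available, treated) pair per decision occasion."""
    record = []
    for _ in range(n_occasions):
        # Assumed: availability is an independent coin flip at each occasion.
        available = rng.random() < p_available
        # Assumed: treatment is randomized only when the participant is available.
        treated = available and rng.random() < p_treat
        record.append((available, treated))
    return record


rng = random.Random(0)  # fixed seed so the run is repeatable
record = simulate_participant(n_occasions=1000, p_treat=0.5, p_available=0.8, rng=rng)
n_available = sum(available for available, _ in record)
n_treated = sum(treated for _, treated in record)
print(n_available, n_treated)
```

Even this toy version shows why availability matters for the estimand: every treated occasion is an available one, so any average over "treated" occasions is implicitly an average over a subset chosen by availability.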

On the theoretical side, I liked the trick presented in “Early Stopping is Nonparametric Variational Inference” (Maclaurin, Duvenaud, and Adams, 2015) of using the Hessian to approximately evolve the variational entropy in parallel with the variational energy, as the latter evolves under stochastic gradient descent.
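As I understand the trick, a gradient step θ ← θ − α∇L(θ) transforms the current distribution over θ by a map whose Jacobian is I − αH, where H is the Hessian, so each step changes the entropy by the expected log|det(I − αH)|. The sketch below checks this in the one setting where everything is exact: a quadratic loss with a constant Hessian, a Gaussian starting distribution, and plain (non-stochastic) gradient descent. The Hessian, step size, and helper names are hypothetical values of mine, and I use an exact 2×2 determinant where the paper uses an approximation.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]


def gaussian_entropy(cov):
    """Differential entropy of a 2-D Gaussian with covariance cov."""
    return 0.5 * (2 * math.log(2 * math.pi * math.e) + math.log(det2(cov)))


ALPHA = 0.1                           # hypothetical step size
HESSIAN = [[2.0, 0.3], [0.3, 1.0]]    # hypothetical Hessian of a quadratic loss
# Jacobian of one gradient step on the quadratic loss: I - ALPHA * HESSIAN.
JACOBIAN = [[(1.0 if i == j else 0.0) - ALPHA * HESSIAN[i][j] for j in range(2)]
            for i in range(2)]


def track_entropy(steps):
    """Evolve the entropy incrementally and compare with a direct calculation."""
    cov = [[1.0, 0.0], [0.0, 1.0]]    # start from a standard Gaussian
    tracked = gaussian_entropy(cov)
    for _ in range(steps):
        # The distribution stays Gaussian; its covariance becomes J cov J^T.
        cov = matmul2(matmul2(JACOBIAN, cov), transpose2(JACOBIAN))
        # Entropy update that never looks at the covariance itself.
        tracked += math.log(abs(det2(JACOBIAN)))
    return tracked, gaussian_entropy(cov)


tracked, direct = track_entropy(20)
print(tracked, direct)
```

The two numbers agree up to floating-point error, and the entropy shrinks as optimization proceeds, which is the sense in which stopping early leaves you with a broader approximate posterior.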

  1. In this case “attention” means playing around with a buggy early version of Church and implementing a chunk of Keith Bonawitz’s BLAISE virtual machine in Haskell. I highly recommend the former. The latter was a silly idea but was a valuable exercise in doing surgery on trees without mutation.

  2. 1/3.