Freedom in De Bruijn sequences
A correspondent asked me not long ago if I knew anything about De Bruijn sequences. Unlike the indicies of the same eponym, which I encountered in programming language theory contexts so pure that it not do to sully the proceedings with named variables, these were new to me. They turn out to be a versatile and interesting combinatorial object. Taking Wikipedia’s definition to be as good as any:
In combinatorial mathematics, a \(k\)-ary De Bruijn sequence \(B(k,n)\) of order \(n\) […] is a cyclic sequence of a given alphabet \(A\) with size \(k\) for which every possible subsequence of length \(n\) in \(A\) appears as a sequence of consecutive characters exactly once.
For example, a binary De Bruijn sequence of order 2, \(B(2,2)\), is “0011” (with cyclic equivalents “0110”, “1100”, and “1001”). It seems convenient to introduce the variable \(n' = k^n\) for the length of the De Bruijn sequence \(B(k,n)\). A formula for the number of De Bruijn sequences is known:\(\log \#(B(k,n))=n' \log(k!) / k - n \log k\). Applying this to the example gives \(\#(B(2,2)) = 1\), which shows that “0011” is the unique binary De Bruijn sequence of order 2.
(Exercise to the reader: Investigate \(B(2,3)\). What symmetry do these De Bruijn sequences have that is not modded out in the counting formula?)
The correspondent was interested in methods for constructing De Bruijn sequences for a “pretty dull” application. But unrelated to the application, I found it interesting to consider how many De Bruijn sequences there are, not just absolutely as given by the formula above but relative to other similar objects, like \(k\)-ary sequence of length \(n'\) or \(k\)-ary sequences of length \(n'\) with equal counts of all \(k\) characters in the alphabet.
(Exercise to the reader: convince yourself that these fractional counts both go to 0 as \(n \to \infty\).)
The De Bruijn sequences have less freedom, under the above sense, in their construction than arbitrary sequences on the same alphabet or sequences that match unigram frequencies. I’d expect this to hold true for any fixed \(m\)-gram frequencies as well.
For a less trivial statement, note that \((1/n') \log \#(B(k,n)) \to \log(k!) / k\) as \(n \to \infty\). In particular, for \(k = 2\), this limit is 0.5, assuming base-2 logarithms. This suggests (assuming lots of dynamical systems technical detail)1 that the binary De Bruijn sequence has an entropy rate of 0.5 bits. That feels like a lot of information for what would essentially be a side-channel in a rotary encoding application.
I wish I had the time and endurance to develop this topic in the spirit of , say, Russel Norvig’s TSP notebook, but I’ll need to leave it as a half-baked idea for now.
See “De Bruijn graphs and entropy at finite scales” for different questions and approaches in a similar setting.↩