author: niplav, created: 2019-02-08, modified: 2019-11-19, language: english, status: maintenance, importance: 4, confidence: highly likely

Imagine getting up every morning and throwing a coin so often that heads and tails have come up an equal amount of times. How often would you throw the coin on average? Infinitely often, it turns out. Code in Klong.

Equally Many Heads and Tails

This thought experiment can be modeled as a simple expected value calculation in a countably infinite case:

$$\mathbb{E}=\sum_{i=1}^{\infty} p_{i}*x_{i}$$

Finding $x_{i}$

We know that flipping the coin definitely ends when for 2*n coin flips, we have gotten heads n times and tails n times as well. This gives us $x_{i}=2*i$.

Finding $p_{i}$

Finding out $p_{i}$ is a bit harder. In order to find it, it is best to look at the first iterations of the process.

Observing Coin Flips

After starting, we have flipped the coin 2 times, so the possible solutions are:


We are done in two cases, H;T and T;H. p₁ therefore is $\frac{2}{4}=0.5$. We then continue flipping the coin 2 more times, starting with either H;H or T;T, with the following possible results:


Of these, only two finish: H;H;T;T and T;T;H;H. So the chance of finishing given 4 coin flips is $\frac{2}{8}=0.25$.

We continue flipping the coin with the non-finishing sequences, and our possible results look like this:







Here, the sequences H;H;H;T;T;T, H;H;T;H;T;T, T;T;T;H;H;H,T;T;H;T;H;H finish, so given six coin flips, the chance of finishing is $\frac{4}{24}=0.1666666\dots$.

Writing down the next step gets messy, but we have already observed enough iterations to find a pattern in the sequences.

Considerations on Coin Flips

First of all, the number of finishing sequences given 2*n coin flips is the Catalan number Cₙ. In this case, it describes the number of Dyck words of the length 2*n. To quote Wikipedia:

A Dyck word is a string consisting of n X's and n Y's such that no initial segment of the string has more Y's than X's.

Wikipedia, “Dyck word”, 2019

This applies exactly to the given problem. A finishing sequence has equally many heads and tails, but doesn't begin with a sequence of equally many heads and tails. Cₙ is defined as $\frac{1}{n+1}*{2*n \choose n}$.

Catalan numbers can be easily implemented using the binomial coefficient:

bincoeff::{[n k];n::x;k::y;fact(n)%(fact(k)*fact(n-k))}

We call the number of finishing steps given 2*n coin flips fₙ.

One can also see that the total number of possible sequences of coin flips oₙ after 2*n coin flips is $4*(o_{n-1}-f_{n-1})$, because one appends H;H or H;T or T;H or T;T to the remaining number of sequences. So let oₙ be

$$o_{1}=4\\ o_{n}=4*(o_{n-1}-f_{n-1})$$

One can implement $o_{n}$ simply:


When one executes o and f, one notices something peculiar: it seems that $o_{n}=2*n*f_{n}$.

Proof that $o_{n}=2*n*f_{n}$

Induction basis:

$$o_{1}=2*1*f_{1}\\ 4=2*1*2$$

Induction assumption:


Induction step:

$$o_{n+1}=2*(n+1)*f_{n+1}\\ 2*(o_{n}-f_{n})=(n+1)*f_{n+1}\\ 2*(2*n*f_{n}-f_{n})=(n+1)*f_{n+1}\\ 8*n*C_{n}-4*C_{n}=(n+1)*2*C_{n+1}\\ C_{n}*\frac{4*n-2}{n+1}=C_{n+1}$$

This is equivalent to a recursive definition of the Catalan numbers in the OEIS (sixth formula).

Probability of Finishing at $2*n$ Steps

So, what's the probability of finishing given 2*n steps now? Simple: it's $\frac{f_{n}}{o_{n}}=\frac{f_n}{2*n*f_{n}}=\frac{1}{2*n}$.


Note that this probability is different from finishing at 2*n steps: pgn simply tells us how likely it is for us to finish in the next 2 flips if we have flipped the coin 2*(n-1) times steps already. But we are looking for the probability of finishing at 2*n steps.

For this, we can now define $r_{n}$ that tells us the probability of arriving at a sequence with 2*n coin flips. This can only have happened if we arrived at the last step and did not finish there. We define $r_{n}$ recursively:

$$r_{0}=1\\ r_{n}=r_{n-1}-\frac{1}{2*(n-1)}*r_{n-1}$$

Now, defining $p_{n}$ is simple: $p_{n}=\frac{1}{2*n}*r_{n}$.

Final Formula and Final Code

Our final formula for the expected value is thus

$$\mathbb{E}=\sum_{i=1}^{\infty} \frac{1}{2*i}*r_{i}*2*i$$

We could implement $r_{n}$ very easily:


But since we don't want to compute $r_{n}$ every time we execute a new iteration, we simply embed it into the final evaluation function:


x is the number of flips, y is the probability of not finishing after x steps, and z is the expected value. The function calculates the probability of finishing at the current step, prints the current expected value, and passes its updated arguments forward, subtracting y*px from y.

We now call ev:


Here is a sample output, piped through tr , '\t' to make it easier to read:

1   0.5 0
2   0.125   1.0
3   0.0625000000000000001   1.5
4   0.0390624999999999998   1.875
99998   0.00000000892092166024425108    356.819024780555414
99999   0.00000000892078784508119576    356.820808929203776
100000  0.0000000089206540332635195     356.822593068931216
100001  0.00000000892052022479110523    356.824377199737868

The first field is the number of iterations, the second field is the probability of finishing at the given iteration, and the third field is the expected value at the given step.

Proof of Divergence

After watching the output of the code above for a while, one gets the suspicion that the expected value diverges to infinity.

To prove this, let the $\mathbb{E}_n$ for $n \in \mathbb{N}$ be the expected value if we finish throwing the coin after 2*n steps:

$$\mathbb{E}_n=\sum_{i=1}^{n} \frac{1}{2*i}*r_{i}*2*i$$

We now want to show that

$$\lim_{n \to \infty} \mathbb{E_{n}}=\lim_{n \to \infty} \sum_{i=1}^{n} r_{i}=\infty$$

We will show that $r_{n} \ge \frac{1}{n}$, and since we know the harmonic series diverges, we can make a direct comparison test and show that $\sum_{i=1}^{n} r_{i}$ diverges as well.

Let $r_{n}$ be


Proof that $r_n \ge \frac{1}{n}$

Proof by induction.

Induction Basis:

$$r_{0}=1 \ge 1=\frac{1}{1}$$

Induction Assumption:

$$r_n \ge \frac{1}{n}$$

Induction Step:

$$r_{n+1} \ge \frac{1}{n+1}\\ r_{n}-\frac{1}{2*n}*r_{n} \ge \frac{1}{n+1}\\ r_{n}*(1-\frac{1}{2*n}) \ge \frac{1}{n+1}\\ r_{n} \ge \frac{1}{(n+1)*(1-\frac{1}{2*n})}\\ r_{n} \ge \frac{1}{n+\frac{1}{2}-\frac{1}{2*n}}$$

Inserting the assumption:

$$r_{n} \ge \frac{1}{n} \ge \frac{1}{n+\frac{1}{2}-\frac{1}{2*n}}\\ n+\frac{1}{2}-\frac{1}{2*n} \ge n\\ \frac{1}{2} \ge \frac{1}{2*n}\\ n \ge 1$$

Since $n \in \mathbb{N}$, we have therefore proven that

$$\lim_{n \to \infty} \sum_{i=1}^{n} \frac{1}{i}=\infty \le \lim_{n \to \infty} \sum_{i=1}^{n} r_{i}=\infty$$

Further Considerations

This model can be generalized to the average length of a random walk with steps with length 1 in $\mathbb{Z}$ starting at 0 and ending at 0.

The question poses itself whether the found length would still hold in $\mathbb{Z}^n$ ($n \ge 2$) for steps of size 1. It seems obvious that this should be the case, but one shouldn't be too sure of it.