Genetic Drift and Singapore Pools



Over a year-end dinner with friends, someone mentioned that there is a Singapore Pools (betting) outlet right outside Changi Prison. Besides the irony of gambling tempting newly released inmates, the same person said that the outlet does poor business because it isn't one of the “旺” (lucky, prosperous) locations: Singaporeans like to bet at outlets with a history of winners. Someone else pointed out that if very few people bet at that outlet in the first place because of its remote location, it is even less likely to produce a winner. A death spiral. Conversely, outlets that were lucky in the past draw bigger crowds in the next round, which further increases their chances of producing a winner. This is the Matthew effect in play.

I replied that this situation was a Chinese restaurant process. For those who do not know, the Chinese restaurant process goes like this: customers enter a Chinese restaurant one at a time. The first sits at a table. Each later customer may either join an occupied table or start a new one. The probability of joining a given table is proportional to the number of people already at it. Tables with a lot of people are more likely to attract the new customer, so similarly outlets with a lot of customers attract more customers. I was happy to appear as the geek of the group, until I …

… realised I was wrong.

The Singapore Pools outlet scenario is NOT a Chinese restaurant process. The first issue I silently realised was that the Chinese restaurant process has a variable number of tables, whereas the Singapore Pools scenario has a fixed number of outlets. There are many other differences as well, which made me drop the idea of a Chinese restaurant process.
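For reference, here is a minimal sketch of the Chinese restaurant process as described above. The function name and the `alpha` parameter (the weight given to opening a new table) are my own assumptions for illustration; the description above does not say how likely a new table is. Note that the number of tables is not fixed in advance, which is exactly where the analogy with a fixed set of outlets breaks down.

```python
import random

def chinese_restaurant_process(num_customers, alpha=1.0, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    alpha is an assumed concentration parameter: the relative weight
    of starting a new table instead of joining an occupied one.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(num_customers):
        # Occupied tables are weighted by their size, a new table by alpha.
        weights = tables + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append(1)     # start a new table
        else:
            tables[choice] += 1  # join an occupied table
    return tables

tables = chinese_restaurant_process(100)
print(len(tables), "tables with sizes", sorted(tables, reverse=True))
```

The number of tables keeps growing (slowly) with the number of customers, whereas the number of betting outlets stays the same from round to round.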

After silently thinking for a while, my mental model converged onto a “multinomial resampling process”. (I didn't know the name “multinomial resampling” until I looked it up on ChatGPT.) In every round, all punters share the same multinomial probability distribution over the outlets. This distribution is defined by the fraction of punters who chose each outlet in the previous round.

Simple enough. But I struggled to nail down the dynamics of this process. Given a uniform initial probability, sampling variance due to a finite punter population $N$ means that the probability distribution in the next round will differ from the initial one. But at the same time, if the population $N$ is very large, the probability distribution will be almost the same as the initial one.
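To make the effect of $N$ concrete, here is a minimal sketch of a single round of the process, using only the Python standard library. It is not the simulation mentioned later in this post; the function name `resample_round` and the chosen population sizes are assumptions for illustration.

```python
import random
from collections import Counter

def resample_round(p, num_punters, rng):
    """One round: every punter independently picks an outlet with probabilities p.

    Returns the next distribution, i.e. the observed share of punters per outlet.
    """
    picks = rng.choices(range(len(p)), weights=p, k=num_punters)
    counts = Counter(picks)
    return [counts[i] / num_punters for i in range(len(p))]

rng = random.Random(0)
num_outlets = 5
p0 = [1.0 / num_outlets] * num_outlets  # uniform initial preference

for num_punters in (10, 1_000, 100_000):
    p1 = resample_round(p0, num_punters, rng)
    largest_shift = max(abs(a - b) for a, b in zip(p1, p0))
    print(f"N={num_punters:>7}: largest change in any outlet's share = {largest_shift:.4f}")
```

Each outlet's share has a standard deviation of $\sqrt{p(1-p)/N}$ after one round, so the printed change should shrink roughly like $1/\sqrt{N}$ as the population grows.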

I wondered if there were any surprising “phase transitions” (aka discontinuities) in this statistical Singapore Pools problem. I recalled from undergrad physics that in a simple statistical Ising model, a surprising phase transition appears at the macro level even though there are no specific thresholds at the micro level. If initially there is a dominantly popular Singapore Pools outlet and the punter population is small, then the probability distribution is sure to collapse to a Kronecker delta. But perhaps if the initial Singapore Pools outlet preference is uniform and the population is sufficiently large, then the probability distribution will never collapse?

I admit, I asked ChatGPT. The answer was that no matter how large the (finite) population is, given a sufficiently long time the distribution will collapse, with a single outlet reaching probability 1. In the large-population limit, with time rescaled by the population size, the random drift of the outlet preference distribution becomes a continuous diffusion on the probability simplex over the $M$ outlets. Given infinite time, the drift terminates at one of the corners of the simplex.
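One way to see why no finite population escapes the collapse is a short calculation on the collision probability. This is a sketch under the multinomial resampling model above, with $N$ punters; the symbols $p_{t,i}$ (share of outlet $i$ in round $t$) and $C_t = \sum_i p_{t,i}^2$ are my own notation. Since the count at outlet $i$ in the next round is binomial with mean $N p_{t,i}$ and variance $N p_{t,i}(1 - p_{t,i})$,

$$
\mathbb{E}\left[p_{t+1,i}^2 \mid p_t\right] = p_{t,i}^2 + \frac{p_{t,i}(1 - p_{t,i})}{N}.
$$

Summing over outlets gives $\mathbb{E}[C_{t+1} \mid p_t] = C_t + (1 - C_t)/N$, or equivalently

$$
\mathbb{E}\left[1 - C_{t+1} \mid p_t\right] = \left(1 - \frac{1}{N}\right)(1 - C_t)
\quad\Longrightarrow\quad
\mathbb{E}\left[1 - C_t\right] = \left(1 - \frac{1}{N}\right)^{t} (1 - C_0).
$$

The quantity $1 - C_t$ is never negative and is zero only at a corner of the simplex, so its expectation shrinking geometrically to zero means the distribution ends up at a corner. There is no threshold in $N$: a larger population only stretches the time scale, which is of order $N$ rounds.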

This resampling process is studied academically as the “Wright-Fisher model” in population genetics, where the random fluctuation is called genetic drift, and its large-population limit is known as the Wright-Fisher diffusion. Just like Singapore Pools outlets, the prevalence of a gene variant in the next generation depends on its prevalence in the previous generation. But fear not, this doesn't mean that human genetic diversity will collapse in the long term. Genetic mutations are still constantly creating diversity. But for species with a small population, genetic diversity collapse is certainly a real risk.

Multinomial Resampling Simulator

I wrote a Python simulation to study the dynamics of the multinomial resampling process. But to show it on this blog, I got ChatGPT to convert the Python into a JavaScript demo. The simulation tracks two measures of how "concentrated" a distribution over outlets is: the entropy and the collision probability. It also shows three different noise regimes. The smaller the ratio of the population to the number of outlets, the greater the "sampling noise" and the faster the probability collapses.
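For readers who prefer code to a demo, here is a minimal standard-library sketch of the two measures and of the time to collapse. It is a rough reconstruction of the idea, not the actual simulator; the function names, the choice of 5 outlets, and the three population sizes standing in for the noise regimes are all assumptions.

```python
import math
import random
from collections import Counter

def entropy(p):
    """Shannon entropy in nats: log(M) for a uniform spread over M outlets, 0 once collapsed."""
    return -sum(x * math.log(x) for x in p if x > 0)

def collision_probability(p):
    """Chance that two independent punters pick the same outlet: 1/M when uniform, 1 once collapsed."""
    return sum(x * x for x in p)

def resample_round(p, num_punters, rng):
    """Every punter picks an outlet with probabilities p; return the observed shares."""
    picks = rng.choices(range(len(p)), weights=p, k=num_punters)
    counts = Counter(picks)
    return [counts[i] / num_punters for i in range(len(p))]

def rounds_until_collapse(num_outlets, num_punters, rng, max_rounds=100_000):
    """Start from a uniform preference and count rounds until one outlet holds everyone."""
    p = [1.0 / num_outlets] * num_outlets
    for t in range(max_rounds):
        if max(p) == 1.0:
            return t
        p = resample_round(p, num_punters, rng)
    return None  # no collapse observed within max_rounds

rng = random.Random(0)
num_outlets = 5
uniform = [1.0 / num_outlets] * num_outlets
print("uniform start: entropy =", round(entropy(uniform), 4),
      "collision probability =", round(collision_probability(uniform), 4))

results = {}
for num_punters in (5, 50, 500):  # assumed high, medium and low noise regimes
    results[num_punters] = rounds_until_collapse(num_outlets, num_punters, rng)
    print(f"N={num_punters:>3}, M={num_outlets}: collapsed after {results[num_punters]} rounds")
```

A single run is noisy, so averaging `rounds_until_collapse` over many seeds gives a fairer comparison between regimes.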


Ping Liang
