DiscourseFan 3 days ago

Interesting, but the paper's methodology suffers in certain respects from conflating real probabilities with theoretical ones.

Roulette, for instance, is only theoretically 1 in 38, but in actuality all roulette tables have imperfections such that certain numbers almost always get hit more than others; even certain colors, under extraordinary circumstances.

One could say: well, but isn't this the case for all probabilities? Not so: in the case of the lottery, the spread of numbers people tend to choose may not be so random, but the drawing itself is as close to random as possible. A run in the lottery is very different from a run on a roulette table, a run in baseball, or even a run in elections: there are forces, even if they aren't necessarily measurable, that determine these things, and strict probabilistic analysis has no hold on these forces. It's almost certainly the case that a hurricane will hit Florida in September of 2025; even though nobody can precisely predict it, nobody would bet against it. It's just the same way with almost all chance in society, except for that which is already controlled from the outset.

  • notahacker 2 days ago

    > actuality all roulette tables have imperfections such that certain numbers almost always get hit more than others; even certain colors, under extraordinary circumstances

    Seems unlikely these imperfections are enough to shift it significantly from 1/38, for two reasons: any variation in roulette table geometry small enough to be non-obvious is tiny compared with the variation in croupier action, and casinos would likely notice any very long-run deviation in the size of their edge (which is contingent upon customers hitting the zero pocket(s) with a certain frequency).

fastaguy88 3 days ago

One of the major breakthroughs in Bioinformatics was the recognition that local similarity scores (which can be thought of as runs of positive sequence similarity) are extreme-value distributed.[0] The logic of that discovery uses almost exactly the same mathematical argument as this paper [1], indeed I recognized some of the same equations.

It is difficult to overstate the importance of this discovery for biology: today, the vast majority of protein functional inferences for newly sequenced genomes are based on the statistics of long runs of sequence similarity.

[0] https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
[1] https://www.pnas.org/doi/epdf/10.1073/pnas.87.6.2264

treetalker 3 days ago

Long streaks, not runners’ long runs (which are also surprisingly predictable).

  • mcswell 3 days ago

    Unless of course a streaker does a long run.

nuancebydefault 4 days ago

I once saw a chart on some website showing the distribution of flat tire events. Often one does not get a flat in 10 years and then suddenly gets 2 or 3 in a year. Mathematically, the chances of such a distribution are quite high.
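
A rough way to put numbers on this, assuming flats arrive as a Poisson process (independent events at a constant average rate). That model and the `rate` parameter are assumptions for illustration, not something taken from the chart. The sketch computes the chance of a long quiet stretch and the chance that at least one calendar year contains a cluster.

```python
import math

def prob_at_least(k, rate):
    """P(at least k events in one year) for a Poisson process with `rate` events/year."""
    below = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return 1.0 - below

def prob_quiet_stretch(years, rate):
    """P(no events at all over `years` consecutive years)."""
    return math.exp(-rate * years)

def prob_some_bad_year(years, rate, k=2):
    """P(at least one calendar year out of `years` has k or more events).

    Years are treated as independent, which holds for a Poisson process.
    Clusters that straddle a year boundary are not counted, so this
    understates how often "2 or 3 in twelve months" actually happens.
    """
    return 1.0 - (1.0 - prob_at_least(k, rate)) ** years

if __name__ == "__main__":
    rate = 0.1  # hypothetical: one flat per ten years on average
    print("quiet decade:", prob_quiet_stretch(10, rate))
    print("some year with 2+ flats in 40 years:", prob_some_bad_year(40, rate))
```

How large these probabilities come out depends heavily on the assumed rate and on how many years of driving you look at, so try a few values before drawing conclusions.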

wenc 3 days ago

This is an interesting finding. There are two takeaways from the paper.

1. The expected length L of the longest streak of successes in an independent Bernoulli process with success probability p (with q = 1-p) over n trials can easily be estimated:

L = log_{1/p} (n*q)

2. This estimate becomes more accurate as p decreases, because L follows an extreme value distribution that gets more concentrated as p decreases.

This means that for low values of p, the longest streak becomes more predictable and the estimate more accurate.

I don’t know how this result will change my life, but at least now I know that I can predict streaks if I know p.
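
For anyone who wants to check the estimate L = log_{1/p}(n*q) against simulation, here is a minimal sketch. The function names, the number of repeats, and the seed are all made up for illustration; the only thing taken from the comment above is the formula itself.

```python
import math
import random

def predicted_longest_run(n, p):
    """Estimate from the comment: L = log_{1/p}(n*q), with q = 1 - p."""
    q = 1.0 - p
    return math.log(n * q) / math.log(1.0 / p)

def longest_run(trials):
    """Length of the longest run of consecutive truthy values."""
    best = current = 0
    for success in trials:
        current = current + 1 if success else 0
        best = max(best, current)
    return best

def simulate_mean_longest_run(n, p, repeats=200, seed=0):
    """Average longest success run over `repeats` simulated sequences of n trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        total += longest_run(rng.random() < p for _ in range(n))
    return total / repeats

if __name__ == "__main__":
    for p in (0.5, 0.2, 0.05):
        n = 10_000
        print(p, predicted_longest_run(n, p), simulate_mean_longest_run(n, p))
```

The simulated mean should land near the formula's value rather than exactly on it, since the formula is a location estimate for the distribution and not its exact mean. Comparing the spread of `longest_run` across repeats for different p would be one way to look at the concentration claim in point 2.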

  • jonathan_landy 2 days ago

    First thing it makes me want to do is qualify success rates among individuals. E.g. investors. Some are quite successful, but more so relative to what you’d expect given equal randomness?

SoftTalker 2 days ago

Randomness doesn't look random.

  • mannyv 16 hours ago

    I wonder if you can combine this with Benford's law to detect fraud.

    The author says that people who are making up "random" numbers generally won't put in identical sequences. Using that + Benford = a good way to find faked data. But for this to work you need to understand the probabilities, which would be difficult to do for "natural" distributions?
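
A rough sketch of what combining the two checks might look like, under loud assumptions: the data are positive integers, Benford's law is actually appropriate for them (it is not for many datasets), and a plain chi-square statistic on leading digits is a good enough measure of mismatch. All function names here are hypothetical, and no fraud threshold is implied.

```python
import math
from collections import Counter

def benford_expected():
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d in 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one value")
    return sum(
        (counts.get(d, 0) - total * prob) ** 2 / (total * prob)
        for d, prob in benford_expected().items()
    )

def longest_repeat(values):
    """Length of the longest run of identical consecutive values."""
    best = current = 0
    previous = object()  # sentinel that equals nothing in the data
    for v in values:
        current = current + 1 if v == previous else 1
        best = max(best, current)
        previous = v
    return best

if __name__ == "__main__":
    powers = [2**k for k in range(100)]   # known to track Benford closely
    flat = list(range(100, 1000))         # uniform leading digits
    print(benford_chi_square(powers), benford_chi_square(flat))
    print(longest_repeat([3, 3, 7, 7, 7, 1]))
```

The second half of the idea, judging whether the longest repeat is suspiciously short, still needs a model of how often repeats should occur in honest data, which is exactly the open question about "natural" distributions.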