| |||||||||
The birthday paradox states that if there are 23 people in a room then there is a slightly more than 50/50 chance that at least two of them will have the same birthday. For 60 or more people, the probability is greater than 99%. This is not a paradox in the sense of it leading to a logical contradiction; it is a paradox in the sense that it is a mathematical truth that contradicts common intuition. Most people estimate that the chance is much lower.
Calculating this probability (and related ones) is the birthday problem. The theory behind it was described in the American Mathematical Monthly in 1938 in leap years and twins, and assume that the 365 possible birthdays are equally likely. The trick is to first calculate the probability that the n birthdays are different. This probability is given by
because the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), etc. Using factorial notation, this expression can also be written as
Now, 1 - p is the probability that at least two persons have the same birthday. For n = 23 one obtains a probability of about 0.507.
By contrast, the probability that someone in a room of n other people has the same birthday as you is given by
which for n = 22 gives only about 0.059, and would need n to be at least 253 to give a value over 0.5.
The birthday paradox in its more generic sense applies to hash functions: the number of N-bit hashes that can be generated before probably getting a collision is not 2N, but rather only 2N/2. This is exploited by birthday attacks on cryptographic systems (see cryptographic hash function).
In his autobiography, Paul Halmos wrote:
<\left(\frac{1}{n}\int_0^n\left(1-\frac{x}{365}\right)\,dx\right)^n
=\left(1-\frac{n}{730}\right)^n In the argument of Halmos quoted above, the inequality
<\int_0^n \left(1-{x \over 365}\right)\,dx<math>
is incorrect, as one may readily check. But the argument can be fixed. In the product
the first factor is equal to 1, and may therefore be discarded. Then we have
<\left({1 \over n-1}\sum_{k=1}^{n-1}\left(1-{k \over 365}\right)\right)^{n-1}<math>
<math>
The first inequality above is a case of the inequality of arithmetic and geometric means. The second inequality comes from 1 − x < e−x.
The value of the last expression is less than 1/2 if and only if
where "loge" means the natural logarithm. This falls just barely short of 506, and as luck would have it, 506 is one of the values of n2 − n ; it attains that value when n = 23.
An (non-fatal) error in Halmos' argument (whose general idea is right)