These problems (in the mathematical sense that any question is called a problem in mathematics) or paradoxes (in the sense that something counter intuitive or surprising is a paradox).
The Birthday Paradox is that in a group of 23 people, there is a 50% chance that there are at least 2 people with the same birthday (Day and Month). For some people this is counter-intutive or paradoxical, because it seems too small.
In this and the following pardox, we assume there are 365 birthdays each year (we ignore 29 February) and that each birthday has the same probabiliey, 1/365 (whereas some birthdays are actually more common than others.)
The My Birthday Paradox is that to get a 50% chance that a group member has the same birthday as a given birthday (Day and Month), for instance, a group member shares my birthday, the group size must be 253. This seems a large figure to some because we might expect it to be 365/2.
These problems could be modelled using a jar containing marbles numbered 1 to 365, each equally likely to be selected. We would require that either the jar is extremely large (infinite number of marbles), or that we sample and replace our sample marble and randomise those in the jar (shake the jar each time).
The Birthday Problem in particular has relevance in many areas of activity, where matches can be obtained far more often than would be expected from a very large number of items. For instance, dvd repeating songs, viruses, and other security matters.
The words 'problem' and 'paradox' are used interchangeably on this page.
My Birthday Problem: Minimum sample size for 50% chance of one other person shares a specific birthday is 253
Birthday Paradox: Minimum sample size for a 50% chance of a match between any two birthdays is 23
My Birthday Problem formula: for a case where there are N possible values (for instance, 365 birthdays), all equally likely, the minimum group size, r, for a match between a specific value (for instance, my birthday) and another is when the required probability of a match is p (for instance, 50%) is:
My Birthday Problem: For a probability p of a match for N different items is r, when N is large (approximate formula):
My Birthday Problem: Approximate formula for 50% chance of a match:
Birthday Problem: Minimum group size, r, for a probability p of a match, when there are N equally likely outcomes is (approximate formula, N is large):
Birthday Problem or Paradox: minimum group size r, for N different equally likely events with a probability of a match, p, is:
Birthday Problem (or Paradox), approximate formula for 50% match:
We can compute the minimum size of group so that we have a 50% chance that one member shares a given birthday, for instance another member shares my birthday. Let us call this probability p. Because computing p directly is difficult, we will compute the converse probability, (1−p), which is the probability that no-one shares my birthday. We can easily compute this, (1−p), by computing the probability that none shares my birthday. The probability that another does not share my birthday is 364/365 because there are 364 other birthdays and only one that matches mine.
That is we need to pair a single birthday (day of the year) with several other birthdays until we reach a 50% probability that we find a match. That is, if X is 'my birthday' then we compare X with A, B, C, D, etc, where these represent a birthday.
The probability that two people do not share my birthday is 364/365·364/365, because the two probabilities are independent. The probability that n people do not share my birthday is (364/365)n, and we require this to equal or exceed 0.5
Therefore, using p as the probability of a match, we require:
p≥0.5, and so we require,
So, re-arranging and grouping similar items gives us:
Note that as n→∞, (364/365)n →0, because 0<364/365<1, so the bigger n, the smaller (364/365)n
Setting p as 0.5 and so (1−0.5)=0.5, we require:
n=253 (next integer) ■
Hence, I need to meet 253 people chosen at random to be at least 50% sure that at least one other will have the same birthday as me.
In the above calculation, I have not made any approximations (except for rounding). This contrasts with some of the other techniques on this page, where I have used approximations.
We can understand why the figure is greater than 365/2 when we realise that the random people might not have different birthdays, and the Birthday Paradox, discussed next, tells us that there most likely will be duplicate (pairs) of birthdays in the random sample. By our assumption, there are 365 different birthdays, but a random sample of birthdays might not, therefore, contain different birthdays.
For instance, suppose I had a jar containing 365 marbles each numbered 1 to 365, and I had a marble, say, number 120 and wanted to match it. I take out one marble at a time and compare it with 120, and I do not replace the marble. Clearly, after 183 pairings I have a 50% chance of a match: the matching marble is just as likely to be in the first 183 marbles as in the next. (Actually, 0.5014:0.4986, to 4 decimal places).
If I replace each marble after seeking a match, then the situation is different because each marble will no longer always be different from the other tries due to selecting some marbles again by chance. To get 50%, I have to try more marbles than 183, and, as we calculated, I need to select 253.
In the My Birthday Paradox, we needed to compare a single value with each of 253 other values, making 253 possible matches or pairs. In the Birthday Paradox, we need the same number of pairs but we do not care which birthdays match. That is, in the My Birthday Paradox, we seek a match between a specific birthday, X, say 23 of June, with other birthdays, A, B, C, etc. Any matches between these other birthdays works against us. If A and B match, which they will with a probability of 1/365, the probability that they also match X, is 1/3652. So, it is more likely that if they match each other they do not also match X.
In contrast, in the Birthday Problem, these matches between different members are all counted. This is why the group size is so much smaller.
The number of possible matches (or the sample space for this problem) in a group of n people is nC2 , the number of ways to select 2 from n, and we require this value to be 253. Hence:
We require from 2.01:
Because n must be positive in this example, we have n=23. That is, the smallest group size with a 50% chance of having at least one matched birthday is 23 people. In this section our only approximations are arithmetic roundings.
We mentioned above that if all the birthdays being compared were different, we would need 365/2 comparisons for a better than 50% chance of a match. To achieve this, we seek a number n, which after subtracting the probable pairs, and adding the possible triples, we obtain 365/2. So, we require:
182.5=n−nC2/365 + nC3/3652−nC4/3653 [3.01]
We are seeking a number, n, which is the minimum number of comparisons that will give us a 50% or better chance of finding at least one match. Because n contains matching pairs, making some of the n randomly selected birthdays the same, we need to subtract these pairs. Of course, sometimes, these pairs will also match our required birthday, but these will then become triples. Triples are less likely than pairs (the probability of a pair is 1/365, and the probability of a triple is 1/ 3652 ) but we need to add these back. Finally, we need to subtract the groups of 4 that also match, because they are more likely to match each other than our required birthday value.
Of course, depending on how big n is, we may have to continue using the inclusion-exclusion principle until the values are small enough to ignore. With this example, we will consider only as far as groups of 3 (because larger groups challenge calculator accuracy). We will achieve an approximate answer only, but seek to demonstrate how the final answer is based on pairs, triples, etc.
nC3/3652 =n(n-1)(n-2)/(3!·3652) [3.03]
We can assemble these in our equation [3.01]
182.5≈n−n(n-1)/(2!·365)+n(n-1)(n-2)/(3!·3652) [3.04] (This is only approximately correct because we have omitted the higher terms)
By simplifying and equating to 0, we obtain the following equation:
We could solve this equation by hand. Fortunately, however, even cheap calculators can solve this equation, and we obtain:
n=251.80 (2 decimals), the other values are complex.
This value is less than our earlier calculation, giving 252 instead of 253 (the correct answer), because we have approximated.
In this section we illustrate how the value for the minimum number for the My Birthday Problem is higher than a rough guess of 365/2.
By assuming the correct answer (253, calculated earler), we can use this knowledge to add and subtract, as appropriate the various possible combinations. These combinations alternately increase the probability and decrease it.
|182.5||182.5||This is the expected answer, if there were no duplicates|
|+253C2/365||87.3370||269.8370||We add the pairs, because most of these reduce the probability of a match.|
|−253C3/3652||−20.0197||249.8173||We subtract the triples, which help us.|
|+253C4/3653||3.42803||253.2453||Add on the groups of 4|
|− 253C5/3654||−0.4677||252.7776||Subtract the 5's|
Add the 6's.
|−253C7/3656||−0.0051||252.8255||We might go on, but stop here, because the value is small (we are looking for the nearest integer value.)|
Therefore we require 253 people in a group (or randomly selected) to produce a match 50% of the time.
In a group of r people, we wish to calculate the probability that at least one match in dates will occur with a 50% probability. For each member of the group, we can compare with other members. It is easier to calculate the converse probability in this case.
For a given person, the probability that his or her birthday will not match another is 364/365. If we compare this two with another member, the probability that this new member's birthday will not match is 363/365. The probability that none of them will match is 364/365·363/365.
The probability of a match is 1-364/365·363/365.
For r people, the probability that their birthdays match with a probability of p is when:
Rearranging this, and dividing out the factors (because we have an approximation in mind):
We note the following approximations when x is small, for each of the factors:
Substituting 4.03 in 4.02:
Noting that the numerators form an arithmetic series, with a sum, as below:
Using 4.05 with 4.04, we can add up the powers, to obtain the product, remembering this is a sum of negative number:
Taking logs, base e:
Multiplying throughout by -1:
We require that p≥0.5, or the converse,
Multiplying by -1 throughout and reversing the inequality:
Hence, as −ln(1-p) is equal to the left-hand-side of [4.07a], the left-hand-side is required to be greater than −ln(1−0.5), so
Multiplying throughout by 730:
Multiplying out and putting into the normal form for solution :
The only positive solution of [4.11] is r=23, so the minimum number of people required for a 50% probability of a match is 23. The same as before, even though we have used an approximation in [4.03].
This is a generalisation of My Birthday Paradox, for N different items, and a required probability of a match, p.
The minimum number of items, r, which are required to be selected at random from N equally likely and different items (such as birthdays), from an infinite source, or a finite source, with replacement, in order to achieve a required probability p (such as 50%) that there is as least one match, is given by:
Because the probability that a given item does not match a selected item is (N-1)/N, and for r selections it is (N-1)r/N, by assuming that the selections are independent, and multiplying the probabilities.
Making r the subject of the equation, and taking the least value, we obtain an accurate value for r:
We can derive an approximate formula for the My Birthday Problem from [5.04]
First, we note that by dividing by N, the denominator of [5.04], above, it becomes:
Where x=1/N into the right-hand-side of [6.01]
Substituting [6.02] in [5.04], we obtain the approximate formula:
For a probability p of a match from N different items is r, where:
We recall, of course, that 0≤p≤1, so ln(1-p) is negative. The right-hand-side of [6.04] is, therefore a positive number, as required (because r must be a positive integer).
Therefore, we would take the next integer value for n, or ceiling(r).
When p=0.5 (50% chance of a match), we can substitute p=0.5 in [6.04], so
For instance, when N=365, r=365·ln(2)=252.9987 (4 decimals), so r=253, the same as the accurate calculation above.
We wish to know the minimum number of selections, r, from an infinite source, or a finite source, with replacement, where each item occurs with a probability of 1/N (or there are N different equally likely items), with a likelihood of selecting at least one match of p.
Above, we required that the number of pairings in our sample r is equal to the number of pairings required in My Birthday Paradox. That is, we equate the number of ways we can choose 2 from r, with . Hence we have:
Because the denominator in [7.01] is:
Substituting [6.02] in [7.01], gives:
Multiplying throughout by 2 and multiplying out the brackets:
Solving the equation for our required value (the real positive one)
Dividing the numerator of [7.04] by 2:
For the Birthday Paradox, N=365, p=0.5, so formula [7.05] gives:
=23 (4 decimal places)
From 7.05, we can ignore the 1/2 and the 1/4 in the root sign, to give us:
We recall, of course, that 0≤p≤1, so ln(1-p) is negative, because (1-p)≤1, and e must have a negative power to be less than 1 (e0=1). The right-hand-side of [6.04] is, therefore a positive number, as required (because r must be a positive integer, not a complex number).
So we would take the next integer up for r, that is the ceiling of [7.07]
When p=0.5, the approximate formula is:
Because, substituting p=0.5 in [7.07], we obtain:
The −2 times the log raises its number to the power −2:
Using the fact that −2 squares the number (0.5, in this case) and makes it a reciprocal
Because 1/0.25=4, we obtain [7.08]:
Ken Ward's Mathematics Pages