The Birthday Paradox and the Football Enigma

As sure as eggs is eggs, every person reading this is one year older than they were this time last year.
We all have birthdays. And today is this blog's 1st birthday.

And to begin with I’d like to talk about one of the first statistical nuances that blew my mind (a little bit at least) – the Birthday Paradox.

It's quite a famous one – but in case you haven’t come across this before I will start by posing the question and come back to the answer at the end.

“If you wanted at least a 50% chance that two people in a room shared the same birthday, how many randomly selected people would you need in the room?”

Without giving away too many spoilers, what I have always liked about this question is that the answer initially seems completely counterintuitive and, quite frankly, a bare-faced lie. But some fairly simple mathematics proves it beyond challenge.

And quite often simple statistics, mathematics and analytics throw up surprising evidence that people find hard to accept, because the message is so far removed from the conventional wisdom that has been accepted time after time.

At a recent Data & Analytics away day I posed the question:

“Do you think you can select the outcome of 3 randomly selected football matches?”
(Because who doesn’t love a sports analogy?!?!?)

Across our Data/Research teams we have a plethora of sports enthusiasts, many with a keen interest in soccer, and even a couple of semi-professional footballers. So the general response was “YES – yes of course I can, it's only 3 games, how difficult can that be?”

But if you are presented with 3 games from the Lithuanian 3rd Division Southern Zone, individuals’ ‘predictions’ are basically random (apologies to any big fans of FK Kauno Zalgiris B or Marijampole City), and the chance of randomly guessing the outcomes (Home/Draw/Away) of all 3 matches is as low as 3.8%.

(As a very quick explanation, there are 27 (3 x 3 x 3) possible combinations of outcomes: HomeWin-HomeWin-HomeWin is the most likely at 17% (based on hundreds of thousands of historical results); AwayWin x3 is less than 1%, Home-Away-Draw circa 3%, etc.)

But overall you have a 3.8% chance of being correct.
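For anyone who wants to see where the 27 comes from, here is a minimal Python sketch. It enumerates every Home/Draw/Away combination for 3 matches and works out the hit rate of a uniformly random guess. The per-match probabilities in `P_MATCH` are hypothetical placeholders, not the historical figures behind the 17% quoted above.

```python
from itertools import product
from math import prod

OUTCOMES = ("H", "D", "A")  # home win, draw, away win

# HYPOTHETICAL per-match probabilities, for illustration only;
# these are not the historical figures the post is based on.
P_MATCH = {"H": 0.46, "D": 0.27, "A": 0.27}

# Every possible set of results for 3 matches: 3 x 3 x 3 = 27
combos = list(product(OUTCOMES, repeat=3))


def p_combo(combo):
    """Probability that three independent matches finish exactly as `combo`."""
    return prod(P_MATCH[result] for result in combo)


# A uniform random guess picks each combination with probability 1/27,
# so P(correct) = sum over combos of (1/27) * P(combo) = 1/27.
p_guess = sum(p_combo(c) / len(combos) for c in combos)

print(len(combos))                          # 27
print(f"{p_combo(('H', 'H', 'H')):.1%}")    # most likely combination here
print(f"{p_guess:.1%}")                     # 3.7%
```

Because a uniform guess is independent of the result, the hit rate comes out at 1/27 (about 3.7%) whatever values go into `P_MATCH`, very close to the 3.8% quoted above; only the probability of each individual combination depends on them.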

The next question was whether SMEs (subject matter experts) would fare any better. Can the football pundits predict the outcomes of 3 matches – not in the Lithuanian leagues, but in the Premier League? The league they get paid for watching, analysing and opinionizing, week in, week out.

Paul Merson (Sky Sports) and Mark Lawrenson (BBC Sport) make weekly predictions on the weekend's matches. If you select any 3 matches at random from the last 5+ years, how likely is it that they predicted all 3 outcomes correctly?

Well, for all their infinite wisdom, there is still only a 13% chance that 3 randomly selected top-flight matches are all predicted correctly! Crazy, right?
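One way to read the 13% is to work backwards to a per-match hit rate. A rough sketch, assuming the three matches are independent and the pundits are equally accurate on each (assumptions of this illustration, not claims from the data); `per_match_rate` is a hypothetical helper:

```python
def per_match_rate(p_all: float, n_matches: int = 3) -> float:
    """Per-match hit rate implied by the chance of getting all n matches right,
    assuming independent matches and equal accuracy on each: p_all = p ** n."""
    return p_all ** (1 / n_matches)


print(f"{per_match_rate(0.13):.0%}")  # 51%
```

Under those assumptions, the pundits call roughly half of individual matches correctly.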

Let us use some analytics instead. These days bookmakers derive odds (implied probabilities) for each outcome from complex(ish) mathematical models, and there is usually a ‘favourite’ outcome.
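The bookmakers' models themselves are not described here, but reading a favourite off a set of odds is simple. A minimal sketch, assuming decimal odds (the numbers below are made up) and the simplest proportional way of removing the bookmaker's margin; `implied_probabilities` is a hypothetical helper written for this example:

```python
# HYPOTHETICAL decimal odds for one match (illustration only)
odds = {"Home": 1.80, "Draw": 3.60, "Away": 4.50}


def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, removing the bookmaker's margin
    by simple proportional normalisation (one of several possible methods)."""
    raw = {outcome: 1 / price for outcome, price in decimal_odds.items()}
    overround = sum(raw.values())  # > 1 because of the bookmaker's margin
    return {outcome: p / overround for outcome, p in raw.items()}


probs = implied_probabilities(odds)
favourite = max(probs, key=probs.get)  # outcome with the highest probability

print({k: round(p, 3) for k, p in probs.items()})  # {'Home': 0.526, 'Draw': 0.263, 'Away': 0.211}
print(favourite)                                    # Home
```

Other margin-removal methods exist; proportional normalisation is just the easiest to show.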

I’ll jump straight to the 90th minute here – the favourite outcome only happens circa 60% of the time in football matches (in other sports, such as tennis, basketball and the NFL, it happens 80%+ of the time).
So the chance of calling 3 random match outcomes correctly with analytics alone is still only 22% (60% x 60% x 60%). This is about 6 times better than random guessing and almost twice as good as the SMEs, but 22% is still pretty unreliable.
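The arithmetic behind the 22%, and the comparisons with random guessing and the pundits, can be checked in a few lines. This sketch assumes the three matches are independent and takes the 60% and 13% figures from the text above:

```python
P_RANDOM = 1 / 3     # uniform guess on one match (Home/Draw/Away)
P_FAVOURITE = 0.60   # favourite wins ~60% of the time (figure from the post)
P_PUNDITS_3 = 0.13   # pundits get all 3 right (figure from the post)

p_random_3 = P_RANDOM ** 3          # 1/27
p_favourite_3 = P_FAVOURITE ** 3    # always back the favourite

print(f"{p_favourite_3:.1%}")                              # 21.6%
print(f"{p_favourite_3 / p_random_3:.1f}x random")         # 5.8x random
print(f"{p_favourite_3 / P_PUNDITS_3:.1f}x the pundits")   # 1.7x the pundits
```

The ratios come out at roughly 5.8 and 1.7, in line with the comparisons above.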

Using some stratified analytics (selecting the strongest favourites) boosts this to around 42%, but it is the combination of analytics AND SME that starts to return some favourable figures.

Evidence suggests that analytical models with human adjustment and interpretation can predict individual outcomes correctly 80-90% of the time, which lifts the chance of calling all 3 outcomes correctly to 61% (a whopping 16.4 times better than random).
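The same cube arithmetic links per-match accuracy to the chance of calling all 3 matches. A sketch, again assuming independent matches; the per-match values in the loop are illustrative, not figures from the post, and `all_correct` is a hypothetical helper:

```python
def all_correct(p_per_match: float, n_matches: int = 3) -> float:
    """Chance of calling every match right, assuming independent matches."""
    return p_per_match ** n_matches


# Illustrative per-match accuracies spanning the range discussed above
for p in (0.75, 0.80, 0.85, 0.90):
    print(f"{p:.0%} per match -> {all_correct(p):.1%} for all three")
```

On this reading, the 61% corresponds to roughly the middle of the 80-90% range (85% per match gives 61.4%), and the stratified 42% to about 75% per match.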

(These adjustments can take any shape, e.g. “I know Manchester Utd are missing key players through injury”, or even a gut feeling – “Tottenham just conceded 7 to Bayern Munich so I don’t trust them” – plus elements of superstition – “I’ve selected West Ham in a weekly prediction competition with friends, they’re now certain to lose!”)

It is this combination of analytics and SME that we are embedding into our work. Our view (and I think I have said this here before) is that ‘Analytics is always to augment human decision making, not replace it’. As Data/Research teams, our job is to build the most effective evidence base from everything available to us, and to use our judgement, and that of others, to optimise decision making and service delivery.

We’ve had a number of successes over the course of the year doing just this.

From expanding our evidence base through partnership collaborations in ecda and launching an Open Data platform that enables others to enhance their own access to intelligence, to LARIA award-winning research into libraries and a new consultation portal (Citizen Space) to gather the views of the county's SMEs (the public), we have an extra year's worth of insight. So rather than saying we are a year ‘older’, let's say we are a year ‘wiser’.

(Basically so that next time someone asks me my age, I can say “I am 37 years wise” as opposed to “37 years old” which uses the dreaded ‘O’ word)

So how many random people are needed in a room for there to be a 50% chance of two people sharing the same birthday?

A mere 23

And whilst that may sound wrong, all I can say is…
‘You do the math(s)’

(Or Ask Jeeves, whatever – I need to go blow out some birthday candles)
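And for anyone who does want to do the maths: the trick is that 23 people form 23 x 22 / 2 = 253 different pairs, each of which could share a birthday. A minimal sketch, assuming 365 equally likely birthdays and ignoring 29 February (simplifications, since real birthdays are not perfectly uniform), computes the chance that everyone's birthday is different and subtracts it from 1; `p_shared_birthday` is a name chosen for this example:

```python
def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming `days` equally likely birthdays and ignoring 29 February."""
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken
        p_all_different *= (days - k) / days
    return 1 - p_all_different


# Smallest group size where a shared birthday is at least a coin flip
n = 1
while p_shared_birthday(n) < 0.5:
    n += 1

print(n)                                # 23
print(f"{p_shared_birthday(22):.1%}")   # 47.6%
print(f"{p_shared_birthday(23):.1%}")   # 50.7%
```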
