Staring down the wrong end of the telescope

After writing about how many Panini stickers collectors should expect to buy to fill a book, I’ve had a fair few comments about it. Greg Newman pointed me to John Crace’s article in The Guardian, in which Crace talks about “the four-yearly great Panini conspiracy theory.” The conspiracy being that Panini don’t distribute the stickers evenly, so you have to buy even more of them to complete your set.

As evidence, he cites Chris Taylor, whose “album is now about two-thirds full and I’ve already ended up with a whole load of Lee Young-Pyos, Hameur Bouazzas and Vince Grellas”. He then points out that he has never even seen a Thierry Henry, whereas Chris has got six. Which is an opening for swapsies if I ever heard one.

Anyway, what he described looked quite reasonable to me. After all, if you expect to have to buy 4505 stickers then you are going to get rather a lot of some players. So, I decided to do a bit of mathematical modelling and wrote this little bit of Python to do it for me.

The code simulates somebody buying stickers by generating random numbers between 1 and 640, and it keeps on running until every number is picked once. It also keeps a count of how many of each sticker is bought.
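The original script isn’t reproduced here, so what follows is only a minimal sketch of the kind of simulation described above, not the exact code I ran. It assumes every sticker is equally likely and that stickers are bought one at a time rather than in packets; the function name `fill_album` and the seed are made up for illustration.

```python
import random

def fill_album(n_stickers=640, rng=None):
    """Buy random stickers one at a time until every number has turned up.

    Returns a list where counts[i] is how many copies of sticker i + 1
    were bought by the time the album was complete.
    """
    rng = rng or random.Random()
    counts = [0] * n_stickers
    missing = n_stickers
    while missing > 0:
        # Index 0..n-1 stands in for sticker numbers 1..n (assumed uniform).
        sticker = rng.randrange(n_stickers)
        if counts[sticker] == 0:
            missing -= 1
        counts[sticker] += 1
    return counts

# Arbitrary seed so the run can be repeated; any seed will do.
counts = fill_album(rng=random.Random(2010))
print("stickers bought:", sum(counts))
print("most copies of one sticker:", max(counts))
```

The loop stops on the purchase that fills the last gap, so the final sticker found always has exactly one copy.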

When all of the stickers were bought and the album filled up I then went through and counted how many stickers had been opened just the once, how many were opened twice, three times, four times etc.
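As a rough illustration of that counting step, here is a small sketch using `collections.Counter`. The helper name `copies_histogram` and the six-sticker example album are invented for the example; in the real thing the input would be the 640 per-sticker counts from a finished run.

```python
from collections import Counter

def copies_histogram(counts):
    """Map 'number of copies' to 'how many stickers were opened that many times'."""
    return dict(sorted(Counter(counts).items()))

# Tiny made-up album of 6 stickers, just to show the shape of the result.
example_counts = [1, 3, 2, 1, 3, 3]
histogram = copies_histogram(example_counts)
print(histogram)  # {1: 2, 2: 1, 3: 3}
```

Here two stickers were opened once, one was opened twice and three were opened three times.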

For good measure I then did it another nine times and took averages. It’s not a great sample but hey, it’ll do for the purposes of this explanation.

So, what were the results then? Well, here is the table I created.

First of all, I was pleased that the average number of stickers bought was 4500. After just 10 iterations of the simulation this is already extremely close to the expected number of 4505, the figure that Laurent et Julie corrected my earlier working to. It is worth noting that the run with the fewest stickers bought needed just 3306, while the run with the most needed 7244, over double that.
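For anyone who wants to check the 4505 figure, it matches the standard coupon collector result: with n equally likely stickers bought one at a time, the expected number of purchases is n times the nth harmonic number. A quick sketch, with `expected_purchases` being a name chosen just for this example:

```python
def expected_purchases(n):
    """Expected purchases to complete a set of n equally likely stickers.

    Standard coupon collector formula: n * (1 + 1/2 + ... + 1/n).
    """
    return n * sum(1.0 / k for k in range(1, n + 1))

expected = expected_purchases(640)
print(round(expected))  # 4505
```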

You can immediately see how even in a small group of friends one person could appear to be a lot luckier than the other.

Now look at the number of duplicates we get. Even in the run with the fewest duplicates, there were 2 players that were opened 14 times each. And in one run there was a player that was opened 23 times!!

Below is the averaged out graph of the 10 runs I did.

A Spread of Duplicates for the Coupon Collector problem where n=640

It has a really interesting curve because, by the time the last sticker in the book is opened, there are duplicates for nearly every player. In fact, there are only 7 players that there aren’t swaps for.

The number of these multiples rises up to the 90 players that have 6 stickers opened and then it eventually tails off at the end where you have 20 stickers of the player who by now the poor collector must heartily hate the sight of.

And when you look at it like this you do see that the conspiracy theories are just the result of looking at the problem down the wrong end of the telescope. If anything, the examples that John writes about in the article aren’t as extreme as the ones created by the simulation.

I reckon he’s doing alright really.

1 comment

Tim Hobbs says:

June 14, 2010 at 5:37 pm

Great work, in fact more interesting than most of the football so far.

It shows that Panini don’t have to fiddle their distribution, but I wonder about the impact if one card was withheld (say a Henry card were to be only half as likely as the others). Easy to do and could dramatically shift the curve?
