This was the hardest question I got in an interview.
Consider there are cereal boxes of a top brand containing one of four different types of toys with equal probability. What is the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of 4 toys.
An n-sided fair die is rolled again and again until all the numbers come up. On average how many times does one need to roll the die until all the numbers come up?
Suppose that at a certain point of this process we notice that of the n numbers c have come up and r are still remaining and we begin to wonder on average how many times does one need to roll the die from that point in the process until all the remaining numbers come up. So, the question we will now address is:
Assuming that of the n numbers of the die c have already come up and r are remaining, on average how many times does one need to roll the fair die until all the remaining r numbers come up?
Let E(n,c,r) denote the expected number of rolls needed until all the remaining r numbers come up.
As we roll the die at this point (i.e., the point (n,c,r)), what is the probability that a new number (i.e., one of the r remaining numbers) comes up? Obviously the answer is r/n. Similarly the probability that an old number (i.e., one of the c numbers that have come up previously) will come up is c/n. If a new number comes up, then from the new point (n,c+1,r-1) in the process, there will remain on average E(n,c+1,r-1) number of rolls before all the (remaining) numbers come up. If an old number comes up, then we are in no better position and from the new but 'unchanged' point (n,c,r) in the process, there will (again) remain on average E(n,c,r) number of rolls before all the (remaining) numbers come up.
So viewing the situation from the point (n, c, r), we have:
E(n,c,r) = 1 + (r/n)*E(n,c+1,r-1) + (c/n)*E(n,c,r). (**1**)
Since c+r=n and n is constant in the entire process, then we may consider E as a function of r only.
So we write equation (**1**) as:
E(r) = 1 + (r/n)*E(r-1) + (c/n)*E(r), which when simplified (using c=n-r) we get:
E(r) = (n/r) + E(r-1). (**2**)
It is obvious that E(0)=0 and that our goal is to compute E(n).
Solving (**2**) iteratively we get:
E(n) = n*{1+(1/2)+(1/3)+(1/4)+...+(1/n)}.