# hardest question I got in an interview!

#### vkaul

This was the hardest question I got in an interview.
Consider there are cereal boxes of a top brand containing one of four different types of toys with equal probability. What is the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of 4 toys.

#### roni

##### Cornell FE
Job interview or an interview with a MFE director?
hope you did well !

#### Minh Tran

Is that 1/4 * [the total number of cereal boxes]? As the matter of fact that different types of toys have equal probabilities and there is no information about the number of boxes containing one type of toys.

#### vkaul

It is asking for expected value of the number of cereals to be purchased before you get all the four toys.
You can assume all four toys have equal probability of being in any cereal box.

#### quantyst

This was the hardest question I got in an interview.
Consider there are cereal boxes of a top brand containing one of four different types of toys with equal probability. What is the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of 4 toys.

In an interview environment, just about any question that's sort of unexpected, will look difficult, even if you have confidently done problems of that sort or even more difficult than that sort in the past. This is rather an easy problem (in a sense ...). This problem is akin to rolling a fair die until all the six numbers come up. On average how many times does one have to roll the die until all six numbers come up? Of course, one can generalize this to the case of an n-sided fair die, which makes the problem even easier to do. I may come back to this at a later time and show you the solution that I've done somewhere else. Ciao for now.

#### yzia

24 <--- Highlight that, I didnt want to give the answer away.

#### vkaul

no it is not 24.
And the quantyst has pointed out the right approach to solve it.
Keep trying but not in an interview.
Interviews are painful you just have to do 500 brainteasers and 250 FAQ C questions to get in.

#### quantyst

This was the hardest question I got in an interview.
Consider there are cereal boxes of a top brand containing one of four different types of toys with equal probability. What is the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of 4 toys.

An n-sided fair die is rolled again and again until all the numbers come up. On average how many times does one need to roll the die until all the numbers come up?

Suppose that at a certain point of this process we notice that of the n numbers c have come up and r are still remaining and we begin to wonder on average how many times does one need to roll the die from that point in the process until all the remaining numbers come up. So, the question we will now address is:

Assuming that of the n numbers of the die c have already come up and r are remaining, on average how many times does one need to roll the fair die until all the remaining r numbers come up?

Let E(n,c,r) denote the expected number of rolls needed until all the remaining r numbers come up.

As we roll the die at this point (i.e., the point (n,c,r)), what is the probability that a new number (i.e., one of the r remaining numbers) comes up? Obviously the answer is r/n. Similarly the probability that an old number (i.e., one of the c numbers that have come up previously) will come up is c/n. If a new number comes up, then from the new point (n,c+1,r-1) in the process, there will remain on average E(n,c+1,r-1) number of rolls before all the (remaining) numbers come up. If an old number comes up, then we are in no better position and from the new but 'unchanged' point (n,c,r) in the process, there will (again) remain on average E(n,c,r) number of rolls before all the (remaining) numbers come up.

So viewing the situation from the point (n, c, r), we have:

E(n,c,r) = 1 + (r/n)*E(n,c+1,r-1) + (c/n)*E(n,c,r). (**1**)

Since c+r=n and n is constant in the entire process, then we may consider E as a function of r only.

So we write equation (**1**) as:

E(r) = 1 + (r/n)*E(r-1) + (c/n)*E(r), which when simplified (using c=n-r) we get:

E(r) = (n/r) + E(r-1). (**2**)

It is obvious that E(0)=0 and that our goal is to compute E(n).

Solving (**2**) iteratively we get:

E(n) = n*{1+(1/2)+(1/3)+(1/4)+...+(1/n)}.

#### Han Z

I could only solve in a regular way, which is:

E[X] = 1 + E[Y|X1 = a]

E[Y|X1 = a] = 1/4 * (1+ E[Y|X1 = a]) + 3/4 * (1 + E[Z|X1 = a, X2 = b])

E[Z|X1 = a, X2 = b] = 1/2 * (1 + E[Z|X1 = a, X2 = b]) + 1/2 * (1 + E[W|X1 = a, X2 = b, X3 = c])

E[W|X1 = a, X2 = b, X3 = c] = 3/4 * (1 + E[W|X1 = a, X2 = b, X3 = c]) + 1/4 * (1)

After many substitutions, will get:

E[W|X1 = a, X2 = b, X3 = c] = 4
E[Z|X1 = a, X2 = b] = 6
E[Y|X1 = a] = 22/3
E[X] = 25/3

...

To be honest, I think if would be much tougher if you were asked to explain what is an eigenvalue or an eigenvector, which I got this totally unexpected question from my interview with ING.

• hixon

#### quantyst

This was the hardest question I got in an interview.
Consider there are cereal boxes of a top brand containing one of four different types of toys with equal probability. What is the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of 4 toys.

Here's another but more direct solution to the problem.

First we'll address a preliminary issue. Suppose a repeatable experiment X results each time in an outcome E with probability p. Then it is easy to show that the expected number of experiments X that need to be done in order to produce the outcome E is 1/p. The proof involves geometric random variable in which the probability that the event E will occur for the first time on the n-th experiment is p*(1-p)^(n-1). The rest follows in a standard fashion.

First we will take the case n=4 as in the above cereal boxes/toys problem.

On first try/purchase (getting the first box whichever toy it turns out to contain), we definitely get one toy. So far we have accumulated 1 box and 1 unique toy. Now the second different toy has a probability of 3/4 to be had on the next purchase of a box since there are still three other toys outstanding from the collection at hand. So it takes the purchase of 4/3 boxes to get the next different/unique toy. So far we have 1+(4/3) boxes accumulated. The third different/unique toy has a probability of 2/4 to be had on the next purchase of a box since at this time there are two other toys outstanding from the collection at hand. So it takes the purchase of 4/2 boxes to get the next different/unique toy. So far we have 1+(4/3)+(4/2) boxes accumulated. Now the last toy has a probability of 1/4 to be had on the next purchase of a box since at this time there is only one other toy outstanding from the collection at hand. So it takes the purchase of 4/1 boxes to get the next (and last) different/unique toy. So, altogether we have 1+(4/3)+(4/2)+(4/1) purchases of boxes, which can be expressed as

4*{(1/4)+(1/3)+(1/2)+1} = 4*{1+(1/2)+(1/3)+(1/4)}.

The generalization follows along the same lines.

#### quantyst

This was the hardest question I got in an interview.
Consider there are cereal boxes of a top brand containing one of four different types of toys with equal probability. What is the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of 4 toys.

Now that the problem is solved in at least two different ways, I would like to add to the complexity of the problem by changing some aspects of it. Here it goes:

Suppose each box contains precisely two toys such that (i) each toy has an equal probability to be in the box regardless of the other toy in the same box, (ii) each toy has an equal probability to be in the box but different from the other toy in the same box. Now do the problem.

Here's a generalization of the problem:

Cereal boxes of a top brand contain toys out of a set of n different types of toys. Each box contains precisely two toys. Find the expected value of the number of cereal boxes which a person will need to buy in order to complete the collection of n toys if

(i) each type of toy has an equal probability to be in a box regardless of the other toy in the same box,

(ii) each toy has an equal probability to be in the box but different from the other toy in the same box.

#### bob

##### Faculty (Undercover)
The hardest one I've gotten in an interview was the second part of a 2-part question:

part 1: 5 numbers are chosen at random from the uniform distribution on (0, c) for some unknown positive c. Given the 5 numbers, what is the maximum likelihood estimator of c?

part 2: A lottery is conducted as follows: Each day, 5 numbers are chosen at random from the uniform distribution on (0, c) for some unknown positive c. (Clearly, this number changes from day to day.) The numbers are published. When you purchase a ticket to play, you guess a positive real number; the participant coming closest in absolute terms to that day's c wins the prize. Given that day's 5 numbers, what number do you guess when you buy your ticket?

#### Shlomi

##### SuperDerivatives
The hardest one I've gotten in an interview was the second part of a 2-part question:

part 1: 5 numbers are chosen at random from the uniform distribution on (0, c) for some unknown positive c. Given the 5 numbers, what is the maximum likelihood estimator of c?

Isn't it the largest number of the 5 given numbers?

#### doug reich

##### Some guy
Isn't it the largest number of the 5 given numbers?

"Find x" "<-- here it is"

No, they mean what is the expected value of the maximum.

#### Shlomi

##### SuperDerivatives
"Find x" "<-- here it is"

No, they mean what is the expected value of the maximum.

Why? You are given the 5 numbers.
MLE means maximize the probability of the final observed outcome actually happening.

Clearly c cant be smaller than the largest number of the 5.
If c is equals to the max(5 given numbers) than the chances of drawing that given number is 5/c. if c is bigger there is less likelihood.

#### bob

##### Faculty (Undercover)
Isn't it the largest number of the 5 given numbers?
For part 1, yes. This isn't hard to see, since (P(X_{max} = x; c) = (\frac{x}{c})^5), and this clearly approaches its maximum--1--when x = c.

The second part is the one that stumped me in about hour 7.5 of the interview....

#### Uncle Max

Part 2: first guess is:

(\frac{2}{5}\sum_{i=1}^{i=5}{x_i})

#### bob

##### Faculty (Undercover)
Unsure what the reasoning is here. I don't think that even guarantees a guess greater than the largest X, which it should certainly be.

So that I won't be thought of as a mean person, I should let everyone know that my personal opinion on part 2 is that it doesn't have an answer as posed. Nevertheless, this is the way the question was posed to me in the interview.

#### quantyst

The hardest one I've gotten in an interview was the second part of a 2-part question:

part 1: 5 numbers are chosen at random from the uniform distribution on (0, c) for some unknown positive c. Given the 5 numbers, what is the maximum likelihood estimator of c?

part 2: A lottery is conducted as follows: Each day, 5 numbers are chosen at random from the uniform distribution on (0, c) for some unknown positive c. (Clearly, this number changes from day to day.) The numbers are published. When you purchase a ticket to play, you guess a positive real number; the participant coming closest in absolute terms to that day's c wins the prize. Given that day's 5 numbers, what number do you guess when you buy your ticket?

An intuitive approach: ON AVERAGE, the five numbers will be equally spaced along the interval (0,c), so if L is the largest of the five numbers, then the best estimate for c is (6/5)*L. But then, I don't think this answer is correct. Why not also take advantage of the smallest of the five numbers, S? So, another estimate would be c=6*S. And to do one better, let's set c equal to the average of the two estimates: c=(3/5)*(5S+L). Since the average incorporates the smallest, it does not make sense to do so as any number less than the largest is useless for determining c. See my response to maxrum.

But the problem is much more interesting than this, and more involved. If I have time, I'll get back to this later.

#### Uncle Max

The mean of 5 randomly picked numbers is:

(\mu_1 = \frac{\sum_{i=1}^{i=5}{x_i}}{5})

The mean of a uniform distribution:

(\mu_2 = \frac{c}{2})

If we are assume that (\mu_1 = \mu_2) we will get result I posted above.

As I said, it's just a first guess. Something that comes to my mind first and has some logic behind. But I'm sure, Bob, you can point some flaws here, as usual Replies
13
Views
4K
Replies
5
Views
2K
Replies
6
Views
3K
Replies
1
Views
820
Replies
2
Views
1K