Quant Internship Interview Questions

Joy Pathak (Swaptionz) · Joined 8/20/09 · Messages 1,328 · Points 73
I thought I would post some questions I was asked in an interview, since I guess most of the questions in the contest will be quant-interview-type questions...

1) The probability that a bank will default in any single year is 10%. What is the probability that the bank will default in two years?

2) The p-value of the slope coefficient in regression 1 is 1%. The p-value of the slope coefficient in regression 2 is 5%. In which regression do you have more confidence that the slope coefficient is different from zero?

3) The correlation between X and Y is 0.5. X increases by 5 units. By how much will Y increase?

Feel free to post replies...
 
1. P = 0.9 x 0.1 = 9%
3. Don't we need to know what 5 units represent for X?
 
Intuitively, over a longer period of time, the probability of default should be greater, not smaller. It should be 1 - 0.9^2 = 0.19.
 
Peter - it could be either; the question is worded a little ambiguously. Does Joy mean the probability that the bank will default in exactly the second year, or the probability that the bank will default by the end of the second year? I'd probably ask that as a follow-up question...

2. 1%.

3. I'll take a guess. It moves by 0.5*(5/standard deviation of X)*standard deviation of Y
 
2. P-value is the probability of the statistic being at least as extreme as the one observed. So with a greater P-value, you're more confident that your observation wasn't an anomaly. I think 5% is the correct answer.
 
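I am not sure whether the default was meant to be in exactly the second year or by the second year. I just stated the question exactly as it was put to me.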
 
It's a matter of what p-values are, and what is being tested. In regression, I believe the null hypothesis is that your slope is zero (no relationship between the variables). A 1% p-value corresponds to a 1% chance that your slope is zero; a 5% p-value corresponds to a 5% chance that your slope is indeed zero. I went with 1% based on these assumed mechanics of the test.
 
You never talk about the probability of the statistic equalling zero, because that's an event of measure 0. Instead, you talk about the probability of your statistic being within or outside of a given interval. Again, the p-value represents the probability of your statistic being at least as extreme as the one you observed. In this case, that means the probability of the actual slope being at least as far from zero as the slope you observed.
 
3 is unsolvable... there's a correlation of .5, not 1. You can give a confidence interval for Y, if that's what you want...
 
Here are some more...

  1. The stock price today is $20 and it follows a Geometric Brownian Motion. The beta of the stock is 1. You are trying to model the stock price a year from today. Which outcome is more likely: stock price > $20 or stock price < $20? (Quant risk modelling)
  2. How many tennis balls are made in China each year? (Junior Trader)
  3. What is the expected value of rolling two dice? (Trading assistant)
  4. What is the chance of drawing two cards from the top of the deck that turn out to be the same number? (Trader)
  5. Two ropes that are non-uniform in composition each burn completely in 1 hour. You only have a lighter and those two ropes, and you have got to tell me when 45 minutes are done. (Trading internship)
  6. You own property that may or may not have oil underneath. If it does have oil it is worth 100k (20% prob); if it doesn't, it is worth 30k. What is the expected value of the land, and for how much would you sell a contract on this land with a strike of 40k? (S&T internship)
 
1. What's the point of beta = 1? It tells me how the stock trends with the market, so I guess it lets me assume positive drift? Or not, depending on which way markets are headed. Anyway, given that markets today have positive drift and beta = 1, I expect it to be above $20.
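A quick sketch of how the answer depends on drift versus volatility. Under GBM, ln(S_t/S_0) is normal with mean (mu - sigma^2/2)*t and variance sigma^2*t, so P(S_t > S_0) = Phi((mu - sigma^2/2)*sqrt(t)/sigma). The values of mu and sigma below are hypothetical; the question does not give them.

```python
import math

def prob_above_start(mu, sigma, t=1.0):
    """P(S_t > S_0) for GBM dS = mu*S dt + sigma*S dW.

    mu (real-world drift) and sigma (volatility) are assumed inputs.
    """
    d = (mu - 0.5 * sigma ** 2) * math.sqrt(t) / sigma
    # Standard normal CDF at d, written with the complementary error function.
    return 0.5 * math.erfc(-d / math.sqrt(2))

# Hypothetical parameters: same positive drift, two volatility levels.
print(round(prob_above_start(mu=0.08, sigma=0.20), 3))  # drift exceeds sigma^2/2
print(round(prob_above_start(mu=0.08, sigma=0.50), 3))  # drift below sigma^2/2
```

With these inputs the first probability comes out above one half and the second below it, so positive drift alone does not settle the question; what matters is the sign of mu - sigma^2/2.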

2. Make some assumptions: say 80% of tennis balls are produced in China, then estimate the number of tennis balls bought per day for tennis, dogs, walkers, etc.

Say that in the US, with a population of 300MM, approximately 3,000 people will buy a 6-pack of tennis balls today (0.001%). For the world's population assume a smaller fraction (say 0.0001%). Hence out of a population of 6 billion (or so), 6,000 people will buy 6 tennis balls per day. Assume a year has 300 days (some days people just won't be buying tennis balls): 300 * 6,000 * 6 * 80% = 8,640,000, so approximately 8.6 million tennis balls produced in China. Voila! I hate these questions ....
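The arithmetic of this estimate can be checked by multiplying the stated assumptions through. Every input below is one of the guesses from the post, not real market data.

```python
# Fermi estimate using the assumptions stated in the post (all are guesses).
world_population = 6_000_000_000
daily_buyer_fraction = 0.0001 / 100   # 0.0001% of people buy a pack on a given day
balls_per_pack = 6
buying_days_per_year = 300
china_share = 0.80                    # assumed share of production in China

buyers_per_day = world_population * daily_buyer_fraction
balls_per_year = buyers_per_day * balls_per_pack * buying_days_per_year * china_share

print(f"{buyers_per_day:,.0f} buyers per day")
print(f"{balls_per_year:,.0f} balls per year")
```

Multiplying through gives 6,000 buyers per day and about 8.64 million balls per year under these assumptions.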
 
From a Google search result:

The P-value is the probability that you would have found the current result if the coefficient were equal to 0 (null hypothesis). If the P-value for one or more coefficients is less than the conventional 0.05, then these coefficients can be called statistically significant, and the corresponding independent variables exert independent effects on the dependent variable Y.
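To make that definition concrete, here is a minimal sketch that turns a test statistic into a two-sided p-value. It assumes a large sample, so the slope's t-statistic (estimate divided by its standard error) is treated as standard normal under the null; the z values are illustrative, not taken from any regression in this thread.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for standard normal Z.

    This is the chance of seeing a statistic at least this extreme
    if the null hypothesis (slope = 0) is true.
    """
    return math.erfc(abs(z) / math.sqrt(2))

# Illustrative statistics: a larger |z| (estimate further from zero,
# relative to its standard error) gives a smaller p-value.
for z in (1.96, 2.576):
    print(z, round(two_sided_p_value(z), 3))
```

A statistic near 1.96 gives a p-value near 5% and one near 2.576 gives about 1%, so the 1% regression is the one whose estimate sits further from zero in standard-error units.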
 
Common misconceptions about the p-value taken directly from Wikipedia (it's a good read for all):

  1. The p-value is not the probability that the null hypothesis is true.
    In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity (if there is no alternative hypothesis with a large enough a priori probability and which would explain the results more easily). This is the Jeffreys–Lindley paradox.
  2. The p-value is not the probability that a finding is "merely a fluke."
    As the calculation of a p-value is based on the assumption that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is different from the real meaning which is that the p-value is the chance of obtaining such results if the null hypothesis is true.
  3. The p-value is not the probability of falsely rejecting the null hypothesis. This error is a version of the so-called prosecutor's fallacy.
  4. The p-value is not the probability that a replicating experiment would not yield the same conclusion.
  5. 1 − (p-value) is not the probability of the alternative hypothesis being true (see (1)).
  6. The significance level of the test is not determined by the p-value.
    The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed. (However, reporting a p-value is more useful than simply saying that the results were or were not significant at a given level, and allows the reader to decide for himself whether to consider the results significant.)
  7. The p-value does not indicate the size or importance of the observed effect (compare with effect size). The two do vary together however – the larger the effect, the smaller sample size will be required to get a significant p-value.
 
Peter, could you please explain your logic? Is that the probability of defaulting in the first or the second year?
 
It's the probability of defaulting at some point over the first two years. It is one minus the probability of not defaulting in both year 1 and year 2.
 
Or alternatively: the probability of defaulting in year 1 (0.1) plus the probability of defaulting in year 2, which is the probability of surviving year 1 and then defaulting in year 2 (0.9 * 0.1):

0.1 + 0.09 = 0.19
 
@peterruse and @Alexei Smirnov, 100% agree if you assume the default will happen over the next 2 years. I assumed the default will happen in year 2 only. Now thinking about it, the question makes more sense if it required the default to happen over the next 2 years (the way you answered it).
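The two readings of question 1 can be computed side by side. This sketch assumes defaults in different years are independent with the same 10% probability, which is what the answers in this thread implicitly use.

```python
# Assumption: each year's default is independent, with probability 0.10.
p_default = 0.10
p_survive = 1 - p_default

# Reading 1: default in exactly the second year (survive year 1, then default).
p_exactly_year_2 = p_survive * p_default

# Reading 2: default at some point during the first two years.
p_within_2_years = 1 - p_survive ** 2

print(round(p_exactly_year_2, 4))   # 0.09
print(round(p_within_2_years, 4))   # 0.19
```

The second number also equals 0.1 + 0.09, the sum of the year-1 and year-2 default probabilities.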
 
I also think 3 is unsolvable as stated. Unless you know the variance/standard deviation of both X and Y, you cannot get the slope of the regression line and hence cannot give a definite answer to this question. Maybe the interviewer was looking for this kind of answer?
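A small sketch of why the standard deviations matter. In a simple linear regression of Y on X the slope is rho * sd_y / sd_x, so the same correlation of 0.5 gives different expected moves in Y. The standard deviations below are hypothetical; the question supplies none.

```python
def expected_change_in_y(rho, sd_x, sd_y, dx):
    """Change in the fitted value of Y for a change dx in X.

    Uses the simple-regression slope rho * sd_y / sd_x. This is the
    expected (average) move, not a guaranteed one, since rho < 1.
    """
    slope = rho * sd_y / sd_x
    return slope * dx

# Same correlation and same 5-unit move in X, two hypothetical scalings.
print(expected_change_in_y(0.5, sd_x=1.0, sd_y=1.0, dx=5))   # 2.5
print(expected_change_in_y(0.5, sd_x=2.0, sd_y=10.0, dx=5))  # 12.5
```

This matches the earlier guess of 0.5 * (5 / standard deviation of X) * standard deviation of Y.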
 
Here are some more...
  1. What is the expected value of rolling two dice? (Trading assistant)

Here's the probability associated with each outcome of rolling two dice:

Outcome   Probability
2         1/36
3         2/36
4         3/36
5         4/36
6         5/36
7         6/36
8         5/36
9         4/36
10        3/36
11        2/36
12        1/36

Hence, expected value = Sum_i (outcome_i*Probability_i) = [Sum_{i=2}^{7} i*(i-1) + Sum_{i=8}^{12} i*(13-i)]/36 = 7.
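The same result can be confirmed by brute-force enumeration of the 36 equally likely rolls, assuming two fair six-sided dice and that "expected value" means the expected sum.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Each outcome has probability 1/36; sum the weighted totals exactly.
expected_sum = sum(Fraction(a + b, len(outcomes)) for a, b in outcomes)

print(expected_sum)  # 7
```

This also follows from linearity of expectation: each die averages 3.5, and 3.5 + 3.5 = 7.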
 