Statistics for FE

const451 · 8/1/08

Could someone enlighten me on how Statistics is used and useful for Financial Engineering?

I have a Computer Science background and predetermined to apply to MFE program. I am taking missing math classes at my local university; I am going to take PDE and Advanced Calculus courses in the Fall semester and also considering to take a medium level course in Statistics. I am just not sure if it is useful for FE. I have taken the introductory Prob and Stat for Engineers course.

I would eventually prefer to work in trading or research but not risk management.

Thank you!

ianfin · 8/1/08

Hi const451,

Unfortunately I lost 2 page answer to this. But let me rewrite a very short one. For simplicity consider option pricing which is "basic thing" in FE.

You can compute price of an option at least in four different ways:
1. Analytical/semi-analytical solution (closed form preferred)
2. Monte Carlo integration
3. using Partial Differential Equations
4. Binomial trees (or more sophisticated trees)

So what Statistics has to do with all of these 4 methods? Well they are all more or less 100% based in Statistics (and Maths / Mathematical Statistics).

More details:
1. You need to derive "terminal distribution" of underlying asset(s) bind to your option before analytical solution can be (tried to be) found. This requires stochastic differential equations, ITO calculus, probability theory, and IQ

Or a lazy person uses book (provided the model for stochastic process is handled in the book...)

Pros: the nicest solution, at least for a mathematician
Cons: derivation pain (however: no pain - no gain

)
PS. Take a look at: http://media.uow.edu.au/news/2007/0601a/index.html

2. Monte Carlo integration is one elementary tool in Statistics. It is typically about numerical computation of expectation of something. In this example, it is about computing expectation of option price at some time. Importance Sampling is also tackled with MCI to speedup convergence.

Pros:
- somewhat simple (provided terminal distribution is known, if not then more works needs to be done).
- the only option when dimension of underlying asset/time history bind to option is high.
Cons:
- sensitivity analysis (derivates with respect to some parameters) requires somewhat high amount of computations

3. Partial Differential Equations (PDE) are encounted when Stochastic Differential Equations (SDE) are converted into PDEs. SDEs & Ito calculus are faced when dealing with Stochastic Processes which are required in option pricing.

Pros:
- Algorithms available to solve PDEs.
- Sensitivity analysis is given somewhat "free" (atleast compared to MC)
Cons:
- These methods "broke" when dimension of underlying asset(s)/time history is high. Of course this is problem dependent.

4. Binomial trees are "basic models" in many Stochastic Process courses, and they can be used in option pricing for example.

Pros:
- quite nice solutions can be gotten using these.
Cons:
- more difficult than MC

And remember (to study all three below subjects if possible (maybe not in this order) :
Mathematics
Mathematical Statistics ("link between above and below")
Statistics

Why? Because applying everything above "properly" requires quite a some knowledge about these.

Best regards,
ianfin (Ph.D. Statistics, Ms.C. Software Engineering, doing Ph.D. studies in Software Engineering and basic studies in Business & Economics)

olga · 8/1/08

Hi ianfin,

I have coupl of questions:
What is the difference between ways 1 and 3?
Why is the binomial tree method more difficult than MC?
Thanks!

ianfin · 8/1/08

Hello Olga,

"What is the difference between ways 1 and 3?"
Ok, good question. My text was not clear, let me clarify. When listing those 4 methods I thought that method 3 is solved numerically. Of course, if you can solve PDE analytically/in closed-form then it falls to case of method 1 (closed/semi-closed form solution).

"Why is the binomial tree method more difficult than MC?"
At least if terminal/sampling distribution is known for MC then it is just required that you:
i) build a sampling algorithm for that distribution
ii) draw random sample
iii) compute your function with drawn sample
iv) increase sample size (to gather data for drawing/assessing convergence)
v) GOTO ii)

With binomial tree you have tree and its "recursive structure" which you may need to travel in forwards/backwards. In my point of view it is more challenging programming task, in general case. But of course you may have other opinion(s) about this.

Here's a small problem to think for people considering Monte Carlo integration:
(no, I am not saying not to use MC but just saying that be careful...)
=======================================================
For simplicity, let's assume stupid model in which stock price at time t is of form
S(t) = signal(t) + noise,
where signal(t) is some deterministic, non-random, part and noise is random term (and independent of stock price at any time step). Let the noise term be Pareto distributed with parameters xm=0 and k=2 (it is zero-mean, see Pareto distribution - Wikipedia, the free encyclopedia).

Now you are computing expectation E[g(S(T))] for some nicely behaving payoff function g using Monte Carlo integration. You do this by starting with some initial sample size, drawing random sample, computing average of payoff function of the drawn samples, then increase sample sizes, and repeat until "enough" accuracy.

Two questions:
1) what is average behaviour of the Monte Carlo estimate of expectation?
(answering: bad or good is enough)
2) is the any problems with the Monte Carlo estimate?
[I assume computer program is coded without bugs]

HINTS (for second question):
- If Xn is your Monte Carlo estimator for sample size n and let capital X be the value of the expectation being computed (non-random quantity), then is there any issues with convergence in the 2-th mean? (see "Convergence in mean" from Convergence of random variables - Wikipedia, the free encyclopedia)
- check what changes if k is set to 3, does the situation change?

PS. If you are not familiar with terms estimate, estimator, (fixed) parameter. Then think that estimator is "like estimate but it is computed from non-known/random data which still follows the above stochastic process", whereas estimate is computed from a drawn sample (which is fixed as sample has been drawn). This is not good distinguishion between estimator/estimate (look books for proper definitions), but might do here.

const451 · 8/1/08

ianfin,

thank you for your time and thorough answer!

const451 · 8/18/08

What exactly Mathematical Statistics is about? I do not see the difference between "Mathematical Statistics and Data Analysis" by John Rice and "Probability and Statistics for Engineering and the Sciences" by Jay L. Devore .

THX

PatM · 8/19/08

In 2007, we used a book called Statistics for Finance by David Ruppert, which is basically the class notes from a stat class in the Cornell MFE.
I can recommend this -- it touches option pricing, similar to above, but discusses other topics: GARCH, VaR, more sophisticated portfolio theory with resampling.
http://people.orie.cornell.edu/~davidr/StatFinance/index.html
GARCH has really had an impact on the estimation of variance, because it can fit data well and it produces time dependent forecasts. Everything that comes from that, vol, beta, correlation, etc is also affected. It's a hot area and there are dozens of variations. People are moving away from ever using simple averages or exponentially weighted moving averages for variance anymore. RiskMetrics uses GARCH now, and they are the risk industry benchmark.

There is also Analysis of Financial Time Series by Ruey Tsay, which is mentioned alot as a reference work, but I haven't read it.

Statistics for FE

const451

ianfin

olga

ianfin

const451

const451

PatM