MSc Applied Stats - Dissertation Topics

entropica · 9/20/14

I am in the final year of my applied stats masters and I am racking my brain looking for topics for my dissertation. I have twelve months to complete it, and I will be doing it full-time.

Some broad areas that I would like to learn about would be:
- Market microstructure
- Statistical arbitrage techniques
- Detection of predatory trading (perhaps from HFT)
- Detection of seasonality or diurnal effects in intraday data
- Detection of toxicity of order flow

Some potential statistical techniques I would like to use to get proficient at are:
- Kernel methods such as SVM
- Hidden Markov models and Kalman filters
- Multivariate time series and co-integration

The problems I have however are:
1. I don't really know what the interesting or hot questions in these areas are
2. I don't know which of these topics are realistically within the bounds of my current knowledge
3. Availability of data

The last point I think is my biggest problem. For instance if I decide to look into detection of HF predatory trading tactics, chances are I will need Level II market data with full order book depth, which I clearly will not be able to pay for, nor do I suppose will my university have relationships with any exchanges in order to get it!

So in summary, I'm looking for a dissertation topic that would hopefully get me noticed by some quant funds at the end of this (too ambitious?), would enable me to become proficient with modern computationally-intensive statistical techniques that leverage my programming background, and is still feasible with respect to my personal capabilities and data availability.

I would be hugely appreciative of any advice anyone could offer. Thanks in advance!

Ian Kaplan · 9/20/14

The development of quantitative investment portfolios involves back-testing. This back-testing is generally iterative: you try a portfolio algorithm and then refine it. Inevitably this results in some degree of over-fitting. For the practitioner this is a critical issue, since it is possible to develop a portfolio model that does well in back-test but badly when actually traded. A recent paper discusses this: Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance by David H. Bailey, Jonathan M. Borwein, Marcos López de Prado, and Qiji Jim Zhu (available on SSRN). Looking deeply into the statistical problems inherent in portfolio development would make a good dissertation topic. As far as I know, not much has been written in this area, so your dissertation might get noticed as well.
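
To make the over-fitting point concrete, here is a minimal sketch (not taken from the paper) of what happens when you keep the best of many back-tested strategies. Everything in it is an assumption for illustration: the "strategies" are pure noise with zero true mean return, the function name best_of_n_backtest is made up, and the Sharpe ratio is annualised with 252 trading days and a zero risk-free rate.

```python
import numpy as np


def best_of_n_backtest(n_strategies=200, n_days=500, seed=0):
    """Select the best in-sample strategy among pure-noise strategies.

    Returns (in-sample Sharpe, out-of-sample Sharpe) of the selected strategy.
    """
    rng = np.random.default_rng(seed)
    # Simulated daily returns with zero true mean: no strategy has a real edge.
    returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

    # First half is the "back-test" period, second half is held out.
    half = n_days // 2
    in_sample, out_sample = returns[:, :half], returns[:, half:]

    def sharpe(r):
        # Annualised Sharpe ratio (assumes 252 trading days, zero risk-free rate).
        return np.sqrt(252) * r.mean(axis=-1) / r.std(axis=-1, ddof=1)

    in_sample_sharpe = sharpe(in_sample)
    best = int(np.argmax(in_sample_sharpe))  # the strategy you would have "kept"
    return float(in_sample_sharpe[best]), float(sharpe(out_sample[best]))


if __name__ == "__main__":
    is_sr, oos_sr = best_of_n_backtest()
    print(f"in-sample Sharpe of selected strategy: {is_sr:.2f}")
    print(f"out-of-sample Sharpe of same strategy: {oos_sr:.2f}")
```

Because the selected strategy is simply the luckiest of many, its in-sample Sharpe ratio tends to look impressive while its out-of-sample Sharpe ratio scatters around zero. Increasing n_strategies makes the in-sample figure look even better without adding any real edge.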

entropica · 9/22/14

That paper looks really interesting - thanks Ian!

yetanotherquant · 9/24/14

The topics you have mentioned imply that you [should] have high-quality intraday data.
Such data are, however, expensive and not generally available. Do not expect that you can easily get them for free as a student
(at least I was not able to get them for my Ph.D. thesis).

On the other hand, daily OHLC data are available from Yahoo Finance.
Though you should screen them, they are generally good. Once I checked them against Bloomberg data (for the most liquid German stocks) and they were nearly identical.
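
As an illustration of what such screening might look like, here is a minimal sketch using pandas. It is only an assumption about reasonable checks, not a description of any particular procedure: the function name screen_ohlc, the column names Open/High/Low/Close, and the 50% jump threshold are all hypothetical and should be adapted to the data you actually download.

```python
import pandas as pd


def screen_ohlc(df, max_abs_return=0.5):
    """Flag suspicious rows in a daily OHLC table.

    Assumes columns named Open, High, Low, Close (hypothetical names).
    Returns a DataFrame of boolean flags, one column per check.
    """
    prices = df[["Open", "High", "Low", "Close"]]

    # Missing or non-positive prices are always suspect.
    bad_value = prices.isna().any(axis=1) | (prices <= 0).any(axis=1)

    # High must be the largest and Low the smallest of the four prices.
    high_too_low = df["High"] < prices.max(axis=1)
    low_too_high = df["Low"] > prices.min(axis=1)

    # Very large close-to-close moves often indicate an unadjusted split
    # or a bad tick; the threshold is an arbitrary illustrative choice.
    daily_return = df["Close"] / df["Close"].shift(1) - 1.0
    big_jump = daily_return.abs() > max_abs_return

    return pd.DataFrame(
        {
            "bad_value": bad_value,
            "high_too_low": high_too_low,
            "low_too_high": low_too_high,
            "big_jump": big_jump,
        }
    )
```

Calling screen_ohlc(df).any(axis=1) gives one boolean per day, so the flagged rows can be inspected by hand or compared against a second source before they are used.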

Have a look at how you can extract data from Yahoo in a "batch mode":
http://www.yetanotherquant.com/stockmarket/

You can also use R
(I describe in my book how to do it: http://www.amazon.co.uk/dp/3000465200)

tfors · 9/26/14

you can get intraday level 2 data cheaply from esignal or similar (if things haven't changed)
