
MSc Thesis: Random forest long-term return prediction

Dear all,

I am a quantitative finance student, and I will start writing my MSc thesis soon. Right now, my idea is to research whether a random forest algorithm could outperform traditional models (like the Fama-French factor model) in predicting long-term stock returns. I chose this topic because I am very interested in machine learning, and I'd like to learn and apply it.

I am not sure whether this is the right topic to pursue. I would like to hear your advice or opinion on whether it is feasible and/or interesting. Also, if you have ideas for an interesting angle on the topic, let me know!

Thank you in advance!
 
There's certainly no shortage of fancy buzzwords in ML. The proposal is high-risk and there's no guarantee that useful results will be obtained. I could be wrong, but a Plan B does no harm.

For an MSc I recommend benchmarking existing methods and integrating/improving them with ML. Avoid the perpetual "90% of the project is finished (the other 10% will take forever)" syndrome.
Here's a good thesis from 2019


I have always thought that prediction is difficult, especially predicting stock prices.
 
Thank you for the advice!

Do you believe that benchmarking and improving an existing model is more feasible? I have little prior knowledge of machine learning. Although I am very interested in learning it, I am probably limited by my capabilities.
 
Yes, especially at MSc level and with limited time, you want to take an incremental approach.
Like in life, we must walk before we can run. So take some time to learn ML before applying it to finance.

In other words, it comes down to scoping your thesis and estimating the project effort. I always ask students to write up a half-page (A4) project summary.

BTW how much time do you have to allocate to the thesis?
 
I agree that my proposal is probably too time-consuming to finish in time. I am considering implementing existing models, e.g. the Fama-French five-factor model (2015). Do you know an interesting model to implement, and what I could do to improve it?

I have about 2.5 months to finish my thesis.
 
Sorry, don't know Fama-French model.
2.5 months is not much time.
 
Here's a possible, doable project: take the problem as in McGhee's article, make it clearer (better 'flow') and use the approach taken by Dalvir Mantara in the thesis I posted. For training data, use the exact Heston solution and then maybe move to FDM/FEM.

BTW Dalvir's thesis is extremely well-written.

 
If I understand correctly, the main idea is to implement the Heston model with a single-layer ANN. (I'm not gonna implement an image-based implicit learning method for the Heston model, right?) By doing this I would research how feasible a single-layer ANN is for calculating implied volatility under the Heston model.

Some questions: Can I use market data as input, or should I simulate data? What do you mean by moving to FDM/FEM?

Dalvir's thesis is indeed impressive!
 
You can use simulated/synthetic data (i.e. generated from the analytic solution) as training data for the ANN (3 layers?).
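A minimal sketch of the data-generation step, under a loudly stated assumption: the Black-Scholes call formula is swapped in for the exact Heston price to keep the example short, so this is not the Heston setup itself. The function names, parameter ranges, and the choice of a strike fixed at 1 are all hypothetical. Each training sample is a tuple of model inputs, and the target is the analytic price for those inputs.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes European call price (stand-in for the exact Heston price)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def make_training_set(n_samples, seed=0):
    """Draw random inputs and label each one with its analytic price."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    features, targets = [], []
    for _ in range(n_samples):
        moneyness = rng.uniform(0.5, 1.5)   # spot / strike, with strike fixed at 1
        rate = rng.uniform(0.0, 0.05)       # hypothetical parameter ranges
        vol = rng.uniform(0.05, 0.6)
        maturity = rng.uniform(0.1, 2.0)
        features.append((moneyness, rate, vol, maturity))
        targets.append(bs_call(moneyness, 1.0, rate, vol, maturity))
    return features, targets


features, targets = make_training_set(1000)
```

The resulting `features` and `targets` lists can be passed to whichever ANN library is chosen. Replacing `bs_call` with a Heston pricer changes only the labelling function and the input tuple, not the overall structure.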

FDM = finite difference method
FEM = finite element method (more advanced)
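To make the FDM idea concrete, here is a rough sketch of an explicit finite-difference scheme. As an assumption for brevity, it solves the one-dimensional Black-Scholes PDE rather than the two-dimensional Heston PDE, and it checks the result against the closed-form price. The grid sizes, the upper boundary at four times the strike, and all function names are hypothetical choices.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes call price, used only as a reference value."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def fdm_call(spot, strike, rate, vol, maturity, n_space=200, n_time=4000):
    """Explicit finite-difference call price on a uniform grid in the spot."""
    s_max = 4.0 * strike                 # assumed far-field boundary
    ds = s_max / n_space
    dt = maturity / n_time
    # The explicit scheme is only conditionally stable; refuse unsafe grids.
    if dt * (vol ** 2 * n_space ** 2 + rate) > 1.0:
        raise ValueError("time step too large for the explicit scheme")

    # Coefficients of the update V_new[i] = a*V[i-1] + b*V[i] + c*V[i+1].
    a = [0.5 * dt * (vol ** 2 * i ** 2 - rate * i) for i in range(n_space + 1)]
    b = [1.0 - dt * (vol ** 2 * i ** 2 + rate) for i in range(n_space + 1)]
    c = [0.5 * dt * (vol ** 2 * i ** 2 + rate * i) for i in range(n_space + 1)]

    # Start from the payoff at expiry and step backwards in calendar time.
    values = [max(i * ds - strike, 0.0) for i in range(n_space + 1)]
    for step in range(1, n_time + 1):
        tau = step * dt                  # time to expiry after this step
        new = values[:]
        for i in range(1, n_space):
            new[i] = a[i] * values[i - 1] + b[i] * values[i] + c[i] * values[i + 1]
        new[0] = 0.0                                         # call is worthless at S = 0
        new[n_space] = s_max - strike * math.exp(-rate * tau)  # deep in-the-money limit
        values = new

    # Linear interpolation between the two grid nodes around the spot.
    idx = min(int(spot / ds), n_space - 1)
    weight = spot / ds - idx
    return (1.0 - weight) * values[idx] + weight * values[idx + 1]


fdm_price = fdm_call(100.0, 100.0, 0.05, 0.2, 1.0)
exact_price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"FDM: {fdm_price:.4f}  closed form: {exact_price:.4f}")
```

An explicit scheme is the simplest to write but needs many small time steps to stay stable. Implicit or Crank-Nicolson schemes relax that restriction at the cost of solving a linear system at each step.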
 
The main problem I am facing is that I don't have the computing power to do this. I only have my 2015 MacBook Pro, and I can't use the university desktops because the university is closed due to corona. Is it possible to run this in the cloud?
 
I don't know, but I reckon so.
You can do version 1 on your Mac using Python. Speed can come later.
 