MSc Thesis: Random forest long-term return prediction

JoostSD

New Member
Dear all,

I am a quantitative finance student and will start writing my MSc thesis soon. My current idea is to research whether a random forest algorithm could outperform traditional models (such as the Fama-French factor model) in predicting long-term stock returns. I chose this topic because I am very interested in machine learning and would like to learn and apply it.

Right now I am unsure whether this is the right topic to pursue. I would like to hear your advice or opinion on whether it is feasible and/or interesting. Also, if you have ideas for an interesting angle on the topic, let me know!

Thank you in advance!

Daniel Duffy

C++ author, trainer
There's certainly no shortage of fancy buzzwords in ML. The proposal is high-risk, and there's no guarantee that it will produce useful results. I could be wrong, but a Plan B does no harm.

For an MSc, I recommend benchmarking existing methods and integrating or improving them with ML. Avoid the perpetual "90% of the project is finished (the other 10% will take forever)" syndrome.
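To make benchmarking concrete for return prediction, one common yardstick is the out-of-sample R² of a model against the historical-mean forecast, computed with an expanding (walk-forward) window so the model never sees future data. The sketch below is a minimal illustration under stated assumptions: it uses plain OLS as the candidate model, runs on synthetic data, and the helper names (`fit_ols`, `predict_ols`, `walk_forward_oos_r2`) are hypothetical. A random forest or a factor model could be swapped in for the fit/predict pair.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept; returns [intercept, slopes...]. (Hypothetical helper.)"""
    Xc = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def walk_forward_oos_r2(X, y, min_train=60):
    """Expanding-window forecasts: at each t, fit on rows [0, t) and predict y[t].

    Assumes X[t] only contains information available before y[t] is realised
    (e.g. lagged predictors); otherwise the test has look-ahead bias.
    Returns 1 - SSE(model) / SSE(historical mean); > 0 means the model beats the mean.
    """
    sse_model, sse_bench = 0.0, 0.0
    for t in range(min_train, len(y)):
        coef = fit_ols(X[:t], y[:t])
        y_model = predict_ols(coef, X[t:t + 1])[0]
        y_bench = y[:t].mean()  # benchmark: historical mean return
        sse_model += (y[t] - y_model) ** 2
        sse_bench += (y[t] - y_bench) ** 2
    return 1.0 - sse_model / sse_bench


# Synthetic example (hypothetical data, not real returns):
# one informative predictor and one pure-noise predictor
rng = np.random.default_rng(1)
n = 300
x_signal = rng.normal(size=(n, 1))
y = 0.5 * x_signal[:, 0] + rng.normal(scale=0.5, size=n)
x_noise = rng.normal(size=(n, 1))

r2_signal = walk_forward_oos_r2(x_signal, y)
r2_noise = walk_forward_oos_r2(x_noise, y)
print(f"OOS R^2, informative predictor: {r2_signal:.3f}")
print(f"OOS R^2, noise predictor:       {r2_noise:.3f}")
```

A useless predictor should land near (usually slightly below) zero, which is exactly the trap that in-sample fits hide.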
Here's a good thesis from 2019


I have always thought that prediction was difficult, especially predicting stock prices.

JoostSD

Thank you for the advice!

Do you believe that benchmarking and improving an existing model is more feasible? I have little prior knowledge of machine learning. Although I am very interested in learning it, I am probably limited by my capabilities.

Daniel Duffy

Yes, especially at MSc level and with limited time, you want to take an incremental approach.
As in life, we must walk before we can run. So, allow some time to learn ML before applying it to finance.

In other words: scope your thesis and estimate the project effort. I always ask students to write up a half-page (A4) project summary.

BTW how much time do you have to allocate to the thesis?

JoostSD

I agree that my proposal is probably too time-consuming to finish in time. I am considering implementing existing models, e.g. the Fama-French five-factor model (2015). Do you know an interesting model to implement, and what I could do to improve it?

I have about 2.5 months to finish my thesis.
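As a starting point for the Fama-French five-factor model, the usual first step is a time-series regression of a stock's (or portfolio's) excess return on the five factors. Below is a minimal numpy sketch. It assumes the factor returns are already aligned with the excess returns (in practice they would come from Kenneth French's data library). The data here are synthetic and the loadings are made up purely to check that the regression recovers them; `ff5_regression` is a hypothetical helper name.

```python
import numpy as np

FACTORS = ["Mkt-RF", "SMB", "HML", "RMW", "CMA"]


def ff5_regression(excess_ret, factors):
    """Time-series OLS: R_i - R_f = alpha + sum_k beta_k * F_k + eps.

    excess_ret: shape (T,); factors: shape (T, 5) with columns in FACTORS order.
    Returns (alpha, {factor: beta}, in-sample R^2).
    """
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    r2 = 1.0 - resid.var() / excess_ret.var()
    return coef[0], dict(zip(FACTORS, coef[1:])), r2


# Synthetic monthly data (NOT real Fama-French factors): known loadings to check recovery
rng = np.random.default_rng(0)
T = 2000
F = rng.normal(0.005, 0.04, size=(T, 5))
true_alpha = 0.001
true_beta = np.array([1.1, 0.3, -0.2, 0.4, 0.1])
r = true_alpha + F @ true_beta + rng.normal(0.0, 0.01, size=T)

alpha, betas, r2 = ff5_regression(r, F)
print(f"alpha = {alpha:.4f}, R^2 = {r2:.3f}")
for name, b in betas.items():
    print(f"{name:>6}: {b:+.3f}")
```

Improvements (ML or otherwise) can then be judged against this baseline, ideally out of sample rather than by in-sample R².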

Daniel Duffy

Here's a possible, doable project: take the problem as in McGhee's article, make it clearer (better 'flow') and use the approach taken by Dalvir Mantara in the thesis I posted. For training data, use the exact Heston solution and then maybe move to FDM/FEM.

BTW Dalvir's thesis is extremely well-written.
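For "Heston exact" training data, a standard route is the semi-closed-form call price obtained by Fourier inversion of the Heston characteristic function. The sketch below uses the "little Heston trap" form of the characteristic function (Albrecher et al.) and a plain trapezoid rule on a truncated grid. The truncation `u_max`, the grid size `n` and the helper names are assumptions; check the pricer against a reference implementation before generating a large training set.

```python
import math
import numpy as np


def heston_charfn(u, S0, T, r, v0, kappa, theta, sigma, rho):
    """E[exp(i u ln S_T)] under Heston, "little trap" form; sigma is the vol of vol.

    u may be a real or complex numpy array.
    """
    iu = 1j * u
    b = kappa - rho * sigma * iu
    d = np.sqrt(b ** 2 + sigma ** 2 * (iu + u ** 2))
    g = (b - d) / (b + d)
    edt = np.exp(-d * T)
    C = r * T * iu + (kappa * theta / sigma ** 2) * (
        (b - d) * T - 2.0 * np.log((1.0 - g * edt) / (1.0 - g)))
    D = (b - d) / sigma ** 2 * (1.0 - edt) / (1.0 - g * edt)
    return np.exp(C + D * v0 + iu * math.log(S0))


def _trapezoid(f, x):
    """Trapezoid rule, written out to avoid NumPy-version differences."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


def heston_call(S0, K, T, r, v0, kappa, theta, sigma, rho, u_max=200.0, n=20001):
    """European call C = S0*P1 - K*exp(-rT)*P2, with P1, P2 from Fourier inversion."""
    u = np.linspace(1e-8, u_max, n)  # start just above 0 to avoid dividing by zero
    params = (S0, T, r, v0, kappa, theta, sigma, rho)
    forward = S0 * math.exp(r * T)  # equals heston_charfn(-1j, ...) = E[S_T]
    kernel = np.exp(-1j * u * math.log(K)) / (1j * u)
    f1 = np.real(kernel * heston_charfn(u - 1j, *params) / forward)
    f2 = np.real(kernel * heston_charfn(u, *params))
    P1 = 0.5 + _trapezoid(f1, u) / math.pi
    P2 = 0.5 + _trapezoid(f2, u) / math.pi
    return S0 * P1 - K * math.exp(-r * T) * P2


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0, K, T, r, vol):
    """Black-Scholes call, used here only as a sanity check."""
    d1 = (math.log(S0 / K) + (r + 0.5 * vol ** 2) * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


# Sanity check: tiny vol of vol, rho = 0, v0 = theta  =>  Heston ~ Black-Scholes(sqrt(v0))
near_bs = heston_call(100, 100, 1.0, 0.02, 0.04, 2.0, 0.04, 0.01, 0.0)
print(f"Heston (sigma -> 0): {near_bs:.4f}, BS: {bs_call(100, 100, 1.0, 0.02, 0.2):.4f}")

# Illustrative, made-up parameters: vol of vol 0.5, rho = -0.7
prices = [heston_call(100, K, 1.0, 0.02, 0.04, 1.5, 0.04, 0.5, -0.7) for K in (90, 100, 110)]
print(prices)
```

Sampling the parameters `(K, T, v0, kappa, theta, sigma, rho)` over ranges and calling `heston_call` gives the labelled synthetic data set for the network.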

JoostSD

If I understand correctly, the main idea is to implement the Heston model using an ANN with a single layer (I'm not going to implement an image-based implicit learning method for the Heston model, right?). By doing this, I will research the feasibility of a single-layer ANN for calculating implied volatility under the Heston model.

Some questions: can I use market data as input, or should I simulate data? What do you mean by moving to FDM/FEM?

Dalvir's thesis is indeed impressive!
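Implied volatility here usually means inverting the Black-Scholes formula: find the volatility at which the Black-Scholes price matches the model (or market) price. A bracketing bisection is slow but robust, which makes it a reasonable first version. This is a minimal sketch; the helper names and the bracket `[1e-6, 5]` are assumptions.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0, K, T, r, vol):
    sq_t = math.sqrt(T)
    d1 = (math.log(S0 / K) + (r + 0.5 * vol ** 2) * T) / (vol * sq_t)
    d2 = d1 - vol * sq_t
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def implied_vol(price, S0, K, T, r, lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Black-Scholes implied volatility by bisection (the BS price is increasing in vol)."""
    if not bs_call(S0, K, T, r, lo) <= price <= bs_call(S0, K, T, r, hi):
        raise ValueError("price outside the range spanned by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S0, K, T, r, mid) < price:
            lo = mid  # price too low: the root lies above mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price an option at 25% vol, then recover that vol
p = bs_call(100, 110, 0.5, 0.01, 0.25)
iv = implied_vol(p, 100, 110, 0.5, 0.01)
print(f"price = {p:.4f}, implied vol = {iv:.6f}")
```

In the project, `price` would be a Heston price (synthetic data) or a market quote; Newton's method using vega converges faster once the bisection version is trusted.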

Daniel Duffy

You can use simulated/synthetic data (i.e. generated from the analytic solution) as training data for the ANN (3 layers?).
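To illustrate training a small network on synthetic data from an analytic formula, the sketch below fits a three-layer perceptron with full-batch Adam in plain numpy. Reading "3 layers" as input, one tanh hidden layer and a linear output is an assumption. To keep it fast, the target is the Black-Scholes call price (r = 0, divided by the strike) as a stand-in; in the project the labels would come from the Heston pricer instead. Layer size, learning rate and epoch count are illustrative guesses, not tuned values, and all function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))


def bs_call_over_strike(m, T, vol):
    """Black-Scholes call price / K with r = 0, as a function of moneyness m = S/K."""
    s = vol * np.sqrt(T)
    d1 = (np.log(m) + 0.5 * s ** 2) / s
    return m * norm_cdf(d1) - norm_cdf(d1 - s)


def make_data(n, rng):
    """Sample inputs uniformly and label them with the analytic price (synthetic data)."""
    m = rng.uniform(0.8, 1.2, n)
    T = rng.uniform(0.1, 2.0, n)
    vol = rng.uniform(0.1, 0.5, n)
    return np.column_stack([m, T, vol]), bs_call_over_strike(m, T, vol).reshape(-1, 1)


def train_mlp(X, y, hidden=32, lr=1e-2, epochs=2000, seed=0):
    """Input -> tanh hidden layer -> linear output, full-batch Adam on mean squared error."""
    rng = np.random.default_rng(seed)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mu) / sd  # standardise inputs
    W1 = rng.normal(0.0, 1.0 / math.sqrt(X.shape[1]), (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / math.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    m_state = [np.zeros_like(p) for p in params]
    v_state = [np.zeros_like(p) for p in params]
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        h = np.tanh(Xn @ W1 + b1)  # forward pass
        err = h @ W2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / len(y)  # backward pass: dLoss/dprediction
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        grads = [Xn.T @ g_h, g_h.sum(axis=0), h.T @ g_out, g_out.sum(axis=0)]
        for p, g, m_, v_ in zip(params, grads, m_state, v_state):
            m_ *= beta1
            m_ += (1.0 - beta1) * g
            v_ *= beta2
            v_ += (1.0 - beta2) * g ** 2
            m_hat = m_ / (1.0 - beta1 ** t)
            v_hat = v_ / (1.0 - beta2 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in place, so W1, b1, ... change

    def predict(X_new):
        return np.tanh(((X_new - mu) / sd) @ W1 + b1) @ W2 + b2

    return predict, losses


rng = np.random.default_rng(42)
X_train, y_train = make_data(2000, rng)
X_test, y_test = make_data(500, rng)
predict, losses = train_mlp(X_train, y_train)
test_mse = float(np.mean((predict(X_test) - y_test) ** 2))
print(f"train MSE: {losses[0]:.2e} -> {losses[-1]:.2e}, test MSE: {test_mse:.2e}")
```

Normalising the price by the strike keeps the targets in a narrow range, which makes a small network easier to train; the held-out set checks that it generalises beyond the training samples.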

FDM = finite difference method
FEM = finite element method (more advanced)
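As a first taste of FDM, the sketch below solves the one-dimensional Black-Scholes PDE for a European call with an explicit finite-difference scheme and compares the result with the closed-form price. This is a deliberately simpler stand-in: the Heston PDE has a second space dimension (the variance v) and a mixed-derivative term, so a real Heston FDM solver needs a 2D grid and usually an implicit or ADI scheme. The grid sizes are assumptions, chosen so that the explicit scheme stays stable (dt must be small compared with dS²/(σ²S²)).

```python
import math
import numpy as np


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0, K, T, r, vol):
    d1 = (math.log(S0 / K) + (r + 0.5 * vol ** 2) * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def fdm_call_explicit(S0, K, T, r, vol, S_max=300.0, n_space=300, n_time=5000):
    """Explicit finite differences for the Black-Scholes PDE in time-to-maturity tau:
    V_tau = 0.5*vol^2*S^2*V_SS + r*S*V_S - r*V, with V(S, 0) = max(S - K, 0).
    """
    S = np.linspace(0.0, S_max, n_space + 1)
    dS = S[1] - S[0]
    dt = T / n_time
    V = np.maximum(S - K, 0.0)  # payoff at tau = 0
    S_in = S[1:-1]  # interior nodes
    for n in range(1, n_time + 1):
        V_SS = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / dS ** 2  # central second difference
        V_S = (V[2:] - V[:-2]) / (2.0 * dS)  # central first difference
        V_new = np.empty_like(V)
        V_new[1:-1] = V[1:-1] + dt * (
            0.5 * vol ** 2 * S_in ** 2 * V_SS + r * S_in * V_S - r * V[1:-1])
        V_new[0] = 0.0  # the call is worthless at S = 0
        V_new[-1] = S_max - K * math.exp(-r * n * dt)  # deep in the money: S - K e^{-r tau}
        V = V_new
    return float(np.interp(S0, S, V))


fdm = fdm_call_explicit(100.0, 100.0, 1.0, 0.02, 0.2)
exact = bs_call(100.0, 100.0, 1.0, 0.02, 0.2)
print(f"FDM: {fdm:.4f}, closed form: {exact:.4f}")
```

The explicit scheme is the easiest to write but needs many time steps; implicit or Crank-Nicolson schemes remove that restriction and are the natural next step before moving to two dimensions.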

JoostSD

The main problem I am facing is that I don't have the computing power to do this. I only have my 2015 MacBook Pro, and I can't use the university desktops because the university is closed due to corona. Is it possible to compute this in the cloud?

Daniel Duffy

I don't know, but I reckon so.
You can do version 1 on the Mac using Python. Speed is for later.