Imagine you have it all ... all the data. What do you do next?

Pawel · 12/15/11

If you had all the data you need (what would it be?), how would you start building models to do algorithmic trading?

Jacek Podlewski · 12/15/11

I'm far from an experienced algo-trader, but I have some experience with strategy testing, so I can share my point of view. A good idea is to develop a general backtesting framework that reads in all the necessary data and is flexible enough to analyze it using different models. Particular models or strategies should be implemented as separate functions or programs and called from the 'main' framework. The same applies to performance analysis tools. This can really make testing a lot easier. You might start with very simple strategies just to see if the code does exactly what you want it to do, and then proceed to more sophisticated methods.
In my opinion it's also important that the code's output files present all the necessary information in a clear way.
These are very general principles, as the details can differ considerably depending on your 'target' markets and on the specific models and IT tools you'd like to use.
If you find this information useful, feel free to PM me with more detailed questions; Polish is fine as well.
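
To make the "strategies as separate functions called from a main framework" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names (run_backtest, buy_and_hold, sma_crossover), the position convention (+1 long, 0 flat), and the rule that a position decided on bar t earns the return from bar t to bar t+1. It ignores costs and fills entirely.

```python
from typing import Callable, List

# A strategy maps the price history seen so far to a target position
# (+1 long, 0 flat). This signature is a hypothetical convention.
Strategy = Callable[[List[float]], int]


def buy_and_hold(history: List[float]) -> int:
    """Trivial strategy, useful for checking that the framework itself works."""
    return 1


def sma_crossover(history: List[float], fast: int = 3, slow: int = 5) -> int:
    """Long when the fast moving average is above the slow one, else flat."""
    if len(history) < slow:
        return 0  # not enough data yet
    fast_avg = sum(history[-fast:]) / fast
    slow_avg = sum(history[-slow:]) / slow
    return 1 if fast_avg > slow_avg else 0


def run_backtest(prices: List[float], strategy: Strategy) -> List[float]:
    """Per-period strategy returns; no transaction costs, no slippage."""
    returns = []
    for t in range(len(prices) - 1):
        # The strategy only sees data up to and including bar t.
        position = strategy(prices[: t + 1])
        returns.append(position * (prices[t + 1] / prices[t] - 1.0))
    return returns


def total_return(returns: List[float]) -> float:
    """Compound the per-period returns into a single total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0
```

Calling total_return(run_backtest(prices, buy_and_hold)) on any price list should give last price / first price - 1, which is the kind of simple sanity check mentioned above. Swapping in another strategy means passing a different function; the framework does not change.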

Alexander Krosky · 12/15/11

I agree with Jacek. There are infinitely many ways to "cheat" a backtest. Here are some questions you should ask yourself about the backtesting engine:

How can I realistically simulate limit orders?
Where are my fills coming from?
Do I have the low latency to actually realize the return?
Have I accounted for transaction costs, market impact, and liquidity rebates?
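
As a small illustration of the first and last questions, here is a sketch of one conservative convention: a resting buy limit order counts as filled only when the bar trades through the limit price, not when it merely touches it, and a flat per-side cost in basis points is subtracted from the gross return. Both rules and all names (limit_buy_fills, net_return, cost_bps_per_side) are assumptions for illustration; real fills depend on queue position and liquidity, which bar data cannot show.

```python
def limit_buy_fills(limit_price: float, bar_low: float, strict: bool = True) -> bool:
    """Decide whether a resting buy limit order is treated as filled in a bar.

    strict=True  : fill only if the bar traded below the limit (conservative).
    strict=False : a touch of the limit price also counts (optimistic).
    """
    if strict:
        return bar_low < limit_price
    return bar_low <= limit_price


def net_return(gross_return: float, cost_bps_per_side: float, round_trips: int = 1) -> float:
    """Subtract a flat cost, quoted in basis points per side, from a gross return.

    One round trip is an entry plus an exit, so it pays the cost twice.
    Market impact and rebates are not modeled here.
    """
    cost = 2 * round_trips * cost_bps_per_side / 10_000
    return gross_return - cost
```

Running the same backtest with strict=True and strict=False gives a rough idea of how much of the result depends on optimistic fill assumptions.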

As for the model, you should conduct preliminary alpha research. Abstract it away from finance as much as possible because you don't have to run realistic simulations to see what predictive ability your model has. You have your model's predictions and the corresponding realized values. There will be fundamental questions you'll need to ask yourself, such as:

What "clock" should the model use?
What trading frequency? -> What investment horizon? -> What data structure?
Can I trust the data? -> How do I clean the data? -> What data structure?
What are my metrics for measuring the performance of the model?
What can I benchmark the model against?
Am I using enough data points for the number of degrees of freedom (dof) in the model?
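
One way to look at predictions against realized values without any trading simulation is sketched below. It computes two common metrics: the hit rate (how often the predicted sign matches the realized sign) and a rank correlation between predictions and outcomes. The choice of these two metrics and the function names are assumptions for illustration, not something prescribed above.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_correlation(predicted: List[float], realized: List[float]) -> float:
    """Spearman rank correlation between predictions and realized values."""
    return pearson(ranks(predicted), ranks(realized))


def hit_rate(predicted: List[float], realized: List[float]) -> float:
    """Fraction of observations where prediction and outcome share a sign."""
    hits = sum(1 for p, r in zip(predicted, realized) if p * r > 0)
    return hits / len(predicted)
```

A natural benchmark is the same pair of numbers for a naive predictor, for example one that always predicts the previous period's value.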

For the data itself, the first step might be to do some event studies in order to generate ideas.
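
A bare-bones event study could look like the sketch below: line up the returns around each event date and average them by offset from the event. It uses raw returns for simplicity, whereas a fuller study would typically use returns in excess of a benchmark. The function name, the window parameters, and the rule of skipping events without a complete window are all assumptions for illustration.

```python
from typing import Dict, List


def event_study(returns: List[float], event_indices: List[int],
                before: int = 1, after: int = 2) -> Dict[int, float]:
    """Average return at each offset in [-before, +after] around the events.

    returns       : per-period returns of one asset
    event_indices : positions in `returns` where an event occurred
    Events whose window would run off either end of the series are skipped.
    """
    offsets = list(range(-before, after + 1))
    sums = {k: 0.0 for k in offsets}
    used = 0
    for t in event_indices:
        if t - before < 0 or t + after >= len(returns):
            continue  # incomplete window
        for k in offsets:
            sums[k] += returns[t + k]
        used += 1
    if used == 0:
        raise ValueError("no event has a complete window")
    return {k: sums[k] / used for k in offsets}
```

With end-of-day data, the events could be things like earnings report dates or dividend announcements, and the averaged profile shows whether returns tend to behave differently around them.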

Pawel · 12/16/11

Thanks for the replies ...

I work on end-of-day stock data (which I already have in a database). I also have some basic fundamental data on the companies (quarterly report data, dividends, etc.).

As for backtesting, I have AmiBroker, which is quite useful for TA, and an R cluster with the quantstrat and PerformanceAnalytics packages for backtesting models created in R (which I don't have right now).
For me, the results of backtesting only give some extra information for further modeling, because when you take your data and your model (which probably has at least several parameters), you get lots of different statistics after backtesting. For example:
1. For each asset your model probably gives different results: looking only at returns, it works on some assets, and on most it probably ends up below 0.
2. You have lots of ratios: returns, Sharpe, Kelly, winners, losers, winners/losers, min/max/avg drawdown, min/max/avg time to recover, and so on ...
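
For concreteness, here is a sketch of how three of these numbers can be computed from a series of per-period returns. The conventions are assumptions: a zero risk-free rate and 252 periods per year for the Sharpe ratio, drawdown measured on a compounded equity curve starting at 1, and periods with exactly zero return counted as neither winners nor losers.

```python
from math import sqrt
from typing import List


def sharpe_ratio(returns: List[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    return mean / sqrt(variance) * sqrt(periods_per_year)


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def win_loss_ratio(returns: List[float]) -> float:
    """Number of winning periods divided by number of losing periods."""
    wins = sum(1 for r in returns if r > 0)
    losses = sum(1 for r in returns if r < 0)
    return wins / losses if losses else float("inf")
```

Computing the same few functions for every asset and parameter set at least puts the results in one comparable table, which is a starting point for deciding which statistics matter.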

So backtesting gives you even more things to think about.

How do you deal with it?
