
Scanning the whole market using R (Stat.Arb)

Joined: 8/26/15 | Messages: 12 | Points: 11
Hi all,

Thanks for the previous posts; I was able to code my strategy in R. But now I have a small problem, or rather a hiccup in the process. Stat.Arb is basically a pair trading strategy, and there are around 2000 stocks in the market. How can I make this strategy run for all the stocks, i.e. backtest and show the results for every combination of stocks in the market, and show which trades are open and which are closed according to the strategy? It would be wonderful if someone could help me out with this.

Awaiting your reply,

Hedge
 

Hi HedgeFudge,

Usually a trading strategy is backtested by translating its rules into a series of trades. See this list of backtesting books. You will find more backtesting books on Amazon if you search for "Backtesting", "Backtest" or "Back-Testing".

When you are backtesting an arbitrage strategy, you have to deal with paired trades (one long, one short).
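To make the paired-trade point concrete, here is a minimal sketch of how the profit and loss of one long/short pair could be computed in R. The function name `pairTradePnL` and the example prices are made up for illustration; transaction costs, borrow fees and dividends are ignored.

```r
# Hypothetical helper: P&L of one paired trade (long one stock, short the other).
# Costs, borrow fees and dividends are ignored in this sketch.
pairTradePnL <- function(longEntry, longExit, shortEntry, shortExit,
                         longShares, shortShares) {
  longLeg  <- (longExit - longEntry) * longShares     # gains if the long stock rises
  shortLeg <- (shortEntry - shortExit) * shortShares  # gains if the short stock falls
  c(long = longLeg, short = shortLeg, total = longLeg + shortLeg)
}

# Long 100 shares of A (50 -> 52), short 100 shares of B (40 -> 41)
pnl <- pairTradePnL(50, 52, 40, 41, 100, 100)
print(pnl)
```

Here the long leg earns 200, the short leg loses 100, and the pair nets 100: the position only cares about the relative move of the two stocks.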

As long as you don't give us insight into your strategy, the only advice we can offer is to read an R book so that you are able to do backtesting in R. I myself used the first three of these R programming books; they helped me a lot in learning R programming quickly.

PM me if needed.

Martin
 
Fair enough. I'll give you a breakdown of the process. I use a very basic and simple arbitrage model; here is how it works:
1. Find the correlation coefficient of all the stocks in the market and select the pairs with a coefficient of 0.80 or above.
2. Then draw the price chart for both stocks to see the correlation visually (done in R; it's part of the code).
3. Identify a mean (a middle line) and add a standard-deviation band around the mean, a.k.a. the spread.
4. Wait for the mispricing to occur, that is, let the prices hit the standard-deviation band. When they do, take a pair trade: long one stock and short the other.
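Steps 3 and 4 for a single pair could be sketched in base R as below. The post does not define the spread or the look-back period, so the log price ratio as the spread, the 20-day trailing window and the 1-standard-deviation band (`k`) are assumptions, and `rollingStat`/`pairSignals` are hypothetical names.

```r
# Hypothetical helper: trailing-window statistic, NA until `window` observations exist
rollingStat <- function(x, window, FUN) {
  out <- rep(NA_real_, length(x))
  for (t in seq_along(x)) {
    if (t >= window) out[t] <- FUN(x[(t - window + 1):t])
  }
  out
}

# Steps 3-4 for one pair (assumed spread definition: log price ratio)
pairSignals <- function(priceA, priceB, window = 20, k = 1) {
  spread <- log(priceA / priceB)
  mu     <- rollingStat(spread, window, mean)   # step 3: middle line
  sigma  <- rollingStat(spread, window, sd)     # step 3: band width
  z      <- (spread - mu) / sigma               # distance from the mean in sd units
  # step 4: signal when the spread touches the k-sd band; NA/NaN z gives no signal
  signal <- ifelse(is.na(z), "none",
            ifelse(z >=  k, "short A / long B",  # A rich relative to B
            ifelse(z <= -k, "long A / short B",  # A cheap relative to B
                   "none")))
  data.frame(spread = spread, mean = mu, sd = sigma, z = z, signal = signal,
             stringsAsFactors = FALSE)
}
```

Each row of the result holds the spread, the mean, the standard deviation and the signal for that day, which is roughly the table asked for below for one pair.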

Now, I've been able to program the above in R. But my problem is that I've identified approximately 400 stocks in the market that are highly correlated with each other (either through the index, to form a triangular arbitrage, or with one another). I would like to know whether I can run the above 4-step strategy in R so that it backtests each and every pair combination from those 400 stocks and gives me a breakdown in the form of a table.
(I use EOD data, so ideally the table would contain the following:)
1. Spread between the 2 stocks.
2. Standard deviation from the mean.
3. The mean (for example, I am using the P/E ratio).
4. Another column showing whether a trade is open, closed, running or non-existent based on the above.
NOTE: points 1 to 4 give an indication of the trade.

5. Average number of days a trade was open in the past (i.e. if there were 3k trades in a pair over the past 15 years, the average number of days a trade took to close).
6. A stop loss of 0.5 standard deviations above the entry point.
7. While it backtests, a list of all the trades taken, profitable or not.
Note: points 5, 6 and 7 form part of the backtesting process.
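Points 4 to 7 could be sketched as a simple single-pair backtest loop. This assumes the spread and its z-score have already been computed for each day; the exit rule (close when the z-score crosses back through 0, i.e. the mean) is an assumption not stated in the post, the stop is read as "0.5 sd further away from the mean than the entry", and P&L is measured in spread units with no costs. `backtestPair` is a hypothetical name.

```r
# Sketch of a single-pair EOD backtest (hypothetical function).
# z: daily z-score of the spread; spread: daily spread level
backtestPair <- function(z, spread, k = 1, stop = 0.5) {
  rows <- list()
  pos <- 0                       # +1 long spread, -1 short spread, 0 flat
  entryDay <- NA_integer_
  entryZ <- NA_real_
  for (t in seq_along(z)) {
    if (is.na(z[t])) next
    if (pos == 0) {
      if (z[t] >= k) {           # spread rich: short A / long B
        pos <- -1; entryDay <- t; entryZ <- z[t]
      } else if (z[t] <= -k) {   # spread cheap: long A / short B
        pos <- 1; entryDay <- t; entryZ <- z[t]
      }
    } else {
      # point 6: stop out if z moves `stop` sd further from the mean than at entry
      stopped  <- (pos == -1 && z[t] >= entryZ + stop) || (pos == 1 && z[t] <= entryZ - stop)
      reverted <- (pos == -1 && z[t] <= 0) || (pos == 1 && z[t] >= 0)
      if (stopped || reverted) {
        # point 7: record every trade, profitable or not
        rows[[length(rows) + 1]] <- data.frame(
          entryDay = entryDay, exitDay = t,
          side = if (pos == 1) "long spread" else "short spread",
          reason = if (stopped) "stop loss" else "mean reversion",
          pnl = pos * (spread[t] - spread[entryDay]),
          stringsAsFactors = FALSE)
        pos <- 0
      }
    }
  }
  trades <- if (length(rows) > 0) do.call(rbind, rows) else data.frame()
  list(
    trades = trades,
    # point 5: average holding period in trading days (rows of EOD data)
    avgDaysOpen = if (nrow(trades) > 0) mean(trades$exitDay - trades$entryDay) else NA_real_,
    # point 4: state of the pair at the last observation
    status = if (pos != 0) "open" else if (nrow(trades) > 0) "closed" else "no trade"
  )
}
```

Running this once per pair and stacking the results would give the per-pair table; the trade list covers point 7 directly.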

Now, this process needs to be followed for all possible combinations of the 400 stocks.

This is a very detailed breakdown of the kind of arbitrage I do: I use correlation, and once the correlation diverges, I take advantage of it.

Hope this is helpful, Martin.

HedgeFudge
 
At the moment I don't have time to help you with R programming, but I am asking consuli whether he knows of any R packages for backtesting.
 
From what you write, it seems to me that you want a large data set with all 400 stocks, but you don't know how to download them automatically. If that is the case, you can create a list in R with all 400 tickers you are interested in, then loop through them, download the data for each stock separately (something you can already do) and append it to the existing data set.
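That loop could look roughly like this in base R. `loadPrices` is a hypothetical placeholder for whatever you already use to get one stock (a CSV reader, or a wrapper around quantmod's `getSymbols`, for example); this sketch assumes it returns a data.frame with columns `date` and `close`. `buildDataSet` is likewise a made-up name.

```r
# Sketch: load each symbol, tag it, and combine everything into one data set.
# loadPrices(symbol) is a hypothetical user-supplied function returning a
# data.frame with columns `date` and `close`.
buildDataSet <- function(symbols, loadPrices) {
  pieces <- vector("list", length(symbols))
  for (s in seq_along(symbols)) {
    df <- loadPrices(symbols[s])
    df$symbol <- symbols[s]          # tag rows with their ticker
    pieces[[s]] <- df
  }
  long <- do.call(rbind, pieces)     # one long table: date, close, symbol
  # wide layout: one row per date, one column per symbol (named close.<symbol>)
  wide <- reshape(long, idvar = "date", timevar = "symbol", direction = "wide")
  wide[order(wide$date), ]
}
```

Collecting the pieces in a list and binding them once at the end is usually faster than appending to a growing data frame inside the loop; the wide layout makes pairwise comparisons a matter of picking two columns.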
 
Well, automating it is something I am looking for, but what I am really asking for help with is this:
For example, say I have a set of 6 stocks I want to use this strategy on. I want the backtest, and I also want it to display the buy, sell or hold signals for every combination of the 6 stocks. Say I have stocks a, b, c, d, e, f: a+b is one combination, a+c is another, and so on.
So what I am asking is: if I load 400 stocks as you described above, would it compute all the combinations of the 400 stocks? Like 1st + 2nd, 1st + 3rd ... 1st + 400th, then 2nd + 3rd, 2nd + 4th ... up to 2nd + 400th.

Do you see what I mean? Would it compute all possible pairs within the 400-stock data set?
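In base R, `combn()` enumerates every unordered pair exactly once (a+b, but not b+a again). A small sketch with the placeholder symbols from the question:

```r
# All unordered pairs of a symbol vector, each listed once
symbols <- c("a", "b", "c", "d", "e", "f")
pairs <- t(combn(symbols, 2))        # one row per pair: a-b, a-c, ..., e-f
colnames(pairs) <- c("first", "second")
print(nrow(pairs))                   # choose(6, 2) = 15 pairs
print(choose(400, 2))                # 79800 pairs for a 400-stock universe

# Looping over them (analysePair would be your own, hypothetical, function):
# for (p in seq_len(nrow(pairs))) analysePair(pairs[p, "first"], pairs[p, "second"])
```

So yes, 400 stocks give 79,800 pairs, which is why cheap pre-filtering matters before running a full backtest on each one.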
 
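It's definitely possible. Once you have the large data set with all the data you need (you can construct it as I proposed), what you want should look something like this:
library(data.table)
# stocks: list of price series; stockAnalysis(): your own pair analysis (hypothetical, returns a list)
selectedPairs <- data.table(i = integer(), j = integer())
for (i in 1:(numberOfStocks - 1)) {
  for (j in (i + 1):numberOfStocks) {  # j > i, so each pair is analysed exactly once
    res <- stockAnalysis(stocks[[i]], stocks[[j]])
    if (isTRUE(res$criteriaSatisfied)) {  # hypothetical field set by stockAnalysis()
      # save pair i,j so you have all the pairs satisfying your criteria
      selectedPairs <- rbind(selectedPairs, data.table(i = i, j = j))
    }
    # etc.
  }
}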

I suggest using the data.table package, since operations on data.tables are typically much faster than on regular data frames. You will most probably also want to introduce some heuristics to make the algorithm less computationally demanding (say, if stock A is highly correlated with stock B and stock B is basically uncorrelated with C, D and E, you can probably skip comparing A with C, D and E, etc.).
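A different, cheap pre-filter (swapped in here, not the heuristic described above) is to compute the whole correlation matrix once with `cor()` and keep only the pairs at or above the 0.8 threshold from the strategy description, before running the slower per-pair analysis. This sketch assumes a numeric matrix with one named column per stock; `screenPairs` is a hypothetical name.

```r
# Sketch: one cor() call for all stocks, then keep pairs with correlation >= threshold.
# x: numeric matrix, one named column per stock (assumed layout)
screenPairs <- function(x, threshold = 0.8) {
  cm  <- cor(x, use = "pairwise.complete.obs")
  idx <- which(upper.tri(cm) & cm >= threshold, arr.ind = TRUE)  # i < j only
  data.frame(first = colnames(cm)[idx[, 1]],
             second = colnames(cm)[idx[, 2]],
             correlation = cm[idx],
             stringsAsFactors = FALSE)
}
```

For 400 columns this is a single 400 x 400 matrix computation, and only the surviving pairs need to go through the full loop.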
 


I don't think you understand what pairs trading is. Correlation does not imply cointegration. You need to regress the prices of the two stocks in each pair on each other and then test the residuals for stationarity with an ADF (augmented Dickey-Fuller) test.
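The two-step (Engle-Granger) check described here could be sketched in base R as follows. It runs a plain Dickey-Fuller regression on the residuals (no lagged differences); an augmented version adds lagged `diff(res)` terms, which packages such as tseries or urca provide. The -3.34 cut-off is the approximate 5% Engle-Granger critical value for two series with a constant (MacKinnon's tables); treat it as an assumption to check. `egTest` is a hypothetical name.

```r
# Sketch of an Engle-Granger style cointegration check (plain DF, base R only)
egTest <- function(y, x) {
  fit  <- lm(y ~ x)                     # step 1: hedge ratio = slope
  res  <- residuals(fit)
  dres <- diff(res)
  lagr <- res[-length(res)]
  dfFit <- lm(dres ~ 0 + lagr)          # step 2: d(res_t) = gamma * res_{t-1} + e_t
  tStat <- summary(dfFit)$coefficients["lagr", "t value"]
  list(hedgeRatio = unname(coef(fit)["x"]),
       tStat = tStat,
       cointegrated5pct = tStat < -3.34)  # approximate 5% critical value (assumption)
}
```

The slope from step 1 is also the answer to "why regress the prices": it gives the hedge ratio, i.e. how many units of one stock to hold against one unit of the other.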
 

Not necessarily. You can also build stat-arb strategies using correlation alone. Cointegration tells us whether the stocks will stay in sync in the near future as well. I don't need that; for my strategy correlation alone is fine, because I don't follow the strategy blindly. I also use the correlation data to hedge my directional bets, and if I feel a stock in the pair trade is not going to move much towards the mean, I substitute it with options.
People use correlation, cointegration, and a few use both together (although I don't see why anyone would want to use them together).

Secondly, why would I want to regress the prices?
 


What you said above just went over my head. Basically, this is my problem: I made a sample script for the above strategy (I'm pretty sure it's not perfect), but to test it I loaded data for only 2 stocks and made a vector of their symbols.
Example: c("AAPL", "YHOO"), etc.
So basically I made a data frame for 2 items. But now I want it to scan the entire market (400 stocks). How can I build a data set of 400 stocks, find the correlation coefficient of each possible pair from those 400 stocks, plot each pair as a graph, and also add the mean and standard deviation (spread)?

Basically, I need the actual code for this kind of strategy, and a way to make it run automatically.

Is anyone able to understand what I am trying to say?
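One way to make this run automatically is a single function that walks over every pair and returns one summary row per pair. This is a minimal sketch under several assumptions: `prices` is a numeric matrix of EOD closes with one named column per stock, the spread is the log price ratio, its mean and standard deviation are taken over the last `window` days, and the entry band is 1 sd. `scanMarket` is a hypothetical name, and plotting is left out.

```r
# Sketch: scan all pairs and return one row per pair (hypothetical function).
# prices: numeric matrix of EOD closes, one named column per stock
scanMarket <- function(prices, minCor = 0.8, window = 60, k = 1) {
  pairs <- combn(colnames(prices), 2)        # every unordered pair once
  rows <- lapply(seq_len(ncol(pairs)), function(p) {
    a <- pairs[1, p]
    b <- pairs[2, p]
    rho    <- cor(prices[, a], prices[, b])
    spread <- log(prices[, a] / prices[, b]) # assumed spread definition
    recent <- tail(spread, window)
    mu     <- mean(recent)
    sigma  <- sd(recent)
    last   <- tail(spread, 1)
    z      <- (last - mu) / sigma
    # pairs below the correlation threshold are skipped, as in step 1
    signal <- if (rho < minCor || is.na(z)) "skip" else
      if (z >= k) "short A / long B" else
      if (z <= -k) "long A / short B" else "no trade"
    data.frame(stockA = a, stockB = b, correlation = rho, spread = last,
               mean = mu, sd = sigma, z = z, signal = signal,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

Sorting the result by `abs(z)` would bring the pairs closest to a trade to the top; the full backtest per pair could be added inside the same loop.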
 
I like your post. I have done most of what you have pointed out using C++
 