# Cloud services to backtest minute data

#### donny

Hi Quantnet,

My current task is to backtest a 5-minute-bar trading strategy over 10 years. It uses regression and, as expected, it's very slow. At roughly 1 hour per asset for a 10-year run, it's obvious that I need more computational power. I have a few virtual machines at my disposal, but they won't cut it.

Question: are there any free cloud services where I can rent machines and backtest my strategies? Something like AWS or Azure, but the emphasis is on free. Scale is a priority here, so realistically I'd need around 100 free virtual machines for the implementation to be worthwhile.

Relatedly, have any of you written your own linear regression function that performs orders of magnitude faster than a library? I'm using Accord.NET (accord-framework.net), though I'd assume third-party libraries are already optimized to be as fast as possible.

Donny


#### BurgerKing

Actually, for a simple rolling linear regression with a fixed number of variables, you can implement it in CUDA C. It will speed up your regression by at least an order of magnitude.

#### donny

Thanks everyone. I've heard of CUDA before; I think that's what I was looking for.

Out of the multitude of options - CUDA, my own C# implementation, FPGA, Matlab, R - it looks like CUDA is the way to go.

I'll trust your word on the CUDA implementation and look into it.

#### BurgerKing

Well, I actually wrote something like that a few years ago. I'll just paste it here in case it helps.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

#define BLOCK_SIZE 32

// Print the last CUDA error, if any, with a short context message.
static void PrintError(const char* context) {
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", context, cudaGetErrorString(err));
}

// One thread per data point: each thread accumulates the regression sums
// over the window ending at its own index, then writes beta, alpha and
// that point's residual.
__global__ void RollingSimpleLinearRegression_kernel(const float* x_dev, const float* y_dev,
        float* beta_dev, float* alpha_dev, float* residual_dev, int n, int window) {
    int dataId = blockIdx.x * blockDim.x + threadIdx.x;
    if (dataId < window - 1 || dataId >= n) return;

    float Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (int i = dataId - window + 1; i <= dataId; i++) {
        float x = x_dev[i];
        float y = y_dev[i];
        Sx  += x;
        Sy  += y;
        Sxx += x * x;
        Sxy += x * y;
    }
    float beta  = (window * Sxy - Sx * Sy) / (window * Sxx - Sx * Sx);
    float alpha = (Sy - beta * Sx) / window;

    beta_dev[dataId]     = beta;
    alpha_dev[dataId]    = alpha;
    residual_dev[dataId] = y_dev[dataId] - alpha - beta * x_dev[dataId];
}

void RollingSimpleLinearRegression(float* x, float* y, float* beta, float* alpha,
        float* residual, int n, int window) {
    cudaSetDevice(0);
    cudaFree(0); // force context creation up front

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        PrintError("Create stream");
        return;
    }
    size_t vecSzInBytes_float = n * sizeof(float);

    // Allocate vectors in device memory
    float* x_dev = 0;
    float* y_dev = 0;
    float* alpha_dev = 0;
    float* beta_dev = 0;
    float* residual_dev = 0;

    cudaMalloc(&x_dev,        vecSzInBytes_float);
    cudaMalloc(&y_dev,        vecSzInBytes_float);
    cudaMalloc(&alpha_dev,    vecSzInBytes_float);
    cudaMalloc(&beta_dev,     vecSzInBytes_float);
    cudaMalloc(&residual_dev, vecSzInBytes_float);

    cudaMemset(alpha_dev,    0, vecSzInBytes_float);
    cudaMemset(beta_dev,     0, vecSzInBytes_float);
    cudaMemset(residual_dev, 0, vecSzInBytes_float);

    cudaMemcpyAsync(x_dev, x, vecSzInBytes_float, cudaMemcpyHostToDevice, stream);
    cudaMemcpyAsync(y_dev, y, vecSzInBytes_float, cudaMemcpyHostToDevice, stream);

    cudaStreamSynchronize(stream);
    PrintError("cudaMemcpyAsync and cudaMemset");

    // One thread per data point, rounded up to whole blocks
    int noBlock = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    RollingSimpleLinearRegression_kernel<<<noBlock, BLOCK_SIZE, 0, stream>>>(
        x_dev, y_dev, beta_dev, alpha_dev, residual_dev, n, window);
    cudaStreamSynchronize(stream);
    PrintError("Kernel launch");

    // Copy results back to host
    cudaMemcpyAsync(beta,     beta_dev,     vecSzInBytes_float, cudaMemcpyDeviceToHost, stream);
    cudaMemcpyAsync(alpha,    alpha_dev,    vecSzInBytes_float, cudaMemcpyDeviceToHost, stream);
    cudaMemcpyAsync(residual, residual_dev, vecSzInBytes_float, cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);

    // Free device memory
    cudaFree(x_dev);
    cudaFree(y_dev);
    cudaFree(beta_dev);
    cudaFree(alpha_dev);
    cudaFree(residual_dev);
    cudaStreamDestroy(stream);

    PrintError("End function");
}
```

#### donny

Any idea why there isn't a simple linear regression library for CUDA?

I've been searching the internet and have only found this:
https://devtalk.nvidia.com/default/topic/494482/linear-regression-cuda-code/

I'm being pushed for results, so if this will take time I'll need to go back to my manager and propose a departure from our infrastructure to develop on CUDA, which may not be taken well.

I did a few more searches - unless there's simply no such thing, I found something like this: Home

Basically, can I just write

CUDALinearModel aModel = new CUDALinearModel();
aModel.Regress(inputs, outputs);

or will I have to learn this CUDA thing from scratch?


#### BurgerKing

Last time I checked there was no such library widely available (maybe there's a proprietary one). It would also be quite inefficient, since you would pay for a round trip per regression - copying the data to GPU memory, regressing it, then copying the results back - i.e. no batch regression. If you only do simple linear regression, then I think my code is OK. Otherwise, you will have to do something along the lines of a batch QR decomposition to speed up your multiple regression.
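To make the QR route concrete: for a single multiple regression, you factor the design matrix X = QR and solve R b = Q'y by back substitution; a "batch" approach just runs one such factorization per window, launched together. Here is a minimal single-regression sketch in plain C++ (my own helper, not from any library), taking the columns of X explicitly:

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Least squares via thin QR (modified Gram-Schmidt): factor X = QR, then
// solve R b = Q'y by back substitution. X is passed as a vector of columns
// and copied, since the columns are orthonormalized in place.
std::vector<double> qrSolve(std::vector<std::vector<double>> X,
                            const std::vector<double>& y) {
    std::size_t p = X.size(), n = y.size();
    std::vector<std::vector<double>> R(p, std::vector<double>(p, 0.0));
    // Modified Gram-Schmidt: orthonormalize the columns of X.
    for (std::size_t j = 0; j < p; ++j) {
        for (std::size_t k = 0; k < j; ++k) {
            double dot = 0;
            for (std::size_t i = 0; i < n; ++i) dot += X[k][i] * X[j][i];
            R[k][j] = dot;
            for (std::size_t i = 0; i < n; ++i) X[j][i] -= dot * X[k][i];
        }
        double norm = 0;
        for (std::size_t i = 0; i < n; ++i) norm += X[j][i] * X[j][i];
        norm = std::sqrt(norm);
        R[j][j] = norm;
        for (std::size_t i = 0; i < n; ++i) X[j][i] /= norm;
    }
    // Compute Q'y, then back-substitute on R b = Q'y.
    std::vector<double> qty(p, 0.0), b(p, 0.0);
    for (std::size_t j = 0; j < p; ++j)
        for (std::size_t i = 0; i < n; ++i) qty[j] += X[j][i] * y[i];
    for (std::size_t j = p; j-- > 0; ) {
        double s = qty[j];
        for (std::size_t k = j + 1; k < p; ++k) s -= R[j][k] * b[k];
        b[j] = s / R[j][j];
    }
    return b;
}
```

Unlike forming the normal equations X'X, QR avoids squaring the condition number of X, so it is also the numerically safer choice once you have more than a couple of regressors.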

#### donny

Thank you BurgerKing. Actually, I did a quick prototype in Matlab to backtest 10 years of 5-minute bars with linear regression, and it's obviously faster. So a quick question to confirm:

In general, which is faster for linear regression - scientific languages (Matlab, R) or C# with the Accord.NET machine learning framework?

A colleague told me that scientific languages are meant for this sort of work. If what looks like a 10x speed increase holds up, I can quickly switch backtesting to Matlab, check the results, then recode the strategy in C# for execution.

#### BurgerKing

Actually, if you only do regression, then MATLAB is highly optimized for that kind of work and is likely faster than the C# code (I have no idea how good R is, though). 10x is a bit much - maybe 2-3 times. I think you can try a simple prototype with MATLAB's parfor command first, using all your CPU cores. If that's still not enough, then either rack up a MATLAB computing cluster or code a MATLAB MEX file to leverage CUDA in C, or both.

I think the trick is that when you backtest your strategy, you will likely have to do a lot of rolling regressions at 5-minute increments, so try to reuse/online-update the past estimate instead of redoing the whole regression each time.

#### donny

I did a quick prototype to test the speed of C#, Matlab and R. The setup: regress y ~ x1 + x2, where y, x1 and x2 are each 8,000 randomly generated numbers, then repeat the regression 5,000 times.

Here's what I got. I'm sharing it since everyone else on Quantnet seems quiet.

Matlab: 17 seconds.
R: 55 seconds.
C#: 24 seconds.

Of course, you have to account for the time taken to generate the random numbers. I generated new random numbers on each pass to make sure each pass runs a fresh regression, in case the previous result was cached.

Looks like Matlab it is.

The problem I see is that I'm not sure these differences carry over from random data to actual data. The scientific library might pick up on commonalities between passes and optimize further.

I guess that's why I'm a quantitative researcher and not a software engineer. Thanks for the good discussion. At least I have something to bring back to my boss.

#### ExSan

What about bitcoin mining using the cloud?

#### BurgerKing

@donny I think it should be faster if you are using real data, because the data will be pre-allocated in memory, and MATLAB's memory performance is actually quite good for a scripting language.

@ExSan IMO, cloud computing has a very poor price/performance ratio, which means you won't be able to profit from mining in the cloud. Most people and companies use the cloud because it saves them maintenance and setup costs, not because of its performance.

#### pingu

> What about bitcoin mining using the cloud?

This is a money-losing proposition. Even using ASICs (which I have done) won't justify the cost today. (It didn't justify the cost a year ago when I did it, but it was a fun project.)

#### ExSan

> This is a money-losing proposition. Even using ASICs (which I have done) won't justify the cost today.

ASICs?

#### pingu

> ASICs?

Application-specific integrated circuit - Wikipedia, the free encyclopedia

You can buy specific circuits to mine bitcoin. They have been around fairly cheap for a while. You can buy highly sophisticated ones or small ones, depending on your taste. These are specifically designed to solve only the bitcoin problem, so they don't waste time or energy on anything else. If you aren't running ASIC miners, you can pretty much kiss your odds of finding a block goodbye.

Mining hardware comparison - Bitcoin Wiki
Non-specialized hardware comparison - Bitcoin Wiki
Bitcoin Mining Hardware - ASIC Bitcoin Miner - Butterfly Labs
Can Hobbyist Bitcoin Miners Still Make a Buck?
Blockchain Smashers


#### ExSan

Thanks a lot!

#### donny

Hmmm, I repeated the test using actual prices and the improvement wasn't as large. For a linear regression of 8,000 data points, repeated 5,000 times using real AUDUSD and ES prices, Matlab took 70% of the time C# took. R was slower than either Matlab or C#.

And I went to my boss and said linear regression in Matlab was faster than C#, and he was shocked. Argh.

#### pingu

> R was slower than either Matlab or C#.

Did you use loops in your R version?

#### diegosanaz

##### BU MSMF
Yeah, there are many tricks that could be used in R to improve performance, such as using a parallel computing library.

#### donny

In R, you can't vectorize across regressions when each regression itself consumes a whole vector.

Or at least, to my knowledge of R, it can't be done without loops.

The problem is looping X times, where each iteration is a regression over Y points.
