Import data to Visual Studio

Thread starter: Mensa
Dear All,

I am working on a project on a high-frequency trading strategy. My data set is extremely large (intraday prices at one-minute intervals from 1999 onward), and I want to run tests over various time intervals, from 1 minute to 10,000 minutes. I have tried Matlab and Python, but both are too slow.

I want to implement it in C++ using Visual Studio, but I don't know how to import the data file into Visual Studio and test my functions. The function is expected to return a huge matrix.

@Daniel Duffy
 
Would it fit into a database application, Oracle for example? If it does, I would:
a. push the data into a database
b. use some language (Python, R, or C++) to connect to the database
c. write an SQL query to pull out the data I want to work on and store it in an STL vector, for example (if you're using C++), or a data.frame (if you're using R)....
 
Can you describe the data format? Does the matrix get built up in real time (streaming), or is it one big CSV file that needs to be put into an SQL database and then into a matrix?

What's the I/O data flow in the system?

==
I have a standard solution that might be good, but it is useful to know the context first.
 
How much data? Which piece of the process is slow? Reading the data? Operating on the data? Writing the matrix?

Did you profile your Python and/or Matlab code before jumping to C++? You will most probably have to roll most of the C++ implementation yourself.
 

Thank you Daniel!

The data format is xxx.asc. It looks like this:
Code:
Date,Time,Open,High,Low,Close,Volume
09/10/97,08:31,1120,1120.25,1120,1120.25,0
09/10/97,08:32,1120,1120,1119.75,1120,0
09/10/97,08:33,1119.75,1120,1119.5,1119.75,0
09/10/97,08:34,1120,1120.5,1120,1120.25,0
09/10/97,08:35,1120.25,1120.25,1119.5,1119.5,0
............................................................................
These are market prices from 1997-2016, at one-minute intervals.

My project is to test a trend-following trading strategy. The most important part is to optimize the parameters (details omitted), which means I will need to run nested loops over all the parameters numerous times, test the P&L out of sample, and then choose the optimal set. So the computation will be very expensive.
 

Thank you for your reply.

The following is a sample piece of my data:
Code:
Date,Time,Open,High,Low,Close,Volume
09/10/97,08:31,1120,1120.25,1120,1120.25,0
09/10/97,08:32,1120,1120,1119.75,1120,0
09/10/97,08:33,1119.75,1120,1119.5,1119.75,0
09/10/97,08:34,1120,1120.5,1120,1120.25,0
09/10/97,08:35,1120.25,1120.25,1119.5,1119.5,0
............................................................................
These are market prices from 1997-2016, at one-minute intervals.

I haven't built the whole structure yet, but I have done some tests in Python and Matlab, which were indeed inefficient. I already have pseudocode for my strategy, so it is easy to implement in any language.

I will use Python to plot some related graphs, or R for statistical analysis.

But the most important part will be the optimization, in which I will need to run nested loops over all the parameters numerous times, test the P&L out of sample, and then choose the optimal set.
 

The data format is xxx.asc. So I have to build a database and connect to it from Visual Studio?
 

This ASCII file has ~6 billion records, so I reckon a database is needed because it probably won't fit into memory?

One easy way I can see: on Windows you can use C++/CLI and ADO.NET to extract the records from the file and store them in a table (date/time as the key), in combination with regexes (incidentally, this is an exercise in my new C++ Baruch course).

For the optimization part, do you need all 10 years of data in memory?
 
Store the information in a fast <key, value> store and then manipulate it from another language. I did this before with BDB and Java and it was lightning fast. Check out Tokyo Cabinet and its successor, Kyoto Cabinet.

You could use HDF5 and then use Java, Python, C, C++, or whatever. This is how CERN does some of its processing.

Or you can partition the data into chunks and use the free version of KDB.
 
The OP said it is in Visual Studio.
 
The OP said it is in Visual Studio.
He wants to do the coding in Visual Studio. That's the IDE. He can pick any language that works within Visual Studio.

I gave him storage options. These are orthogonal concepts.
 
He wants to do the coding in Visual Studio. That's the IDE. He can pick any language that works within Visual Studio.

I gave him storage options. These are orthogonal concepts.
My bad. I misquoted.

I want to implement it in C++ using Visual Studio, but I don't know how to import the data file into Visual Studio and test my functions. The function is expected to return a huge matrix.

@Mensa
Is C++ a hard requirement?
 
Thank you so much for such useful suggestions.

For the optimization part, do you need all 10 years of data in memory?
Yes, I think I will need to keep all the data in memory.

Is C++ a hard requirement?
Not strictly. My professor just said he recommends C/C++/C# for high-performance computing. Since nested loops are necessary, Python or Matlab might be too time-consuming.

The reason I said "in Visual Studio" is that all my C++ experience was in VS... But I will consider any other feasible plan.
 

Thank you, Alain.

Actually, the Date/Time fields are not needed, since the strategy is only based on time intervals. Simply speaking, I just need a huge matrix of floating-point numbers. I was hoping I could do it in some less complex way?
 
How about using ifstream/ofstream to read the table from my data file and write my results to another file?

Since I don't need the date/time, I can clean the original data so that it contains only numbers. As long as I get the 2-D array, I can do what I want, because that's just computation in nested loops.

@Daniel Duffy
 