
Import data to Visual Studio

Mensa

Member
Dear All,

I am doing a project on a high-frequency trading strategy. My data set is extremely large (intraday prices for every minute from 1999 onward), and I want to run tests for various time intervals, from 1 min to 10,000 min. I have tried Matlab and Python, but both are very slow.

I want to implement it in Visual Studio using C++, but I don't know how to import a data file into Visual Studio and test my functions. The function is expected to return a huge matrix.

@Daniel Duffy
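Testing intervals from 1 min to 10,000 min usually means aggregating the 1-minute bars into N-minute bars. A minimal sketch, assuming the bars are already in memory, in time order and without gaps; the `Bar` struct and `resample` function are hypothetical names, not from any library:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One OHLCV bar (hypothetical struct mirroring the columns in the data file).
struct Bar {
    double open, high, low, close, volume;
};

// Aggregate consecutive 1-minute bars into n-minute bars.
// Assumption: every block of n consecutive rows is one n-minute interval
// (no missing minutes, no session breaks). A trailing partial block is dropped.
std::vector<Bar> resample(const std::vector<Bar>& minute, std::size_t n) {
    assert(n > 0);
    std::vector<Bar> out;
    out.reserve(minute.size() / n);
    for (std::size_t start = 0; start + n <= minute.size(); start += n) {
        Bar b = minute[start];                     // open comes from the first bar
        for (std::size_t i = start + 1; i < start + n; ++i) {
            b.high = std::max(b.high, minute[i].high);
            b.low = std::min(b.low, minute[i].low);
            b.volume += minute[i].volume;
        }
        b.close = minute[start + n - 1].close;     // close comes from the last bar
        out.push_back(b);
    }
    return out;
}
```

Grouping by row count is the simplest choice; if the file has gaps or overnight breaks, grouping by timestamp would be needed instead.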
 

elektor

Active Member
C++ Student
Would it fit into a database application like Oracle, for example? If so, I would:
a. push the data into a database
b. use some language (Python, R or C++) to connect to the database
c. write an SQL query to pull out the data I want to work on and store it in, for example, an STL vector (if you're using C++) or a data.frame (if you're using R).
 

Daniel Duffy

C++ author, trainer
Can you describe the data format? Is the matrix built up in real time (streaming), or is it one big CSV file that needs to be put into an SQL database and then into a matrix?

What's the I/O data flow in the system?

==
I have a standard solution that might work, but it is useful to know the context first.
 

pingu

Well-Known Member
How much data? Which part of the process is slow: reading the data, operating on the data, or writing the matrix?

Did you profile your Python and/or Matlab code before jumping to C++? You will most probably have to roll most of the C++ implementation yourself.
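Whatever the language, the point of profiling is to time each stage (read, compute, write) separately before deciding what to rewrite. A minimal C++ stage timer, as a sketch; `time_ms` is a hypothetical helper built on the standard `std::chrono::steady_clock`:

```cpp
#include <chrono>

// Run a callable once and return its wall-clock duration in milliseconds.
// steady_clock is monotonic, so it is suitable for measuring intervals.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Usage would look like `double read_ms = time_ms([&] { rows = load_file(path); });` (with `load_file` a placeholder for your own reader), repeated for the compute and write stages, so you can see which one dominates.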
 

Mensa

Member
Thank you, Daniel!

The data format is xxx.asc. It looks like this:
Code:
Date,Time,Open,High,Low,Close,Volume
09/10/97,08:31,1120,1120.25,1120,1120.25,0
09/10/97,08:32,1120,1120,1119.75,1120,0
09/10/97,08:33,1119.75,1120,1119.5,1119.75,0
09/10/97,08:34,1120,1120.5,1120,1120.25,0
09/10/97,08:35,1120.25,1120.25,1119.5,1119.5,0
............................................................................
These are market prices for every minute from 1997 to 2016.

My project is to test a trend-following trading strategy. The most important part is optimizing the parameters (details omitted), which means I will need to run nested loops over all parameters many times, test the P&L out of sample, and then choose the optimal set. So the computation will be very expensive.
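Reading this format needs nothing beyond the standard library. A minimal sketch, assuming comma-separated fields without quoting and a single header line; the `Row` struct and `read_rows` function are hypothetical names, and Date/Time are simply kept as text:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One parsed line of "Date,Time,Open,High,Low,Close,Volume".
struct Row {
    std::string date, time;                 // kept as raw text in this sketch
    double open, high, low, close, volume;
};

// Read all data rows from a stream (a std::ifstream works the same way).
// Lines that do not have seven fields or contain non-numeric prices are skipped.
std::vector<Row> read_rows(std::istream& in) {
    std::vector<Row> rows;
    std::string line;
    std::getline(in, line);                 // skip the header line
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string f[7];
        int n = 0;
        while (n < 7 && std::getline(ss, f[n], ',')) ++n;
        if (n != 7) continue;               // not a complete data line
        try {
            Row r;
            r.date = f[0];
            r.time = f[1];
            r.open = std::stod(f[2]);
            r.high = std::stod(f[3]);
            r.low = std::stod(f[4]);
            r.close = std::stod(f[5]);
            r.volume = std::stod(f[6]);
            rows.push_back(r);
        } catch (const std::exception&) {
            // std::stod throws std::invalid_argument or std::out_of_range on bad fields
        }
    }
    return rows;
}
```

In a Visual Studio console project this would be called as `std::ifstream file("xxx.asc"); auto rows = read_rows(file);` (add `#include <fstream>`); no special import step inside the IDE is needed.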
 

Mensa

Member
Thank you for your reply.

The following is a sample piece of my data:
Code:
Date,Time,Open,High,Low,Close,Volume
09/10/97,08:31,1120,1120.25,1120,1120.25,0
09/10/97,08:32,1120,1120,1119.75,1120,0
09/10/97,08:33,1119.75,1120,1119.5,1119.75,0
09/10/97,08:34,1120,1120.5,1120,1120.25,0
09/10/97,08:35,1120.25,1120.25,1119.5,1119.5,0
............................................................................
These are market prices for every minute from 1997 to 2016.

I haven't built the whole framework yet, but I have done some tests in Python and Matlab, which are indeed slow. I already have pseudocode for my strategy, so it is easy to implement in any language.

I will use Python to plot some related graphs, or R for statistical analysis.

But the most important part will be the optimization, in which I will need to run nested loops over all parameters many times, test the P&L out of sample, and then choose the optimal set.
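The nested-loop optimization can be sketched as a grid search: score every parameter pair on an in-sample window, keep the best, then evaluate it once on the out-of-sample window. The strategy details are omitted in the thread, so the rule below is a purely hypothetical stand-in (a moving-average crossover); `crossover_pnl`, `grid_search` and `GridResult` are illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// HYPOTHETICAL stand-in rule: hold one unit long over bar t+1 if the fast
// moving average is above the slow one at bar t, otherwise stay flat.
// Returns the summed P&L over bars [begin, end). prefix[i] = sum of closes[0..i-1].
double crossover_pnl(const std::vector<double>& closes, const std::vector<double>& prefix,
                     std::size_t begin, std::size_t end, std::size_t fast, std::size_t slow) {
    assert(fast > 0 && fast < slow && end <= closes.size());
    // Average of the w closes ending at bar t, O(1) thanks to prefix sums.
    auto ma = [&](std::size_t t, std::size_t w) {
        return (prefix[t + 1] - prefix[t + 1 - w]) / static_cast<double>(w);
    };
    double pnl = 0.0;
    for (std::size_t t = std::max(begin, slow - 1); t + 1 < end; ++t) {
        if (ma(t, fast) > ma(t, slow)) pnl += closes[t + 1] - closes[t];
    }
    return pnl;
}

struct GridResult {
    std::size_t fast = 0, slow = 0;
    double in_sample = -std::numeric_limits<double>::infinity();
    double out_of_sample = 0.0;
};

// Try every pair 1 <= fast <= max_fast, fast < slow <= max_slow on bars [0, split),
// then report the winning pair's P&L on the unseen bars [split, n).
GridResult grid_search(const std::vector<double>& closes, std::size_t split,
                       std::size_t max_fast, std::size_t max_slow) {
    assert(split <= closes.size());
    std::vector<double> prefix(closes.size() + 1, 0.0);  // computed once, reused by every pair
    for (std::size_t i = 0; i < closes.size(); ++i) prefix[i + 1] = prefix[i] + closes[i];

    GridResult best;
    for (std::size_t fast = 1; fast <= max_fast; ++fast) {
        for (std::size_t slow = fast + 1; slow <= max_slow; ++slow) {
            const double pnl = crossover_pnl(closes, prefix, 0, split, fast, slow);
            if (pnl > best.in_sample) {
                best.fast = fast;
                best.slow = slow;
                best.in_sample = pnl;
            }
        }
    }
    if (best.fast != 0)  // at least one pair was evaluated
        best.out_of_sample = crossover_pnl(closes, prefix, split, closes.size(), best.fast, best.slow);
    return best;
}
```

Precomputing the prefix sums once, outside the parameter loops, is the kind of change that matters most when the same data are scanned thousands of times; the real rule would replace `crossover_pnl`.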
 

Mensa

Member
The data format is xxx.asc. So do I have to build a database and connect to it from Visual Studio?
 

Daniel Duffy

C++ author, trainer
This ASCII file has ~6 billion records, so I reckon a database is needed, because it probably won't fit into memory?

One easy way I have found is that on Windows you can use C++/CLI and ADO.NET to extract the records from the file and store them in a table (with date/time as the key), in combination with regex (incidentally, this is an exercise in my new C++ Baruch course).
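The regex part of this idea can also be done in standard C++ with `<regex>` (not C++/CLI or .NET; the database step is not shown). A minimal sketch, where the pattern assumes the MM/DD/YY,HH:MM layout of the sample and `Record`/`parse_line` are hypothetical names:

```cpp
#include <regex>
#include <string>

struct Record {
    std::string date, time;
    double open = 0, high = 0, low = 0, close = 0, volume = 0;
};

// Validate and split one line of the .asc file with std::regex.
// Returns false (leaving `out` untouched) for the header or any malformed line.
bool parse_line(const std::string& line, Record& out) {
    // One numeric field: optional minus sign, digits, optional decimal part.
    static const std::string num = R"((-?\d+(?:\.\d+)?))";
    // MM/DD/YY,HH:MM then Open,High,Low,Close,Volume; tolerate a trailing '\r' (Windows line ends).
    static const std::regex pattern(R"((\d{2}/\d{2}/\d{2}),(\d{2}:\d{2}),)" + num + "," + num + "," +
                                    num + "," + num + "," + num + R"(\r?)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    out.date = m[1].str();
    out.time = m[2].str();
    out.open = std::stod(m[3].str());
    out.high = std::stod(m[4].str());
    out.low = std::stod(m[5].str());
    out.close = std::stod(m[6].str());
    out.volume = std::stod(m[7].str());
    return true;
}
```

Regex gives validation for free, but `std::regex` is often noticeably slower than a plain split on commas, so for millions of lines a `std::getline`-based reader may be the better trade-off.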

For the optimization part, do you need all 10 years of data in memory?
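A back-of-the-envelope bound helps answer the memory question. Assuming one bar for every minute of every calendar day over the 20 years 1997 to 2016 (an overestimate, since markets are not open around the clock) and keeping only the five numeric columns as 8-byte doubles:

```latex
\begin{aligned}
N_{\text{rows}} &\le 20 \times 365 \times 1440 = 10{,}512{,}000 \\
\text{memory} &\approx N_{\text{rows}} \times 5 \times 8\ \text{bytes} \le 420{,}480{,}000\ \text{bytes} \approx 0.42\ \text{GB}
\end{aligned}
```

Real trading sessions give fewer rows, so under these assumptions the numeric data would likely fit in the RAM of an ordinary workstation.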
 

alain

Older and Wiser
Store the information in a fast <key, value> store and then manipulate it from another language. I did this before with BDB (Berkeley DB) and Java, and it was lightning fast. Check out Tokyo Cabinet and its successor, Kyoto Cabinet.

You could use HDF5 and then use Java, Python, C, C++ or whatever. This is how CERN does some of its processing.

Or you can partition the data into chunks and use the free version of KDB.
 

Daniel Duffy

C++ author, trainer
The OP said it is in Visual Studio.
 

alain

Older and Wiser
He wants to do the coding in Visual Studio. That's the IDE. He can pick any language that works within Visual Studio.

I gave him storage options. These are orthogonal concepts.
 

Daniel Duffy

C++ author, trainer
My bad. I misquoted.

"I want to implement it in Visual Studio using C++ to do that. But I don't know how can I import data file to visual studio, and test my functions. The function is expected to return a huge matrix."

@Mensa
Is C++ a hard requirement?
 

Mensa

Member
Thank you so much for such useful suggestions.

"For the optimization part, do you need all 10-year data in memory?"
Yes, I think I will need to keep all the data in memory.

"Is C++ a hard requirement?"
Not strictly. My professor just said he recommends C/C++/C# for high-performance computing. Since many nested loops are necessary, Python or Matlab might be too slow.

I said "in Visual Studio" because all my C++ experience was in VS, but I will consider any other feasible plan.
 

Mensa

Member
Thank you, Alain.

Actually, the Date/Time data are not needed, since the strategy is based only on time intervals. Simply put, I just need a huge matrix of floating-point numbers. Isn't there a less complex way to do this?
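If only the numeric columns are needed, a common layout is one contiguous buffer indexed as `row * cols + col`. A minimal sketch; the `Matrix` class is a hypothetical name, not a standard type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix stored in a single contiguous std::vector.
// One allocation and sequential layout are usually more cache-friendly
// than std::vector<std::vector<double>> when scanning rows repeatedly.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols) : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Append one row; its length must equal cols().
    void push_row(const std::vector<double>& row) {
        assert(row.size() == cols_);
        data_.insert(data_.end(), row.begin(), row.end());
        ++rows_;
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

Using `float` instead of `double` would halve the memory at the cost of precision; for prices quoted in quarter points either usually works, but `double` is the safer default for accumulated P&L.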
 

Mensa

Member
How about using ifstream/ofstream to read the table from my data file and write my results to another file?

Since I don't need the date/time, I can clean the original data so it contains only numbers. As long as I have the 2-D array, I can do what I want, because the rest is just computation in nested loops.
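That plan works with plain streams. A minimal sketch, assuming the cleaned file has one row per line with comma- or space-separated numbers; `read_table` and `write_table` are hypothetical names, and a vector of rows is used for brevity:

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Table = std::vector<std::vector<double>>;

// Read a numeric table, one row per line. Commas are treated as separators;
// parsing of a line stops at the first non-numeric token, and empty rows are skipped.
Table read_table(std::istream& in) {
    Table table;
    std::string line;
    while (std::getline(in, line)) {
        for (char& ch : line)
            if (ch == ',') ch = ' ';
        std::istringstream ss(line);
        std::vector<double> row;
        double x;
        while (ss >> x) row.push_back(x);
        if (!row.empty()) table.push_back(row);
    }
    return table;
}

// Write one comma-separated row per line.
void write_table(std::ostream& out, const Table& table) {
    for (const auto& row : table) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            if (j) out << ',';
            out << row[j];
        }
        out << '\n';
    }
}
```

With files this becomes `std::ifstream in("clean.txt"); Table t = read_table(in);` and `std::ofstream out("results.csv"); write_table(out, results);` (file names are placeholders; add `#include <fstream>`). Streams print 6 significant digits by default, so call `out.precision(...)` if the results need more.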

@Daniel Duffy
 