
Matrix Classes in 2012

I thought it would be interesting to start a thread discussing the issues around implementing a Matrix class in C++/CUDA/C#/Python. Although this topic has been talked about before, I'd like to see how people have approached the problem recently, now that we're heading into 2013. Some topics:

C# and Python/NumPy are encroaching more heavily on the financial space, but does C++ still remain the leading implementation language?

As a learning exercise, constructing a Matrix class can teach a lot about C++ concepts. There is definitely tangible benefit in writing your own. Efficient storage and iteration in full, sparse, banded, block-banded and tridiagonal matrices make for interesting implementation problems. How have people gone about this in their own work?
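For concreteness, a minimal dense version might look like the sketch below; the class name and layout are illustrative choices of mine, not taken from any particular library:

#include <cstddef>
#include <vector>

// Minimal dense matrix with contiguous row-major storage. A single flat
// vector keeps elements cache-friendly, unlike a vector-of-vectors layout.
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, T{}) {}

    T& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

// A tridiagonal matrix, by contrast, only needs its three diagonals,
// dropping storage from O(n^2) to O(n) -- the kind of trade-off that
// makes these structures interesting to implement.
template <typename T>
struct Tridiagonal {
    std::vector<T> lower, diag, upper; // sizes n-1, n, n-1
};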

Also, I'd be interested to hear how people are implementing numerical algorithms (decomposition, iterative, etc.), such as LU, QR, Cholesky, Gauss-Seidel/SOR...
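As one example, a textbook Cholesky factorization (A = L*L^T for symmetric positive definite A) is compact enough to write directly against flat row-major storage; this is the standard algorithm, not any particular library's version:

#include <cmath>
#include <cstddef>
#include <vector>

// Textbook Cholesky: factor a symmetric positive definite n x n matrix A
// (row-major, flat storage) into L * L^T and return lower-triangular L.
// No pivoting or error handling -- purely illustrative.
std::vector<double> cholesky(const std::vector<double>& A, std::size_t n) {
    std::vector<double> L(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k)
            d -= L[j * n + k] * L[j * n + k];
        L[j * n + j] = std::sqrt(d); // d > 0 if A is truly SPD
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k)
                s -= L[i * n + k] * L[j * n + k];
            L[i * n + j] = s / L[j * n + j];
        }
    }
    return L;
}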

Once past the "learning stage" what production libraries are being used? uBlas et al? How is CUDA/GPU changing the landscape?

Looking forward to some interesting thoughts.
 
I ended up using a custom Matrix class for my Numerical Methods class in my MFE program, mostly because I was familiar with it, having been assigned to create a Matrix class in a previous intro C++ course. But I could have used Eigen or Boost uBLAS. In the Numerical Methods class we went over many of the algorithms you mentioned (relying mostly on the professor's lecture notes and Numerical Linear Algebra by Trefethen). I haven't had the opportunity to use these algorithms in the real world yet, so I can't comment on that.
 
If you are looking for raw speed in this space, FORTRAN and C are still kings of the mountain.

CUDA/OpenCL have their own set of headaches. You need to be really careful to use them well... and up until a year ago, if you didn't have a Tesla board or a server, it was sort of a waste of time.
 
So uBlas still seems like a good contender?

I haven't actually utilised CUDA much beyond a few "toy" problems from some introductory books. Although, Alain, I'd be interested in why you thought it was a waste of time? Do you mean in terms of productivity or just that the speed-up wasn't worth it?
 
I should've been clearer. It all depends on the size of the problem. CUDA cards are very fast at a very specific set of problems (lots of multiply-add operations on small data sets). As your problem grows in size, the issue becomes moving data between the host and the CUDA card, and that's another problem completely.
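To quantify that on your own setup, you can time the copies with CUDA events; a minimal sketch, assuming the CUDA runtime is installed (error checking omitted):

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Time a host->device->host round trip over PCIe with CUDA events.
// Comparing this figure against your kernel time shows when transfers
// dominate, which is the problem described above.
int main() {
    const size_t n = 1 << 24; // ~16M doubles, ~128 MB
    std::vector<double> host(n, 1.0);

    double* dev = 0;
    cudaMalloc(&dev, n * sizeof(double));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(host.data(), dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("round trip: %.2f ms\n", ms);

    cudaFree(dev);
    return 0;
}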
 

These are pretty broad statements, true for some problems, untrue for many others...

In any case: cuBLAS is a BLAS implementation shipped with the CUDA toolkit. There are also several high-quality libraries implementing LAPACK-type functionality, like CULA or MAGMA, so dense matrix operations are covered really well. For sparse matrices there is the CUSPARSE library, also shipped with the CUDA toolkit, covering most of the BLAS functionality and going even further, as well as lots of very interesting external libraries, like Cusp.
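For a concrete picture, a minimal cuBLAS call looks roughly like this (host code only, error checking omitted); note the explicit copies across the bus, which are exactly the transfer overhead discussed a few posts up:

#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Sketch of C = A * B via cuBLAS. BLAS conventions apply: column-major
// storage, with the leading dimension passed alongside each matrix.
int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    double *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dB, n * n * sizeof(double));
    cudaMalloc(&dC, n * n * sizeof(double));
    cudaMemcpy(dA, A.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(C.data(), dC, n * n * sizeof(double), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}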

All of these libraries typically show significant speedups compared to the best CPU codes (like MKL). Furthermore, coming from a rather large project involving sparse solvers, I can confirm from first-hand experience that if you know your CUDA stuff, then by coding most of your algorithms in CUDA C yourself you can achieve even more significant speedups (vs. heavily optimized CPU code).

Note also that there are many other approaches to exploiting GPU compute resources that can yield speedups over CPU code even if you don't code to the bare metal as with CUDA C. For example, for the Matlab folks out there, there is the excellent Jacket add-on, which makes it possible to execute most Matlab array operations on the GPU and thus significantly speed up existing Matlab code with minimal modifications.
 
As far as CPU-based computing is concerned, I like Eigen: http://eigen.tuxfamily.org/ -- and they've just relicensed to Mozilla Public License 2, so no more LGPL issues :]. I also find the syntax more likeable than, say, uBlas (which happens to be very, very slow).
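To give a flavor of the syntax, here's a small illustrative snippet using Eigen's dense module (values are arbitrary):

#include <iostream>
#include <Eigen/Dense>

// Build a small system and solve it with a pivoted LU factorization --
// a handful of readable lines, with no BLAS-style naming in sight.
int main() {
    Eigen::MatrixXd A(3, 3);
    A <<  4, -1,  0,
         -1,  4, -1,
          0, -1,  4;
    Eigen::VectorXd b(3);
    b << 1, 2, 3;

    Eigen::VectorXd x = A.partialPivLu().solve(b);
    std::cout << "x =\n" << x << "\n";
    std::cout << "residual = " << (A * x - b).norm() << "\n";
    return 0;
}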

Blaze also looks promising: http://code.google.com/p/blaze-lib/

As for raw BLAS and LAPACK, for all their prominence, by today's standards their raw APIs are a joke w.r.t. code elegance (read: maintainability). In particular, the interface design / naming scheme (encoding data type and matrix type in the function name) smells of Systems Hungarian, not to mention the readability issues that necessarily come with having to stay "within the very tight limits of standard Fortran 77 6-character names". Even if you're using (and used to) Fortran, I'd recommend a wrapper like LAPACK95 in this environment; at least it improves upon the data type encoding. (Note, however, that you might still want to pick up the syntax if you're interested in GPGPU using C for CUDA, as cuBLAS follows the same interface conventions, although higher-level wrapper libs should become more widespread with time.)
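To illustrate, here's roughly what a raw LAPACK call looks like from C++ -- dgesv ("d" for double, "ge" for general, "sv" for solve), every argument passed by pointer, column-major storage assumed; a minimal sketch, not production code:

#include <cstdio>

// Fortran LAPACK routine: the name encodes type and matrix kind, and the
// argument list is all pointers -- the naming-scheme issue described above.
extern "C" void dgesv_(int* n, int* nrhs, double* a, int* lda,
                       int* ipiv, double* b, int* ldb, int* info);

int main() {
    int n = 2, nrhs = 1, lda = 2, ldb = 2, info = 0;
    int ipiv[2];
    double a[4] = {4.0, 1.0,   // column 1 of A
                   2.0, 3.0};  // column 2 of A
    double b[2] = {6.0, 4.0};  // right-hand side; overwritten with x

    dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);
    std::printf("info = %d, x = (%g, %g)\n", info, b[0], b[1]);
    return 0; // link against LAPACK, e.g. -llapack
}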

The cherry on top is that all this low-level syntax not only doesn't buy you anything, it may even make your overall computations effectively slower: http://eigen.tuxfamily.org/index.php?title=FAQ#How_does_Eigen_compare_to_BLAS.2FLAPACK.3F

Somehow, this reminds me of the "Ghastly Style" and "Low-level != efficient" sections of Bjarne's recent talk:
http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11Style.pdf

Other than that, in general (for C++) take a look at the following thread -- the choice usually strongly depends on your needs, there's no one-size-fits-all solution (and, arguably, there never will be): http://www.wilmott.com/messageview.cfm?catid=44&threadid=87551

Writing your own matrix class might be a good educational experience, but after that, if your needs are covered by the available libs (see above), it's a waste of time (and, chances are, performance), and/or might signal some deeper organizational issues at your workplace (NIH syndrome, anyone?).
 
Hi Polter,
I use uBLAS at the moment, but speed is not a concern as I am focusing on the accuracy of numerical methods, which in itself is extremely time-consuming (and I have eliminated the need for LU decomposition). All I need is to store and retrieve data.

What is useful is uBLAS's matrix proxies, which I can use with parallel data decomposition when needed. They also allow a 1:1 mapping to a number of matrix algorithms. But maybe Eigen can do that as well?
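For what it's worth, the rough correspondence looks like this -- an illustrative sketch only, with Eigen's block() standing in for uBLAS's matrix_range:

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <Eigen/Dense>

int main() {
    namespace ublas = boost::numeric::ublas;

    // uBLAS: matrix_range is a writable proxy onto a sub-block of m.
    ublas::matrix<double> m = ublas::scalar_matrix<double>(4, 4, 1.0);
    ublas::matrix_range<ublas::matrix<double> > mr(
        m, ublas::range(0, 2), ublas::range(0, 2));
    mr *= 2.0; // writes through to the top-left 2x2 block of m

    // Eigen: block() plays the same role and is also a writable view.
    Eigen::MatrixXd e = Eigen::MatrixXd::Ones(4, 4);
    e.block(0, 0, 2, 2) *= 2.0;

    return 0;
}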

Let's say I want to port code from uBLAS to Eigen. Would that be an easy process?

BTW, the LU test code delivered with Boost uBLAS _is_ very slow, and it's a wrong implementation. It should take no more than 10 lines and spit out an L and a U matrix. There is a discussion on the Wilmott forum.

http://www.wilmott.com/messageview.cfm?catid=10&threadid=86255&FTVAR_MSGDBTABLE=

Is the slowness in the internal data structures as well?

What would be very useful is the ability to model block matrices easily and efficiently.
 
I've coded all of these (LU, QR, Cholesky, Gauss-Seidel/SOR) using Eigen. It was fairly simple.
 
Do you mean you called Eigen's LU and QR from your code, or did you implement them from scratch?
There's a lot of needed stuff that is not in Eigen!
 