
Matrix Classes in 2012

I thought it would be interesting to start a thread discussing the issues around implementing a Matrix class in C++/CUDA/C#/Python. Although this topic has been talked about before, I'd like to see how people have approached the problem recently, now that we're heading into 2013. Some topics:

C# and Python/NumPy are encroaching more heavily on the financial space, but does C++ still remain the leading implementation language?

As a learning exercise, constructing a Matrix class can teach a lot about C++ concepts. There is definitely tangible benefit in writing your own. Efficient storage and iteration in full, sparse, banded, block-banded and tridiagonal matrices make for interesting implementation problems. How have people gone about this in their own work?
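For concreteness, a minimal dense version might look like the sketch below; the class name and layout are illustrative choices of mine, not taken from any particular library:

#include <cstddef>
#include <vector>

// Minimal dense matrix with contiguous row-major storage. A single flat
// vector keeps elements cache-friendly, unlike a vector-of-vectors layout.
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, T{}) {}

    T& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

// A tridiagonal matrix, by contrast, only needs its three diagonals,
// dropping storage from O(n^2) to O(n) -- the kind of trade-off that
// makes these structures interesting to implement.
template <typename T>
struct Tridiagonal {
    std::vector<T> lower, diag, upper; // sizes n-1, n, n-1
};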

Also, I'd be interested to hear how people are implementing numerical algorithms (decomposition, iterative, etc.), such as LU, QR, Cholesky, Gauss-Seidel/SOR...
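As one example, a textbook Cholesky factorization (A = L*L^T for symmetric positive definite A) is compact enough to write directly against flat row-major storage; this is the standard algorithm, not any particular library's version:

#include <cmath>
#include <cstddef>
#include <vector>

// Textbook Cholesky: factor a symmetric positive definite n x n matrix A
// (row-major, flat storage) into L * L^T and return lower-triangular L.
// No pivoting or error handling -- purely illustrative.
std::vector<double> cholesky(const std::vector<double>& A, std::size_t n) {
    std::vector<double> L(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k)
            d -= L[j * n + k] * L[j * n + k];
        L[j * n + j] = std::sqrt(d); // d > 0 if A is truly SPD
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k)
                s -= L[i * n + k] * L[j * n + k];
            L[i * n + j] = s / L[j * n + j];
        }
    }
    return L;
}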

Once past the "learning stage" what production libraries are being used? uBlas et al? How is CUDA/GPU changing the landscape?

Looking forward to some interesting thoughts.
 
I ended up using a custom Matrix class for my Numerical Methods class in my MFE program, mostly because I was familiar with it, having been assigned to create a Matrix class in a previous intro C++ course. But I could have used Eigen or Boost uBLAS. In the Numerical Methods class we went over many of the algorithms you mentioned (relying mostly on the professor's lecture notes and Numerical Linear Algebra by Trefethen). I haven't had the opportunity to use these algorithms in the real world yet, so I can't comment on that.
 
If you are looking for raw speed in this space, FORTRAN and C are still kings of the mountain.

CUDA/OpenCL have their own set of headaches. You need to be really careful to use them well... and up until a year ago, if you didn't have a Tesla board or a server, it was sort of a waste of time.
 
So uBlas still seems like a good contender?

I haven't actually utilised CUDA much beyond a few "toy" problems from some introductory books. Although, Alain, I'd be interested in why you thought it was a waste of time? Do you mean in terms of productivity or just that the speed-up wasn't worth it?
 
I should've been clearer. It all depends on the size of the problem. CUDA cards are very fast at a very specific set of problems (lots of multiply-add operations on small data sets). As your problem grows in size, the issue becomes moving data between the host and the CUDA card, and that's another problem completely.
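To quantify that on your own setup, you can time the copies with CUDA events; a minimal sketch, assuming the CUDA runtime is installed (error checking omitted):

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Time a host->device->host round trip over PCIe with CUDA events.
// Comparing this figure against your kernel time shows when transfers
// dominate, which is the problem described above.
int main() {
    const size_t n = 1 << 24; // ~16M doubles, ~128 MB
    std::vector<double> host(n, 1.0);

    double* dev = 0;
    cudaMalloc(&dev, n * sizeof(double));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(host.data(), dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("round trip: %.2f ms\n", ms);

    cudaFree(dev);
    return 0;
}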
 

These are pretty broad statements, true for some problems, untrue for many others...

In any case: cuBLAS is a BLAS implementation shipped with the CUDA toolkit. There are also several high-quality libraries implementing LAPACK-type functionality, like CULA or MAGMA, so dense matrix operations are covered really well. For sparse matrices there is the CUSPARSE library, also shipped with the CUDA toolkit, covering most of the BLAS functionality and going even further, as well as lots of very interesting external libraries, like Cusp.
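For a concrete picture, a minimal cuBLAS call looks roughly like this (host code only, error checking omitted); note the explicit copies across the bus, which are exactly the transfer overhead discussed a few posts up:

#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Sketch of C = A * B via cuBLAS. BLAS conventions apply: column-major
// storage, with the leading dimension passed alongside each matrix.
int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    double *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dB, n * n * sizeof(double));
    cudaMalloc(&dC, n * n * sizeof(double));
    cudaMemcpy(dA, A.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(C.data(), dC, n * n * sizeof(double), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}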

All of these libraries typically show significant speedups compared to the best CPU codes (like MKL). Furthermore, coming from a rather large project involving sparse solvers, I can confirm from first-hand experience that if you know your CUDA stuff, then by coding most of your algorithms in CUDA C yourself you can achieve even more significant speedups (vs. heavily optimized CPU code).

Note also that there are many other approaches to exploiting GPU compute resources that can yield speedups over CPU code even if you don't code to the bare metal as with CUDA C. For example, for the Matlab folks out there, there is the excellent Jacket add-on, which makes it possible to execute most Matlab array operations on the GPU and thus significantly speed up existing Matlab code with minimal modifications.
 
As far as CPU-based computing is concerned, I like Eigen: http://eigen.tuxfamily.org/ -- and they've just relicensed to Mozilla Public License 2, so no more LGPL issues :]. I also find the syntax more likeable than, say, uBlas (which happens to be very, very slow).
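To give a flavor of the syntax, here's a small illustrative snippet using Eigen's dense module (values are arbitrary):

#include <iostream>
#include <Eigen/Dense>

// Build a small system and solve it with a pivoted LU factorization --
// a handful of readable lines, with no BLAS-style naming in sight.
int main() {
    Eigen::MatrixXd A(3, 3);
    A <<  4, -1,  0,
         -1,  4, -1,
          0, -1,  4;
    Eigen::VectorXd b(3);
    b << 1, 2, 3;

    Eigen::VectorXd x = A.partialPivLu().solve(b);
    std::cout << "x =\n" << x << "\n";
    std::cout << "residual = " << (A * x - b).norm() << "\n";
    return 0;
}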

Blaze also looks promising: http://code.google.com/p/blaze-lib/

As for raw BLAS and LAPACK, for all their prominence, by today's standards their raw APIs are a joke w.r.t. code elegance (read: maintainability). In particular, the interface design / naming scheme (encoding data type and matrix type in the function name) smells of Systems Hungarian, not to mention the readability issues that necessarily come with having to stay "within the very tight limits of standard Fortran 77 6-character names". Even if you're using (and used to) Fortran, I'd recommend a wrapper like LAPACK95 in this environment; at least it improves upon the data type encoding. (Note, however, that you might still want to pick up the syntax if you're interested in GPGPU using C for CUDA, as cuBLAS follows the same interface conventions, although higher-level wrapper libs should become more widespread with time.)
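To illustrate, here's roughly what a raw LAPACK call looks like from C++ -- dgesv ("d" for double, "ge" for general, "sv" for solve), every argument passed by pointer, column-major storage assumed; a minimal sketch, not production code:

#include <cstdio>

// Fortran LAPACK routine: the name encodes type and matrix kind, and the
// argument list is all pointers -- the naming-scheme issue described above.
extern "C" void dgesv_(int* n, int* nrhs, double* a, int* lda,
                       int* ipiv, double* b, int* ldb, int* info);

int main() {
    int n = 2, nrhs = 1, lda = 2, ldb = 2, info = 0;
    int ipiv[2];
    double a[4] = {4.0, 1.0,   // column 1 of A
                   2.0, 3.0};  // column 2 of A
    double b[2] = {6.0, 4.0};  // right-hand side; overwritten with x

    dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);
    std::printf("info = %d, x = (%g, %g)\n", info, b[0], b[1]);
    return 0; // link against LAPACK, e.g. -llapack
}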

The cherry on top is that all this low-level syntax not only doesn't buy you anything, it may even make your overall computations effectively slower: http://eigen.tuxfamily.org/index.php?title=FAQ#How_does_Eigen_compare_to_BLAS.2FLAPACK.3F

Somehow, this reminds me of the "Ghastly Style" and "Low-level != efficient" sections of Bjarne's recent talk:
http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11Style.pdf

Other than that, in general (for C++) take a look at the following thread -- the choice usually strongly depends on your needs, there's no one-size-fits-all solution (and, arguably, there never will be): http://www.wilmott.com/messageview.cfm?catid=44&threadid=87551

Writing your own matrix class might be a good educational experience, but after that, if your needs are covered by the available libs (see above), it's a waste of time (and, chances are, performance), and/or might signal some deeper organizational issues at your workplace (NIH syndrome, anyone?).
 
Hi Polter,
I use uBLAS at the moment, but speed is not a concern as I am focusing on the accuracy of numerical methods, which in itself is extremely time-consuming (and I have eliminated the need for LU decomposition). All I need is to store and retrieve data.

What is useful is uBLAS's matrix proxies, which I can use with parallel data decomposition when needed. They also allow a 1:1 mapping to a number of matrix algorithms. But maybe Eigen can do that as well?
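For what it's worth, the rough correspondence looks like this -- an illustrative sketch only, with Eigen's block() standing in for uBLAS's matrix_range:

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <Eigen/Dense>

int main() {
    namespace ublas = boost::numeric::ublas;

    // uBLAS: matrix_range is a writable proxy onto a sub-block of m.
    ublas::matrix<double> m = ublas::scalar_matrix<double>(4, 4, 1.0);
    ublas::matrix_range<ublas::matrix<double> > mr(
        m, ublas::range(0, 2), ublas::range(0, 2));
    mr *= 2.0; // writes through to the top-left 2x2 block of m

    // Eigen: block() plays the same role and is also a writable view.
    Eigen::MatrixXd e = Eigen::MatrixXd::Ones(4, 4);
    e.block(0, 0, 2, 2) *= 2.0;

    return 0;
}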

Let's say I want to port code from uBLAS to Eigen. Would that be an easy process?

BTW, the LU test code delivered with Boost uBLAS _is_ very slow, and it's a wrong implementation. It should take no more than 10 lines and spit out an L and a U matrix. There is a discussion on the Wilmott forum.

http://www.wilmott.com/messageview.cfm?catid=10&threadid=86255&FTVAR_MSGDBTABLE=

Is the slowness in the internal data structures as well?

What would be very useful is the ability to model block matrices easily and efficiently.
 
I've coded all of these (LU, QR, Cholesky, Gauss-Seidel/SOR) using Eigen. It was fairly simple.
 
Do you mean you called Eigen's LU and QR from your code, or did you implement them from scratch?
There's a lot of needed stuff that is not in Eigen!
 