BLAS library for matrix multiplication. In case it wasn't obvious, I was being ironic in my previous post about the "impressive" speeds of the C# code posted here: you cannot come even close to this speed by coding matrix multiplication as three nested for loops. To reach this level of performance you would have to use the SSE units, take great care to reorganize the multiplication code with regard to caching, and so on, so you need to be very, very knowledgeable about code optimization before even thinking about approaching this sort of task. (Alternatively, if you're a C++ wizard, maybe you could come close by employing some template metaprogramming magic, as in Eigen or similar libraries.) For these reasons, for vector/matrix operations one should always stick to the vendor-supplied BLAS library for one's platform.
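
To make the caching point a bit more concrete, here is a minimal C++ sketch of my own. The names `Matrix`, `multiply_naive` and `multiply_reordered` are made up for illustration, and I'm assuming square row-major matrices stored in flat vectors. It shows the textbook three-loop version next to the same arithmetic with the two inner loops swapped, so that the innermost loop walks memory contiguously. That is only the very first step of the reorganization mentioned above; it does no blocking and no explicit SSE, so don't expect it to get anywhere near a tuned BLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for this sketch: square n x n matrices, row-major, in a flat vector.
using Matrix = std::vector<double>;

// Textbook three-loop version (i, j, k). The innermost loop reads B with
// stride n (down a column), which is unfriendly to the cache for large n.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Same arithmetic with the loops reordered (i, k, j). The innermost loop now
// walks a row of B and a row of C contiguously.
Matrix multiply_reordered(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the memory access pattern differs. In real code you would skip both and call your BLAS's matrix-multiply routine instead (for example `cblas_dgemm` if your vendor library ships the CBLAS interface).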