SQL interaction with VS

I have installed DevExpress on VS 2010 and I want the enhanced IntelliSense support in the code editor that DevExpress provides, but I'm not sure where to switch it on. Is anyone aware of this? Thanks
 
I wonder if anyone has a matrix multiplication algorithm written in SQL. I know you'll wonder why I need it: to compare its speed with my C# algorithm, and also to see the complexity of an SQL script that defines matrix multiplication. I'm intending to move to SQL and port many of the methods I have already written in C#, since I find it faster in my specific cases.

Thanks

Doesn't seem appropriate, imo.
 
I know it is not common, but allocating an array every time you want to manipulate it seems a bit wasteful in terms of processing.
 
Sometimes a plain old binary file is a better option than SQL, depending on what your data looks like. Ask yourself:

- Do you have many updates to a record set?
- What do your selects look like? Are you targeting specific records?

If the answer to both is no, you are often better off using binary files where you store your data in columnar format (i.e., one file per column). Since you mention matrices, I'd think the dimensions won't change once written to permanent storage, so no further updates are required.
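
For illustration, a minimal sketch of this "one binary file per column" idea in C# might look like the following (the class and method names are just placeholders I'm making up here, not part of any existing library):

C#:
    using System;
    using System.IO;

    static class ColumnStore
    {
        // Write one matrix column to its own binary file: the length first, then the raw doubles.
        public static void WriteColumn(string path, double[] column)
        {
            using (var writer = new BinaryWriter(File.Create(path)))
            {
                writer.Write(column.Length);
                foreach (double value in column)
                    writer.Write(value);
            }
        }

        // Read the whole column back in one sequential pass.
        public static double[] ReadColumn(string path)
        {
            using (var reader = new BinaryReader(File.OpenRead(path)))
            {
                int length = reader.ReadInt32();
                double[] column = new double[length];
                for (int i = 0; i < length; i++)
                    column[i] = reader.ReadDouble();
                return column;
            }
        }
    }

Reading a column is then a single sequential scan of a small file, which is exactly the access pattern that makes the columnar layout attractive for "write once, never update" data.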

John Wu from Berkeley Lab has built an open-source database that basically sits on top of your file system and works with columnar binary files. It's a great piece of software as it's not intrusive at all. I have been using it for a while and am pretty happy with it. Once it has built its indexes, it's extremely fast at querying the data you are after - much faster than MSSQL or MySQL in my experience/use case.

Here is the link:
http://crd.lbl.gov/~kewu/fastbit/

It has a bit of a learning curve though and you might have to build some C# wrappers for it.
 
I'm in search of a faster platform to host my algorithms. I can translate them into many languages, but the concern is speed. I see two options. The first is to rely on garbage collection: perform the matrix multiplication (for example) in C# and assume that all the allocated arrays will be garbage collected and do no harm to the overall application in terms of loading and processing time. The second is to push the arrays into the database as a static table, which needs no loading or allocation at run time since it is created only once, unlike in C# where every run has to allocate the arrays again, and then apply a matrix multiplication algorithm there (which is what I asked about above). I'm still not sure whether translating my code into SQL scripts would improve the processing time at all.
 
Ok, I misunderstood. I thought that retrieving the dataset itself from the DB was the problem, or a big part of it. But I don't see how SQL, or doing arithmetic on SQL Server, could make matrix multiplication or general arithmetic faster. I am quite certain you won't get any speed-up from moving this to the SQL side, and you should be looking at other ways to improve performance. I think I mentioned this before, but the best option is to multithread it (if you aren't already doing so).
 
Yeah, I read your suggestion about multithreading and I have done it. The thing is that I have defined one method doing matrix inversion and one doing multiplication. I'm writing some programs for the Markowitz portfolio selection model and need to manipulate big matrices, so it turns out to be slow in C#. I know other languages such as C/C++ and Matlab and have already implemented those algorithms there, but for some reasons (which concern the flexibility of other parts of the program) I prefer the entire project to be done in C#. Thanks
 
I use Principal Component Analysis very frequently when dealing with such big matrices, so I'm well aware of it. I'm not doing this big matrix manipulation for some deadline or particular purpose; I could do it many ways (including the PCA you mentioned). I'm just honing my math-based programming skills. Thanks a lot for the resources you pointed to. I know of, and have, many libraries that are already written and proven to be fast and efficient, but I never use them. I use only the libraries I define myself, which are sometimes less efficient and fast, but I still prefer writing them myself as I work through particular subjects. I'll keep looking for the best way to streamline my current matrix multiplication code (that is my concern at the moment). Thanks again
 
Here's an extract from my Lin_Alg library. Don't pay attention to the exception handling: I have defined a special exception class for matrices, and the code refers to that.

C#:
  public static double[,] MMult(double[,] matrixA, double[,] matrixB)
        {
            int n = matrixA.GetLength(0);  // rows of A
            int m = matrixA.GetLength(1);  // columns of A
            int a = matrixB.GetLength(0);  // rows of B
            int b = matrixB.GetLength(1);  // columns of B

            if (m == a)
            {
                double[,] MM = new double[n, b];
                int k = 0;

                for (int i = 0; i < n; i++)
                {
                    for (int j = 0; j < b; j++)
                    {
                        // accumulate the dot product of row i of A and column j of B
                        while (k < m)
                        {
                            MM[i, j] += matrixA[i, k] * matrixB[k, j];
                            k++;
                        }
                        k = 0;
                    }
                }
                return MM;
            }
            else
            {
                MatrixDimensionsException ex = new MatrixDimensionsException(DateTime.Now,
                    "Columns of the first matrix are not equal to the rows of the second matrix!",
                    "Matrix-Multiplication Error Message");
                throw ex;
            }
        }
 
I understand where you are coming from. I agree it's good to start from first principles on these things, and that's what one should be doing in their own time or whilst studying. For production purposes, I find it's (a) faster and (b) safer to use code/libraries that are already out there. By safer I mean that those libraries are usually very well tested and reviewed by a large number of people, hence fewer surprises.
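
For instance (just one example of such a library, not something used in the thread), with Math.NET Numerics the whole multiplication collapses to a few lines:

C#:
    // Assumes the MathNet.Numerics package is referenced.
    using MathNet.Numerics.LinearAlgebra;

    static class LibraryMultiplyExample
    {
        public static double[,] MultiplyWithLibrary(double[,] a, double[,] b)
        {
            // Wrap the raw 2D arrays in Matrix<double> objects...
            Matrix<double> A = Matrix<double>.Build.DenseOfArray(a);
            Matrix<double> B = Matrix<double>.Build.DenseOfArray(b);

            // ...and let the library's well-tested operator do the work.
            return (A * B).ToArray();
        }
    }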
 
Yes, completely agree. Established libraries are safe and sound; they have survived criticism and have become better and better over time. My situation is this: I write the algorithms and libraries for university use. From lecture to lecture I add new functionality to the existing code, and we sometimes discuss it in lectures. The only users of that code are my groupmates and some professors at the university. I have also written new Excel functionality for a statistical package, and that is working well. For production purposes it is of course better to use existing, already optimized libraries. But someday my code may also become one of those existing libraries, tested by many people and improved through several rounds of modifications. ;)
 
Sticking to your example, this is one way to parallelize it. It's not perfect and could be improved; I just cooked this up, so it isn't tested. It should work, but it assumes that the number of rows in matrixA is even. It's not difficult to generalise.

C#:
        static double[,] Multiply(double[,] matrixA, double[,] matrixB, int nPartitions)
        {
            int nrowsA = matrixA.GetLength(0);
            int ncolsA = matrixA.GetLength(1);

            int nrowsB = matrixB.GetLength(0);
            int ncolsB = matrixB.GetLength(1);

            double[,] resM = new double[nrowsA, ncolsB];
            int rowPartitionSize = nrowsA / nPartitions;

            Task[] tasks = new Task[nPartitions];

            // I prefer to use this in case I want to add a scheduler later on, i.e. to restrict
            // the degree of parallelism
            TaskFactory factory = new TaskFactory();

            Action<int, int> multiplier = new Action<int, int>
            (
                (int fromIndex, int toIndex) =>
                {
                    for (int i = fromIndex; i < toIndex; i++)
                    {
                        for (int k = 0; k < ncolsB; k++)
                        {
                            double sum = 0;
                            for (int l = 0; l < nrowsB; l++)
                            {
                                sum += matrixA[i, l] * matrixB[l, k];
                            }

                            // locking shouldn't be required - no concurrent access to
                            // same element in array possible
                            resM[i, k] = sum;
                        }
                    }
                }
            );

            for (int iTask = 0; iTask < nPartitions; iTask++)
            {
                int start = iTask * rowPartitionSize;
                int end = (iTask + 1) * rowPartitionSize;
                tasks[iTask] = factory.StartNew(() => multiplier(start, end));
            }

            Task.WaitAll(tasks);
            return resM;
        }
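
As a usage sketch (assuming the number of rows of matrixA divides evenly by the partition count, as discussed below):

C#:
    // Use one partition per core, e.g. 4 on a quad-core machine,
    // provided the row count of matrixA is divisible by it.
    int nTasks = Environment.ProcessorCount;
    double[,] C = Multiply(matrixA, matrixB, nTasks);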
 
Nice, you made an effort. But the code seems a bit long for a matrix multiplication algorithm. Also, there is a lot of logic to be checked and executed when there is no need for it, which increases the processing time, i.e., the extra "for" loops. And it could definitely be improved. Why restrict the matrix rows to be even?
 
The code essentially does what you are doing, but instead of your innermost "while" I used a "for". It wraps the multiplication in an Action function object, partitions the row space of matrixA into nPartitions, and does the multiplication on nPartitions threads in parallel. Your code does the same, but in sequence on the main thread (i.e., single-threaded). All else being equal, mine should run considerably faster for matrices A and B.

I am not restricting the rows of the matrix to be even - that was a logical error on my part (sorry, I had just got up when I posted this and was still half asleep). I meant to say they should be "divisible by nPartitions".
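
One possible way to drop that restriction (again just a sketch, not tested) is to let the last task absorb the leftover rows, i.e. change the partition loop to:

C#:
            for (int iTask = 0; iTask < nPartitions; iTask++)
            {
                int start = iTask * rowPartitionSize;
                // the last task also picks up the remainder when nrowsA % nPartitions != 0
                int end = (iTask == nPartitions - 1) ? nrowsA : (iTask + 1) * rowPartitionSize;
                tasks[iTask] = factory.StartNew(() => multiplier(start, end));
            }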
 
Yes, please, I'd be interested to know. As I said, I haven't benchmarked this particular use case myself and would be curious about the result. Note that the matrices should be large - otherwise you won't see much of a difference.
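
For reference, a rough timing harness along these lines would do; the matrix size and seed are arbitrary, and the class names (e.g. Lin_Alg for the single-threaded MMult) are only my assumption about how the code above is organised:

C#:
    using System;
    using System.Diagnostics;

    class MultiplyBenchmark
    {
        static void Main()
        {
            int n = 1000;                       // large enough to make the difference visible
            var rng = new Random(42);
            double[,] A = new double[n, n];
            double[,] B = new double[n, n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                {
                    A[i, j] = rng.NextDouble();
                    B[i, j] = rng.NextDouble();
                }

            // Single-threaded version posted earlier (assumed to live in a class called Lin_Alg).
            var sw = Stopwatch.StartNew();
            double[,] c1 = Lin_Alg.MMult(A, B);
            sw.Stop();
            Console.WriteLine("Single-threaded: {0} ms", sw.ElapsedMilliseconds);

            // Parallel version with 4 partitions (assumed accessible from here; n is divisible by 4).
            sw.Restart();
            double[,] c2 = Multiply(A, B, 4);
            sw.Stop();
            Console.WriteLine("Parallel:        {0} ms", sw.ElapsedMilliseconds);
        }
    }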
 
I know. I'm running it now and will post the results in a minute.

EDIT: There is one problem: I don't have the Task and TaskFactory types defined. Could you supply the relevant namespaces? @tobias elbert
 
I am assuming you have access to .NET Framework 4, right?

Then just add the following using directives:

using System.Threading;
using System.Threading.Tasks;
 