
SQL interaction with VS

IMHO, IT IS NEVER USEFUL!!! It will lead to headaches down the road.

A component may not be useful to you simply because you don't use it or have an alternative (maybe better) way to do things. Take the DataSet in VS: it is "not useful" only in the sense that it has an alternative, LINQ. You can replace the DataSet with LINQ if needed and get rid of it once and for all, but that doesn't mean it isn't useful. I still find it more comfortable, even though I know LINQ simplifies things 100,000 times over.
 
IMHO, IT IS NEVER USEFUL!!! It will lead to headaches down the road.

It is almost always bad, I agree. What's worse is when the stored procs are labeled incorrectly and the current DBAs don't know what they do... =_=

That being said, there are a few instances where they are quasi-necessary. However, they should be very simple - there's no need for a "complicated" stored procedure.
 
It is almost always bad, I agree. What's worse is when the stored procs are labeled incorrectly and the current DBAs don't know what they do... =_=

It is always up to you to control that process. Incorrect labeling can just as easily happen to procedures defined inside the programming language itself (if I got your point correctly).
 
Briefly:
- Always avoid moving your application logic into your database layer, if at all possible. Alain is right.
- Wherever possible, reduce the amount of data you send to your application as early as possible by trimming out unneeded records at the database layer (which is part of why a lot of .NET DataSet and MS-designed GUI elements - like drag-and-drop auto-wired GridViews etc. - suck so badly once you have more than a few records)
- Obviously, sometimes these first two points are contradictory in nature...

Experience will help you know when to do what, and sometimes there are no perfect answers. In general, though, separate your concerns as much as possible.
 
Now it's clear. But take one very simple example: you want to calculate the average of a column vector. Would you do it in SQL and then import the result, or import the column into an array and then find its average?
 
Now it's clear. But take one very simple example: you want to calculate the average of a column vector. Would you do it in SQL and then import the result, or import the column into an array and then find its average?
If you are talking about calculating a simple average over the values in one column, 9 times out of 10 I would do it in SQL.

Niels
 
Ok. I'm going to make a few assumptions. You have a potentially (very??) large dataset. You ONLY want the average, not the actual values themselves, to use in your application at this particular point in time.

So, we're doing a very simple operation, to find an average of the records that meet your query criteria.

So... your first option is: retrieve all those records in the db query, bundle all that data down the pipe (potentially especially damaging/expensive for a distributed application where you're querying data off a foreign server into your local app), THEN parse all that data into some local container object (not strictly necessary, but depending on what tech you're using - LINQ/Entity Framework etc. - it might be happening anyway), THEN use your code logic to iterate through all those records and calculate the average. This is obviously NOT ideal.

An example of the sort of query name I'd expect here might be Users_GetAllExamScores(), highlighting the fact that you're returning a potentially massive dataset, then working the data down to find the average.

OR

You realise that the query has already found all the data you need, so you do the simple average operation in the db layer - it's not really a "business logic" operation, so it doesn't offend my especially an@lly-retentive design-theory notions - and you return a single floating-point value, potentially cutting your data-transfer load by a factor of thousands. This is obviously very good.

and the query might be Users_GetAverageExamScore(), highlighting the fact that you're returning a SINGLE value.
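To make the contrast concrete, here is a minimal ADO.NET sketch of the two approaches; the ExamScores table, the Score column, and the connection string are made up for illustration, but the shape of the queries is the point:

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

class ExamScoreQueries
{
    // Option 1 (Users_GetAllExamScores style): every row comes down the pipe
    // and the application does the averaging.
    public static double GetAverageInApp(string connStr)
    {
        var scores = new List<double>();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("SELECT Score FROM ExamScores", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    scores.Add(reader.GetDouble(0));   // potentially millions of rows transferred
        }

        double sum = 0;
        foreach (double s in scores) sum += s;
        return scores.Count > 0 ? sum / scores.Count : 0;
    }

    // Option 2 (Users_GetAverageExamScore style): the database does the simple
    // aggregation and only a single value comes back.
    public static double GetAverageInDb(string connStr)
    {
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("SELECT AVG(Score) FROM ExamScores", conn))
        {
            conn.Open();
            return Convert.ToDouble(cmd.ExecuteScalar());   // one value transferred
        }
    }
}
```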

Now... you can see from my initial assumptions where this decision might suddenly and easily become more complicated... because let's say you ALSO need each individual exam score (say you're displaying them on screen). Then hell, you might as well return all the data because you HAVE to anyway, and average it locally, instead of performing BOTH queries on the database - one to get all records, one to get the average.

As an interesting side note, it's been my experience that things like this are where you really stand to blow out your application performance. People get REALLY hung up on whether C++ is fractions of a second faster than C# when it comes to performing multidimensional integration with their awesome new algorithm... and then write a sloppy query that returns 10 billion rows of data and filter it in the application, when they REALLY needed 15 rows in the first place and could have made the database do the SIMPLE work earlier - saving themselves seconds of load time while their poor database breaks down in tears and their network traffic goes through the roof...

Note I say SIMPLE work in the db... NOT business logic.

This is ESPECIALLY important when it comes to abstracted db querying frameworks like LINQ and Entity Framework in .NET... which have burned people REALLY badly, because they often don't get a feel for what the actual SQL query will look like or do when they write up the code-side query, and LINQ generates some slightly oddball SQL as a result...
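To illustrate how easily the code-side query can hide what the server actually does, here is a small Entity Framework sketch (the ExamScore entity and SchoolContext are hypothetical): the two lines below look almost identical, but the first is translated into a server-side AVG returning one value, while the second materialises every row before averaging in memory.

```csharp
using System.Data.Entity;
using System.Linq;

public class ExamScore
{
    public int Id { get; set; }
    public double Score { get; set; }
}

public class SchoolContext : DbContext
{
    public DbSet<ExamScore> ExamScores { get; set; }
}

class LinqQueryShape
{
    static void Main()
    {
        using (var db = new SchoolContext())
        {
            // Translated by LINQ-to-Entities into roughly "SELECT AVG(Score) FROM ExamScores":
            // only a single value comes back from the server.
            double serverSide = db.ExamScores.Average(s => s.Score);

            // ToList() pulls the whole table into memory first, THEN averages locally:
            // same answer, very different database traffic.
            double clientSide = db.ExamScores.ToList().Average(s => s.Score);
        }
    }
}
```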

Hope this helps.
 
Thanks @daleholborow. That's really useful info. But one more question:

People get REALLY hung up on whether C++ is fractions of a second faster than C# when it comes to performing multidimensional integration with their awesome new algorithm.

How would you compare these languages from the point of view you've just described?
 
You're talking specifically C++ vs C#?

I don't think I've met the programmer yet who could squeeze enough performance out of either to make it matter. Like I mention above, so much time is spent trying to "optimise" bits of code... only for foolish mistakes to slip through; it's actually quite funny. I see a lot of internet blogs where you get "if I write this loop to sum 1 trillion numbers in C++ and C#, you can see that language X is 0.0005% better", but honestly... who cares? If you're at the point where you work for a company that physically locates itself 100 feet closer to the exchange because your ultra-high-speed trading algos need to reduce the distance to find those final few 10,000ths of a second, then the decision is already made. You're a C/C++ boy, you probably solder your own motherboards to get extra speed, and you certainly don't give a shit what I think.

Alternatively, who else needs that speed? Think of it this way... Excel is a massively popular tool... not because it's the fastest, but because it's quick, dirty, reasonably powerful, and gets the job done for cheap. And isn't that what you want from any tool you choose? That it does what you want, with a minimum of fuss?

In my experience, C# is vastly easier with a lot more pre-rolled code libraries, and the open source community is getting pretty active as well, so you can do a lot and get a lot of functionality now without much financial investment.
 
You're talking specifically C++ vs C#?

Yes, I meant C++ vs C#. Nice answer, and as for the 0.0005% better speed, on the whole you are right. But there are fields - for example (again) HFT - in which speed is crucial, as is approximation accuracy on the order of 10^-14. I also like C#, but when discussing the pros and cons of each I'm sometimes pushed towards switching to C++ outright. I'm currently developing statistical software in C# that fits one- and two-dimensional distributions (copulas) to data. Sometimes I really care about speed and scratch my head over how to optimize the code to cut execution time. Whenever I think, "If I had done this in C++, would I get better results in less time?", I get stuck, because there are trade-offs to take into account: had I done it in C++, the opportunity cost in development time means I might have finished only 1/100 of what I have now. So I always weigh development time against processing time and place more importance on the former - though the latter still comes back to make me nervous.
 
I guess, without knowing all the specifics of your individual case and project, my comments definitely require disclaimers, but a lesson I learned the hard way is to ask yourself: do you NEED to run your fitting every second? Every minute? Once a day?

In a job, would this code run once a day, or once a month for reports for the Director? Pride will tell you to make it run as blazingly fast as possible... but commercial reality means you might spend a week writing some tight code, and when you go to justify your existence to your boss and he asks why you took a week to shave 15 minutes off your monthly report run-time... it's not going to go well for you.

A guy who can get quick-ish, accurate results out cheap and on time will win every time.
 
I guess, without knowing all the specifics of your individual case and project, my comments definitely require disclaimers, but a lesson I learned the hard way is to ask yourself: do you NEED to run your fitting every second? Every minute? Once a day?

Yeah, my application is kind of a one-shot. The second one is tied to the web, retrieving data at one-minute intervals and calculating portfolio characteristics (like sigma, Sharpe ratio, optimization, etc.), and it has to be running constantly.
 
Not sure if you are already doing this, but whenever speed is an issue but memory isn't, I'd usually just break the data up, run the arithmetic on separate threads, and then combine the results back on the main thread.

If memory is an issue, the hacky way would be to create a separate process for each set, write the results to some permanent storage, then combine the results and move on. The better way would be to farm out your work to separate processes (locally, or even better remotely if you have the resources) using e.g. WCF in .NET. It actually pays to do the latter properly so that it scales with different use cases, and .NET makes it very easy to do.
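For the threads case, a minimal sketch of the partition-and-combine pattern using the .NET Task Parallel Library (the data here is made up): each task sums its own index range, so no locking is needed, and the main thread combines the partial results.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class PartitionedAverage
{
    static void Main()
    {
        // Made-up data standing in for a large column pulled from the database.
        double[] data = Enumerable.Range(1, 1000000).Select(i => (double)i).ToArray();

        int workers = Environment.ProcessorCount;
        int chunk = (data.Length + workers - 1) / workers;

        // Each task owns its own index range, so the workers never touch shared state.
        Task<double>[] tasks = Enumerable.Range(0, workers)
            .Select(w => Task.Run(() =>
            {
                int start = w * chunk;
                int end = Math.Min(start + chunk, data.Length);
                double partial = 0;
                for (int i = start; i < end; i++) partial += data[i];
                return partial;
            }))
            .ToArray();

        Task.WaitAll(tasks);

        // Combine the partial sums back on the main thread.
        double total = tasks.Sum(t => t.Result);
        Console.WriteLine("Average = " + total / data.Length);
    }
}
```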
 
Without meaning to be argumentative, I question the use of WCF and how easy it is. I've seen it used rather badly on several occasions, and can't think of an instance where it has been used well. It added quite a lot to dev time and to product complexity. Have you had a different experience?
 
Yes, it has a learning curve and, as always, it depends on what the objective is. I have used WCF on only two projects so far, which are admittedly not mission-critical. I used it across 5-10 machines (depending on availability) on the local network and found it to be very stable and reliable. If I wanted to run another type of problem, I'd really just have to implement an interface in a separate dll, copy that dll to a folder on each machine, and kick off another run. Admittedly it does need some tweaking and fine-tuning (e.g. which object hosting model to use), which took me some time to figure out. I do use the same framework a lot locally, though, whenever I go over the 2 GB process boundary (still on 32-bit), and it works without any problems. As I said, the easier way (locally at least) is to just kick off separate processes from your main process, wait for them to complete, and retrieve the results from permanent storage (i.e. files, SQLite, or any other DB).

BTW, my former company used .NET Remoting (WCF's older brother) in a production environment and never had any (software-related) issues.
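For what it's worth, the plumbing for this kind of farm-out can be fairly small; here is a minimal self-hosted WCF sketch (the IWorkUnit contract, the TCP address, and the trivial sum implementation are all made up): each worker machine hosts the service, and a client reaches it with a ChannelFactory<IWorkUnit> over the same contract and address.

```csharp
using System;
using System.ServiceModel;

// The contract the worker machines expose; real implementations can live in a separate dll.
[ServiceContract]
public interface IWorkUnit
{
    [OperationContract]
    double Run(double[] input);
}

// A trivial implementation standing in for the real calculation.
public class SumWorkUnit : IWorkUnit
{
    public double Run(double[] input)
    {
        double sum = 0;
        foreach (double x in input) sum += x;
        return sum;
    }
}

class WorkerHost
{
    static void Main()
    {
        // Self-host the service over TCP on this machine.
        using (var host = new ServiceHost(typeof(SumWorkUnit),
                                          new Uri("net.tcp://localhost:8000/work")))
        {
            host.AddServiceEndpoint(typeof(IWorkUnit), new NetTcpBinding(), "");
            host.Open();
            Console.WriteLine("Worker listening; press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```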
 
Not sure if you are already doing this, but whenever speed is an issue but memory isn't, I'd usually just break the data up, run the arithmetic on separate threads, and then combine the results back on the main thread.

I know multithreading is the way to go. I have done something like that with a large amount of data allocated into an array.
 
@Tsotne: you do break that array up into subarrays though, right? Or are you using .NET's ConcurrentBag? Otherwise you'll have to do a lot of locking/syncing and won't really get the benefits of multithreading.
 
@Tsotne: you do break that array up into subarrays though, right? Or are you using .NET's ConcurrentBag? Otherwise you'll have to do a lot of locking/syncing and won't really get the benefits of multithreading.

Yes, I construct many arrays from the existing one. As for ConcurrentBag, I have not used it. When importing the array from a database column, it sometimes contains a very large amount of data, which is very costly to process directly with custom functions or loops.
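If it helps, here is roughly what the two ideas look like together - range partitioning plus a ConcurrentBag for the results - so each parallel worker processes its own chunk and adds only one partial result, with no per-element locking (the data below is made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConcurrentBagSketch
{
    static void Main()
    {
        // Made-up data standing in for a large column imported from the database.
        var data = new double[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = i;

        // ConcurrentBag is thread-safe, so parallel workers can add results without
        // explicit locks; Partitioner.Create hands each worker a contiguous index range.
        var partials = new ConcurrentBag<double>();

        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            double partial = 0;
            for (int i = range.Item1; i < range.Item2; i++) partial += data[i];
            partials.Add(partial);   // one Add per range, not one per element
        });

        double total = 0;
        foreach (double p in partials) total += p;
        Console.WriteLine("Average = " + total / data.Length);
    }
}
```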
 
I wonder if anyone has a matrix multiplication algorithm written in SQL. I know you'll wonder why I need it: to compare the speed with the C# algorithm, and also to see how complex a SQL script defining matrix multiplication turns out to be. I'm intending to move many of the methods I have already written in C# over to SQL, since I find it faster in my specific cases.

Thanks
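For what it's worth, matrix multiplication is usually written in SQL over a sparse (row, column, value) representation: C[i,j] = SUM over k of A[i,k] * B[k,j] becomes a join on the inner index plus a GROUP BY. A sketch, run from C# (the tables A and B, their RowIdx/ColIdx/Val columns, and the connection string are all made up):

```csharp
using System;
using System.Data.SqlClient;

class SqlMatrixMultiply
{
    // Assumes matrices stored as sparse triples: A(RowIdx, ColIdx, Val) and B(RowIdx, ColIdx, Val).
    const string MultiplyQuery = @"
        SELECT a.RowIdx            AS RowIdx,
               b.ColIdx            AS ColIdx,
               SUM(a.Val * b.Val)  AS Val
        FROM A a
        JOIN B b ON a.ColIdx = b.RowIdx
        GROUP BY a.RowIdx, b.ColIdx";

    static void Main()
    {
        string connStr = "Server=.;Database=Matrices;Integrated Security=true";   // made up
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(MultiplyQuery, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("C[{0},{1}] = {2}",
                        reader.GetInt32(0), reader.GetInt32(1), reader.GetDouble(2));
        }
    }
}
```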
 