
Thoughts on why current analytical software constrains Big Data solutions for the financial industry

Many organizations have realized significant competitive advantages with Big Data, but they could do even more with new software paradigms. The financial industry in particular is constrained because it depends on complex analytics, particularly matrix math, which most Big Data architectures cannot readily accommodate. I recently wrote a blog post on why this is the case, and I would welcome thoughts and feedback from other industry professionals.
 


Ok, I'll bite :)

Here are some examples of awesome things you should be able to do with a Big Data exploratory analytics database.
  1. Build the ARCA book for one day of all exchange-traded US equities (186 million quotes) in 80 seconds on a 32-instance commodity hardware cluster. Run it in about half the time on a cluster twice as large.
  2. Run a Principal Component Analysis on a 50M x 50M sparse matrix in minutes (see the sketch after this list).
  3. Select data sets (based on complex criteria) in constant time—irrespective of how big your dataset gets.
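For point 2, here is a minimal single-node sketch of the usual approach: truncated PCA via an iterative sparse SVD. This is my illustration, not the vendor's implementation; the library choice (SciPy), the matrix size, and k are all assumptions, and a 50M x 50M problem would of course need a distributed solver.

```python
# Minimal single-node sketch: PCA on a sparse matrix via truncated SVD.
# Illustrative assumptions: SciPy as the library, a 100k x 1k matrix, and
# k = 10 components; the 50M x 50M case would need a distributed solver.
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# A random sparse matrix standing in for, e.g., a returns or exposure matrix.
X = sp.random(100_000, 1_000, density=1e-4, format="csr", random_state=0)

# Iterative Lanczos-style solver: computes only the top-k factors and never
# densifies X. (A full PCA would first center the columns; doing that
# implicitly via a LinearOperator is the usual trick, skipped here for brevity.)
k = 10
U, s, Vt = svds(X, k=k)

# svds returns singular values in ascending order, so flip to descending.
components = Vt[::-1]            # principal directions, shape (k, 1000)
scores = U[:, ::-1] * s[::-1]    # projections of rows onto the components
print(components.shape, scores.shape)
```

The key point is that the solver only ever touches X through sparse matrix-vector products, which is what makes the "in minutes" claim plausible for much larger matrices once those products are distributed.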
I won't say I don't believe this, but what time frame are we talking about?

Point 2 is to be taken with a spoon of salt, IMO. Disclaimer: I know no Big Data.
 
Your competition in this space is basically q/kdb+. They do many, many things right, including ETL (much less overhead than other systems I've used), complex math, and being quant-friendly. There is no middle layer between the DB and the language, and the basic type system and its relationship to DB structures is very well thought out.

There are some limitations: notably, it isn't designed or licensed for web-scale (multi-node) problems, and it is expensive.
 
Thanks for your replies. It’s a good discussion. Yike, you’re right that P4 scales by adding commodity hardware nodes to a multi-node cluster whereas Kx scales by upgrading to a server with more cores. Adding nodes provides more aggregate computing power. And as you point out, cost is a difference, too. We’re a less expensive, scalable solution. The bigger picture is that the financial industry, which is so dependent on complex analytics, stands to gain from new software approaches. That’s good news for everyone.
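As a back-of-envelope check on the scale-out claim (my addition, not from the thread): under Amdahl's law, doubling the node count halves wall time only when the serial fraction of the work is near zero. The serial fractions below are illustrative assumptions, not measurements.

```python
# Hedged sanity check on scale-out: with serial fraction f (Amdahl's law),
# going from n_base to n nodes scales only the parallel part of the work.
# The f values below are illustrative, not measured.
def wall_time(t_base: float, n_base: int, n: int, f: float) -> float:
    """Predicted wall time on n nodes, given t_base seconds on n_base nodes."""
    return t_base * (f + (1.0 - f) * n_base / n)

for f in (0.0, 0.02, 0.10):
    t64 = wall_time(80.0, 32, 64, f)
    print(f"serial fraction {f:.2f}: 80 s on 32 nodes -> {t64:.1f} s on 64")
# f = 0.0 reproduces the "about half the time" figure quoted earlier in
# the thread; any nonzero serial fraction erodes that speedup.
```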
 