I too was recently heavily involved in writing pricing code in CUDA, and have also been doing lots of HPC work on other platforms in the past couple of years, so here are some opinions of mine:
I'd say that going with NVIDIA hardware and the CUDA API is still the safest option for anyone looking for a speed-up. As mentioned above, IBM Cell is dead (and it was ugly as hell to program). Larrabee is going to be heavily delayed (for at least a year), and overall its future is questionable too (although the Intel guys seem to be shifting Larrabee's focus from GPU to HPC accelerator).

As for OpenCL (also mentioned in some posts) - I had an opportunity to do lots of work in OpenCL recently, and this thing is crap at the moment. The programming model is OK, rather similar to CUDA, but the drivers/tools implementation is awful. With NVIDIA's OpenCL drivers, you get performance several times slower than for the same code in CUDA; with AMD the performance is also very hard to get (the AMD guys have some very fast hardware in their offerings, but overall they still seem unconvinced about using GPUs for HPC, so the effort they put in is far behind what NVIDIA is doing). So overall, the state of OpenCL is about the same as the state of CUDA approximately 2.5 years ago: it's certainly going to improve, but I see no reason to waste time waiting for it, especially as it appears that write-OpenCL-once-then-run-everywhere is a myth - you have to tweak for each platform separately, so this really has no advantage over just deciding on NVIDIA hardware and sticking with CUDA.
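To give an idea of what that programming model looks like, here is a minimal SAXPY-style sketch in CUDA C - purely illustrative, not taken from any pricing code. The point is that the kernel maps one thread to one array element, with the usual grid/block index arithmetic:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Minimal example of the CUDA programming model: each thread
// computes one element of y = a*x + y (SAXPY).
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard the threads of the last, partial block
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *x = (float *)malloc(bytes), *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Device buffers, and host-to-device copies
    float *d_x, *d_y;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // Launch: one thread per element, 256 threads per block
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);

    // Copy the result back and spot-check it
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);

    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}
```

The equivalent OpenCL kernel would look nearly identical (a `get_global_id(0)` call instead of the blockIdx/threadIdx arithmetic), which is exactly why the model feels familiar - the pain is in the drivers and the host-side setup, not the kernel code itself.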
There are many other efforts to provide a higher-level paradigm for GPU/accelerator programming. One example is the Matlab plugins, like the above-mentioned Jacket from AccelerEyes (there are others, like GPUmat); I was involved both in the implementation and the usage of something similar, and I'd say these won't fly either: it's very hard to match Matlab routines in semantics and numerical precision while still keeping the GPU efficiently utilized. Furthermore, there is lots of work on extensions for general-purpose languages that would semi-automatically parallelize given sections of code for execution on an accelerator (GPU or other type). For example, the recent release of the Portland Group suite of compilers offers something like this for Fortran (although I didn't like it - too much OpenMP-like stuff for my taste; on the other hand, I really liked an alternative capability offered by the same release of their compiler tools, namely writing CUDA kernels in Fortran, with all of the CUDA runtime functions available through nice native Fortran syntax). For C++, RapidMind was providing an automated translation platform, and it was rather mature (if I remember correctly, they supported multi-core CPUs, GPUs, and Cell), but they were recently acquired by Intel, so I'd expect the soon-to-be-released-in-beta Intel Ct platform to be much like it; it may be interesting to take a look into.
As far as FPGAs are concerned, I wouldn't agree with DailyVaR - I think there is lots of potential in FPGAs, especially given recent C-to-FPGA developments.
The Impulse C offering is very mature - I experimented with it to some extent, and while the programming model is certainly even more complicated than for GPUs, the effort could definitely be worthwhile given the overall speed-up potential. Also, other vendors are starting to offer this kind of tool (like Mitrionics), so I'd expect this field to quickly mature into a viable alternative to using GPUs.
So - overall, there is lots of very interesting development ongoing, but the problem is that the programming models are far from standardized, and it's hard to know which one will eventually win out as the de-facto standard. Still, considerable speed-ups (and thus a competitive advantage too) can be achieved by employing accelerators even today, so I'd say investing in this kind of development is a must already; and I'd also restate that going with NVIDIA hardware and the CUDA API is a pretty safe bet at the moment: the hardware is fast and improving (Fermi is going to bring some really nice improvements), the software stack is mature and stable, and there is also a considerable pool of people knowledgeable in CUDA to hire from.