Nvidia - Cuda Toolkit for options pricing
DailyVaR said:

I feel vindicated today. I found out that IBM is dropping the Cell processor, as reported by this Ars Technica article: http://arstechnica.com/hardware/news/2009/11/end-of-the-line-for-ibms-cell.ars

One particular point that I brought up appears in the article.

Boys and girl(s)*, it's 2010. Do we want to spend the next two years before the end of the world writing cache-coherency routines in our analytic code? I heard one IBM developer implemented some code that steals memory from each SPE to build a coherent cache, but seriously, this should be built into the hardware.

---------- Post added at 02:04 PM ---------- Previous post was at 01:41 PM ----------

Probably never. If you look at every single NVidia CUDA example, they implement everything two ways: once for the CPU and once for the GPU. At the end of each example they compute the run times so they can compute the speed-up. There is also a simple test to see whether the GPU result is close to the CPU result within some epsilon. If the numbers were exact, they wouldn't need an epsilon. Just my observation.

Also, in all their MC simulation code, they always round the user-specified number of sims up to a power of 2 that will occupy all the "units." In CUDA, doing extra work sometimes doesn't take any time at all if you are waiting anyway. For example, you might want to run with 1,000 sims, but you will get 1,024. Kinda annoying if you are trying to match MC-based options prices.
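For reference, the CPU-vs-GPU check DailyVaR describes looks roughly like the sketch below: run the same computation on both devices, then pass only if every element agrees within some epsilon. This is a minimal illustration, not code from any actual SDK sample; the squaring kernel, block size, and 1e-5 tolerance are assumptions chosen for brevity.

[CODE]
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// GPU version: one thread per element.
__global__ void squareKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// CPU reference implementation of the same computation.
void squareCPU(const float *in, float *out, int n)
{
    for (int i = 0; i < n; ++i) out[i] = in[i] * in[i];
}

int main()
{
    const int n = 1 << 20;
    const float eps = 1e-5f; // GPU and CPU floats rarely match bit-for-bit

    float *h_in = new float[n], *h_cpu = new float[n], *h_gpu = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = i * 0.001f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    squareKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_gpu, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    squareCPU(h_in, h_cpu, n);

    // The epsilon test: pass only if every element agrees within eps.
    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (fabsf(h_gpu[i] - h_cpu[i]) > eps) { ok = false; break; }
    printf("%s\n", ok ? "PASSED (within epsilon)" : "FAILED");

    cudaFree(d_in);
    cudaFree(d_out);
    delete[] h_in; delete[] h_cpu; delete[] h_gpu;
    return 0;
}
[/CODE]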
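The power-of-2 rounding in the last paragraph is easy to see with a small host-side helper. Again, a hedged sketch: the helper name nextPow2 is illustrative, not the SDK's actual implementation.

[CODE]
#include <cstdio>

// Round a requested simulation count up to the next power of two,
// so that every execution "unit" stays occupied.
unsigned int nextPow2(unsigned int n)
{
    unsigned int p = 1;
    while (p < n) p <<= 1;
    return p;
}

int main()
{
    unsigned int requested = 1000;
    unsigned int actual = nextPow2(requested); // 1000 -> 1024
    printf("requested %u sims, running %u\n", requested, actual);
    // An estimate averaged over 1,024 paths will not match one averaged
    // over exactly 1,000 paths, hence the complaint about matching prices.
    return 0;
}
[/CODE]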