Since I am quite new to GPU architecture, and you mention the overhead, can you elaborate on how much overhead there is in transferring data to and from the card (i.e. I/O bottlenecks)?
For instance, in the PDE situation, can all the initial data be transferred at the start of the calculation, leaving only the final result to be transferred at the end? Or is there a need to communicate with the GPU continuously?
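To make the question concrete, here is the rough model I have in mind, as a Python sketch. The latency and bandwidth figures are assumptions I picked for illustration, not measurements of any real card, and `transfer_time` / `total_transfer_time` are just my own hypothetical helpers.

```python
# Hypothetical back-of-envelope model of host <-> GPU transfer cost.
# ASSUMED numbers (illustrative only, not measured):
LATENCY_S = 10e-6       # assumed fixed latency per transfer, in seconds
BANDWIDTH_BPS = 12e9    # assumed effective bus bandwidth, in bytes per second


def transfer_time(num_bytes: int) -> float:
    """Estimated time for a single transfer of num_bytes."""
    return LATENCY_S + num_bytes / BANDWIDTH_BPS


def total_transfer_time(grid_bytes: int, steps: int, copy_every_step: bool) -> float:
    """Transfer time for a time-stepping PDE solve.

    copy_every_step=False: upload the initial data once, download the result once.
    copy_every_step=True: additionally copy the full grid down and back up every step.
    """
    once = 2 * transfer_time(grid_bytes)  # initial upload + final download
    if not copy_every_step:
        return once
    return once + steps * 2 * transfer_time(grid_bytes)


if __name__ == "__main__":
    grid = 256**3 * 8  # 256^3 grid of float64 values
    steps = 1000
    print(f"upload/download once:  {total_transfer_time(grid, steps, False):.3f} s")
    print(f"round trip every step: {total_transfer_time(grid, steps, True):.3f} s")
```

Under these assumed numbers the "transfer once" case stays small, while copying the grid back and forth every step grows linearly with the number of steps, which is why I am asking whether the continuous version can be avoided.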
My initial impression is that since a GPU does not have virtual memory, the video RAM would need to be fairly large to store any significant amount of data!
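For scale, this is how I have been estimating the footprint: a minimal sketch assuming a cubic grid of float64 values with two copies of the field held on the card (current and next time step). `grid_bytes` is a hypothetical helper of mine, and the two-field assumption may not match a real solver.

```python
GIB = 2**30  # bytes per gibibyte


def grid_bytes(n: int, dims: int = 3, fields: int = 2, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold `fields` copies of an n^dims grid in device memory.

    Defaults ASSUME a 3D grid, two fields (current + next step), float64 values.
    """
    return (n ** dims) * fields * bytes_per_value


if __name__ == "__main__":
    for n in (128, 256, 512, 1024):
        print(f"{n}^3 grid: {grid_bytes(n) / GIB:.2f} GiB")
```

By this estimate a 512³ grid already needs 2 GiB and a 1024³ grid needs 16 GiB, before counting any auxiliary arrays.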