C++ binomial tree float versus double preformance

Hey guys, I'm implementing a binomial tree version in C++ to benchmark versus CUDA and I found an interesting issue: changing the CPU code to use doubles(64 bit) for everything instead of float(32 bit) actually increases performance of the CPU code significantly (~3.5x faster.) Anyone know whats going on here, I always assumed floats were faster than doubles???

Also the cpu is a 64 bit processor and the speed improvement occurred when compiled in both MS express edition and g++.