Why is my C++ implementation of a binomial tree so slow??

Hello! I'm implementing a simple binomial pricing model in C++. Right now my vectorized Python approach is actually faster than my C++ implementation, which surprises me (maybe it shouldn't?). I'm wondering whether something in my implementation can be optimized. The bulk of the computation is done in the following function:

double iterate_tree(std::vector<double>& v_curr,
                    double r, double dt, double p, double q)
{
    unsigned long long int N{v_curr.size()};
    unsigned long long int j{0};
    while (j < N)
    {
        for (std::vector<double>::size_type i = 0; i < N - j - 1; i++)
        {
            v_curr[i] = exp(-r*dt)*(p*v_curr[i] + q*v_curr[i+1]); // Calculate the value at each node
        }
        j++; // One backward-induction step done; the next pass covers one fewer node
    }
    return v_curr[0];
}

The input "v_curr" is a vector of boundary conditions, i.e. the terminal payoffs (e.g. (S_T - K)^+ for a call). I'm passing the vector by reference and modifying it in place, so I don't see where the huge bottleneck in computing time comes from. I'm hoping someone can help point it out. Thanks!
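
For context, here is a minimal sketch of how the boundary vector might be built and rolled back. The helper terminal_call_payoffs, the name iterate_tree_hoisted, and the inputs S0, K, u and d are hypothetical and not part of the code above. The second function is a variation on iterate_tree that computes exp(-r*dt) once instead of once per node; whether that accounts for the slowdown is untested here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Terminal call payoffs (S_T - K)^+ for an N-step tree.
// Index i counts down-moves, so v[0] is the highest terminal price.
// S0, K, u, d are hypothetical inputs chosen for illustration.
std::vector<double> terminal_call_payoffs(double S0, double K,
                                          double u, double d, std::size_t N)
{
    std::vector<double> v(N + 1);
    for (std::size_t i = 0; i <= N; ++i)
    {
        const double ST = S0 * std::pow(u, double(N - i)) * std::pow(d, double(i));
        v[i] = std::max(ST - K, 0.0);
    }
    return v;
}

// Same backward induction as iterate_tree, with the discount factor
// hoisted out of the loops (an assumption-labelled variation, not the original).
double iterate_tree_hoisted(std::vector<double>& v,
                            double r, double dt, double p, double q)
{
    assert(!v.empty());                       // v[0] is returned below
    const double disc = std::exp(-r * dt);    // computed once, not per node
    for (std::size_t n = v.size() - 1; n > 0; --n)   // n nodes are updated in this pass
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            v[i] = disc * (p * v[i] + q * v[i + 1]);
        }
    }
    return v[0];
}
```

With the hypothetical inputs S0 = 100, K = 100, u = 1.1, d = 0.9, r = 0 and p = q = 0.5, a one-step tree has payoffs of roughly {10, 0} and rolls back to approximately 5. It may also be worth confirming that the build has optimizations enabled (for example -O2 with GCC or Clang), since an unoptimized build of a loop like this can be considerably slower.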
