• C++ Programming for Financial Engineering
    Highly recommended by thousands of MFE students. Covers essential C++ topics with applications to financial engineering. Learn more Join!
    Python for Finance with Intro to Data Science
    Gain practical understanding of Python to read, understand, and write professional Python code for your first day on the job. Learn more Join!
    An Intuition-Based Options Primer for FE
    Ideal for entry level positions interviews and graduate studies, specializing in options trading arbitrage and options valuation models. Learn more Join!

Colored runs of cards

radosr

Baruch MFE Faculty
Joined
7/18/07
Messages
648
Points
73
There are 26 black(B) and 26 red(R) cards in a standard deck. A run is a maximum block of consecutive cards of the same color. For example, a sequence RRRRBBBRBRB of only 11 cards has
6 runs; namely, RRRR, BBB, R, B, R, B. Find the expected number of runs in a shuffled deck of cards.
 
For each \(n=1:51\), let \(X_n=0\) if cards \(n\) and \(n+1\) are the same color and 1 if they are different. the answer to the question is

\(51E[X_1]+1 = 51P[X_1=1]+1 = 1+51\cdot\frac{26}{51}=27\)
 
For each \(n=1:51\), let \(X_n=0\) if cards \(n\) and \(n+1\) are the same color and 1 if they are different. the answer to the question is

\(51E[X_1]+1 = 51P[X_1=1]+1 = 1+51\cdot\frac{26}{51}=27\)
Peter, can you explain it please?
 
Peter, can you explain it please?

The number of runs is essentially the number of changes in color from one card to the next, plus 1. This is easy enough to see; just look at Rados' example. Notice that \(X_1+\cdots+X_{51}\) counts the number of changes, so we're looking for \(1+E[X_1+\cdots+X_{51}]\), which by the linearity of expectation and the fact that the \(X_i\) are identically distributed (by symmetry), reduces to \(1+51E[X_1]\). Now \(E[X_1]\) is just \(P[X_1=1]\). This is always the case for indicator random variables, and you can easily see it by writing \(E[X_1]=1\cdot P[X_1=1]+0\cdot P[X_1=0]=P[X_1=1]\). So we're left with evaluating \(P[X_1=1]=P[\text{first and second cards are different colors}]\). The probability that the second card is different from the first is just \(\frac{26}{51}\) (26 cards different from the first, out of the 51 remaining). So yeah, there was a mistake in my original solution. The answer should be \(1+51\cdot \frac{26}{51}=27\).
 
The number of runs is essentially the number of changes in color from one card to the next, plus 1. This is easy enough to see; just look at Rados' example. Notice that \(X_1+\cdots+X_{51}\) counts the number of changes, so we're looking for \(1+E[X_1+\cdots+X_{51}]\), which by the linearity of expectation and the fact that the \(X_i\) are identically distributed (by symmetry), reduces to \(1+51E[X_1]\). Now \(E[X_1]\) is just \(P[X_1=1]\). This is always the case for indicator random variables, and you can easily see it by writing \(E[X_1]=1\cdot P[X_1=1]+0\cdot P[X_1=0]=P[X_1=1]\). So we're left with evaluating \(P[X_1=1]=P[\text{first and second cards are different colors}]\). The probability that the second card is different from the first is just \(\frac{26}{51}\) (26 cards different from the first, out of the 51 remaining). So yeah, there was a mistake in my original solution. The answer should be \(1+51\cdot \frac{26}{51}=27\).
Why are you using 49? shouldn't you use 51 for the second card?
 
Why are you using 49? shouldn't you use 51 for the second card?

Haha, I am really off on this problem...

It should be 51, not 49, because there are 51 places where a change can occur. So, for the third time, the answer should actually be \(1+51\cdot \frac{26}{51}=27\). It's also the most aesthetic answer I've had so far, so I may actually get lucky this time ;)
 
Haha, I am really off on this problem...

It should be 51, not 49, because there are 51 places where a change can occur. So, for the third time, the answer should actually be \(1+51\cdot \frac{26}{51}=27\). It's also the most aesthetic answer I've had so far, so I may actually get lucky this time ;)
There is more logic now, but still I don;t think it is right. It is 3:00 am here, so I may be wrong. But the Expectation for each changes after each draw. I will think about it.
 
There is more logic now, but still I don;t think it is right. It is 3:00 am here, so I may be wrong. But the Expectation for each changes after each draw. I will think about it.

What I think you're trying to say is that runs in certain parts of the deck are affected by runs in other parts. Sure, but that's irrelevant because we're after the average number of runs here.

Think of it like this. Consider the following two procedures.

(1) Shuffle the deck 10000 times and each time count the number of runs. Then average these 10000 numbers; this will give you a good approximation to the answer you want.

(2) Now, this is the same as instead doing the following. Shuffle the deck 10000 times and each time record the following 51 numbers: the first number is either 0 or 1 depending on whether cards 1 and 2 are the same or different; the second number is 0 or 1 depending on whether cards 2 and 3 are the same or different; etc. So for example, for a deck starting RRBRB... we record 0,1,1,1,... Now take the average of the first numbers in these 0-1 sequences, take the average of the second numbers, etc. These results represent good approximations to the \(E[X_i]\) of the above solution. Add them up, and then add 1, and you get exactly the same thing you got from procedure (1).

This should clarify why it works, and why indicator random variables are so ubiquitous in the solutions of these expectation questions.
 
What I think you're trying to say is that runs in certain parts of the deck are affected by runs in other parts. Sure, but that's irrelevant because we're after the average number of runs here.

Think of it like this. Consider the following two procedures.

(1) Shuffle the deck 10000 times and each time count the number of runs. Then average these 10000 numbers; this will give you a good approximation to the answer you want.

(2) Now, this is the same as instead doing the following. Shuffle the deck 10000 times and each time record the following 51 numbers: the first number is either 0 or 1 depending on whether cards 1 and 2 are the same or different; the second number is 0 or 1 depending on whether cards 2 and 3 are the same or different; etc. So for example, for a deck starting RRBRB... we record 0,1,1,1,... Now take the average of the first numbers in these 0-1 sequences, take the average of the second numbers, etc. These results represent good approximations to the (E[X_i]) of the above solution. Add them up, and then add 1, and you get exactly the same thing you got from procedure (1).

This should clarify why it works, and why indicator random variables are so ubiquitous in the solutions of these expectation questions.
Well, let's try this. Assume we have a deck of 6 cards, 3 Red, R Black. Based on your logic, the Expected number of runs is 1 + 5 * 3/5 = 4. But if you actually solve this using a simple table of combinations (total is 20 and there are no runs of 0 and 5), you will get an answer of 3.8
Could you verify your logic with answer of 4 or my answer of 3.8?
Also, regarding your initial logic, I don't think you can have total runs of 51. You can have 50 runs and 52 runs, but 51 runs is not possible. Could you also verify that?
 
well there are actually ones with 5 runs; here's all 20:

2 runs: BBBRRR, RRRBBB (2)
3 runs: BBRRRB, BRRRBB, RRBBBR, RBBBRR (4)
4 runs: BBRRBR, BBRBRR, BRRBBR, BRRBBR, RRBBRB, RRBRBB, RBBRRB, RBBRRB (8)
5 runs: BRBRRB, BRRBRB, RBBRBR, RBRBBR (4)
6 runs: BRBRBR, RBRBRB (2)

and this gives us an average of \(\frac{2\cdot 2+3\cdot 4+4\cdot 8+5\cdot 4+6\cdot 2}{20}=4\) :)
 
Quick simulation of the problem.

C++:
//============================================================================
// Name        : Simulation2.cpp
// Author      : Alex
// Version    :
// Copyright  : Your copyright notice
// Description :
//============================================================================

#include <cstdlib>
#include <iostream>
#include <ctime>
 
using namespace std;

int simulate_FullDeckDraw();//colors only
 
int generate_Draw(int reds,int blacks);
void print_Color(int color);
 
int run_Array[52]={0};
int Deck_size(sizeof(run_Array)/sizeof(int));

int main()
{
  
    srand( (unsigned int)time(0) );

    long N = 1000000;
    // number of simulations

    int run(0);
    // number of run by Full deck Draw

    long total_Run(0);
 
    //array of runs

    for (int i = 0; i < N; i++)
    {
        run=simulate_FullDeckDraw();
      //  cout <<endl<< "Number of runs is: " << run<<endl;
        total_Run+=(long)run;
        run_Array[run-2]++;

    }

    for (int i = 0; i < Deck_size-1; i++)

          cout<<"Run "<<i+2<<" had a total of "<<run_Array[i]<<" runs"<<endl;

    cout <<endl<< "Average Number of runs over " <<N<<" simulations is "<< (double)(total_Run)/(double)(N)<<endl;

    return 0;
}

int simulate_FullDeckDraw()
{

    int RedCards(Deck_size/2), BlackCards(Deck_size/2);

    int Current_Run(1);
    int Current_Color(0), Previous_Color(0); //1 for red, 2 for black
  //First Card
    Previous_Color= generate_Draw(RedCards, BlackCards);
    if(Previous_Color==1)
        RedCards--;
    else
        BlackCards--;
    int i = 2;

  // cout<<endl;
  //  print_Color(Previous_Color);

    while( i <=Deck_size )
    {
        Current_Color= generate_Draw(RedCards, BlackCards);
    //  print_Color(Current_Color);
 
        if(Current_Color==1)
                RedCards--;
            else
                BlackCards--;
        if(Current_Color!=Previous_Color)
            Current_Run++;
        Previous_Color=Current_Color;
        i++;
    }
  
        return Current_Run;

}

int generate_Draw(int reds,int blacks)
{

    int a;
    double numRand;

    a = rand();
    numRand = (double)(a)/(double)(RAND_MAX);

    if (numRand < (double)(reds)/(double)(reds+blacks))
        return 1; //Red Card
    else
        return 2; //Black Card
}
  
void print_Color(int color)
{
    using namespace std;
if(color==1)
    cout <<"R";
else
    cout <<"B";
 
}
 
Another solution posted on the blog link above:

Number of ways in which we can have 52 runs is same as number of ways in which we can have 2 runs. Similar result holds for 52-k and k+2. This can be explained using the following construction :
Suppose wlog we always start with red (this will not affect the expectation)
Then if we have 2k runs, then we need to place k "bars" between 26 red ones (with 1 bar necessarily at the end) and similarly for the black ones.
Similarly for 2k+1 bars, we need k+1 red bars and k black bars to be placed between 26 red and 26 black cards respectively ( This seperation by bars gives a unique representation for each sequence starting with red).
And now it is easy to see that 52-k and k+2 have the same number of arrangements possible, and thus the distribution is symmetric about 52+2/2 = 27.
 
Back
Top