Sheldon Ross 10: Example 3.17


Question

In the match problem of Example 2.31 involving \(n\), \(n > 1\), individuals, find the conditional expected number of matches given that the first person did not have a match.


Analytical Solution

As we see from the the problem’s description, 2.31 has been reiterated with a condition. This can be solved by applying the method of the conditional expectations \[E[X] = E[E[X|X=x]] ⇒ E[X] = E[X|X_1=0] P\{X_1=0\} + E[X|X_1=1] P\{X_1=1\}\]

Plugging these in the equality above, we get \(E[X|X_{1}=0] = \frac{n-2}{n-1}\)


Observations

Simulations

The simulations have been summarized in the chart below. What is being shown in the chart?


    Module[{iterations = {10, 20, 30, 50, 100, 200, 300, 500, 1000, 2000}},
     DistributionChart[
      Table[Module[{alphabet = Alphabet[], samples = #, data,
           firstSelection, secondSelection},
          data = Table[RandomSample[Alphabet[]], samples];
          data = DeleteCases[data, {"a"}~Join~ConstantArray[_, 25]];
          firstSelection = Length[data];
          N@Mean[
            Count[MapThread[#1 == #2 &, {Alphabet[], #}], True] & /@ data]
          ], 1000] & /@ iterations,
      ChartElementFunction -> "PointDensity", GridLines -> {None, {0.96}},
      ChartLabels -> iterations, ImageSize -> 788]]

    Export[StringReplace[NotebookFileName[], ".nb" -> "_chart_01.png"], %, ImageResolution -> 500]

Generalized Algorithm

In the above example, we have looked at a system with a fixed number of units (hats). I have generalized the "Alphabet" algorithm from previous section to accommodate any list. I used the function to pass lists of increasing lengths and to see how the expected value changes, since, \(E[X|X_1 = 0]\) varies with \(n\) as \(\frac{n-2}{n-1}\). This particular simulation took ~30 minutes for completion.

Reading the chart: The chart has seven distributions shown and we can read it as follows. Each bar contains 1000 points and each point is a mean value generated from a 1000 selections of varying lengths. The length of the list can be seen from the label at the bottom of each of bar


    mixAndMatchHats2[listIn_List] := Module[{samples = 1000},
      DistributionChart[
       Table[Module[{list = #, data, firstSelection, secondSelection},
           data = Table[RandomSample[list], samples];
           data =
            DeleteCases[
             data, {list[[1]]}~Join~ConstantArray[_, Length@list - 1]];
           N@Mean[Count[MapThread[#1 == #2 &, {list, #}], True] & /@ data]
           ], 1000] & /@ listIn, ChartElementFunction -> "PointDensity",
       ImageSize -> 788,
       GridLines -> {None, ((# - 2)/(# - 1)) & /@ (Length /@ listIn)},
       ChartLabels -> Length /@ listIn]]

    mixAndMatchHats2[Table[Range[n], {n, {10, 20, 30, 50, 100, 200, 500}}]]
    Export[StringReplace[NotebookFileName[], ".nb" -> "_chart_02.png"], %, ImageResolution -> 500]

Comparing with theory We know that \(E[X|X_{1}=0]\) varies with \(n\) as \(\frac{n-2}{n-1}\). We can plot this simply as a function to see the behavior of \(E[X|X_{1} = 0]\) with \(n\). The plot has been done for lengths of lists from 3 to 25.


Full Code

End of the post ;)