8. Discrete Multivariate Distributions

Basic Terminology and Techniques

Many problems involve more than a single random variable. When there are multiple random variables associated with an experiment or process we usually denote them as $X,Y,\dots$ or as $X_{1},X_{2},\dots$. For example, your final mark in a course might involve $X_{1}$ -- your assignment mark, $X_{2}$ -- your midterm test mark, and $X_{3}$ -- your exam mark. We need to extend the ideas introduced for single variables to deal with multivariate problems. In this course we only consider discrete multivariate problems, though continuous multivariate variables are also common in daily life (e.g. consider a person's height $X$ and weight $Y$).

To introduce the ideas in a simple setting, we'll first consider an example in which there are only a few possible values of the variables. Later we'll apply these concepts to more complex examples. The ideas themselves are simple even though some applications can involve fairly messy algebra.

Joint Probability Functions:

First, suppose there are two r.v.'s $X$ and $Y$, and define the function $f(x,y)=P(X=x \text{ and } Y=y).$

We call $f(x,y)$ the joint probability function of $(X,Y)$. In general, $f(x_{1},x_{2},\dots,x_{n})=P(X_{1}=x_{1}\text{ and }X_{2}=x_{2}\text{ and }\cdots\text{ and }X_{n}=x_{n})$ if there are $n$ r.v.'s $X_{1},\dots,X_{n}$.

The properties of a joint probability function are similar to those for a single variable; for two r.v.'s we have $f(x,y)\geq0$ for all $(x,y)$ and

$\sum_{\text{all }(x,y)}f(x,y)=1.$

Example: Consider the following numerical example, where we show $f(x,y)$ in a table.

 $f(x,y)$            $x$
              0      1      2
 $y$   1     .1     .2     .3
       2     .2     .1     .1

For example, $f(0,2)=P(X=0$ and $Y=2)=.2$.

We can check that $f(x,y)$ is a proper joint probability function since $f(x,y)\geq0$ for all 6 combinations of $(x,y)$ and the sum of these 6 probabilities is 1. When there are only a few values for $X$ and $Y$ it is often easier to tabulate $f(x,y)$ than to find a formula for it. We'll use this example below to illustrate other definitions for multivariate distributions, but first we give a short example where we need to find $f(x,y)$.

Example: Suppose a fair coin is tossed 3 times. Define the r.v.'s $X$ = number of Heads and $Y=1(0)$ if $H(T)$ occurs on the first toss. Find the joint probability function for $(X,Y)$.


Solution: First we should note the range for $(X,Y)$, which is the set of possible values $(x,y)$ which can occur. Clearly $X$ can be 0, 1, 2, or 3 and $Y$ can be 0 or 1, but we'll see that not all 8 combinations $(x,y)$ are possible.

We can find $f(x,y)=P(X=x,Y=y)$ by just writing down the sample space $S=\{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}$ that we have used before for this process. Then simple counting gives $f(x,y)$ as shown in the following table:

 $f(x,y)$                       $x$
              0              1              2              3
 $y$   0   $\frac{1}{8}$  $\frac{2}{8}$  $\frac{1}{8}$      0
       1        0         $\frac{1}{8}$  $\frac{2}{8}$  $\frac{1}{8}$

For example, $(X,Y)=(0,0)$ iff the outcome is $TTT;(X,Y)=(1,0)$ iff the outcome is either $THT$ or $TTH$.

Note that the range or joint p.f. for $(X,Y)$ is a little awkward to write down here in formulas, so we just use the table.
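To check a small joint probability function like this one by brute force, a short computational sketch can enumerate the sample space directly. The Python below is only an illustration (the variable names are my own); it tallies $f(x,y)$ over the 8 equally likely outcomes and reproduces the table.

from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of 3 tosses of a fair coin.
f = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")                 # X = total number of heads
    y = 1 if outcome[0] == "H" else 0      # Y = 1 iff the first toss is a head
    f[(x, y)] = f.get((x, y), Fraction(0)) + Fraction(1, 8)

for (x, y), prob in sorted(f.items()):
    print(f"f({x},{y}) = {prob}")          # matches the table entries 1/8, 2/8, ...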

Marginal Distributions:

We may be given a joint probability function involving more variables than we're interested in using. How can we eliminate any which are not of interest? Look at the first example above. If we're only interested in $X$, and don't care what value $Y$ takes, we can see that $P(X=0)=P(X=0\text{ and }Y=1)+P(X=0\text{ and }Y=2)=f(0,1)+f(0,2)=.1+.2,$ so $P(X=0)=.3.$ Similarly $P(X=1)=f(1,1)+f(1,2)=.3$ and $P(X=2)=f(2,1)+f(2,2)=.4.$

The distribution of $X$ obtained in this way from the joint distribution is called the marginal probability function of $X$:

$x$ 0 1 2
$f(x)$ .3 .3 .4

In the same way, if we were only interested in $Y$, we obtain $P(Y=1)=f(0,1)+f(1,1)+f(2,1)=.1+.2+.3=.6$ since $X$ can be 0, 1, or 2 when $Y=1$. The marginal probability function of $Y$ would be:

$y$ 1 2
$f(y)$ .6 .4

Our notation for marginal probability functions is still inadequate. What is $f(1)$? As soon as we substitute a number for $x$ or $y$, we don't know which variable we're referring to. For this reason, we generally put a subscript on the $f$ to indicate whether it is the marginal probability function for the first or second variable. So $f_{1}(1)$ would be $P(X=1)=.3$, while $f_{2}(1)$ would be $P(Y=1)=.6$.

In general, to find $f_{1}(x)$ we add over all values of $y$ where $X=x$, and to find $f_{2}(y)$ we add over all values of $x$ with $Y=y$. Then
$f_{1}(x)=\sum_{\text{all }y}f(x,y)\qquad\text{and}\qquad f_{2}(y)=\sum_{\text{all }x}f(x,y).$

This reasoning can be extended beyond two variables. For example, with 3 variables $(X_{1},X_{2},X_{3})$,

$f_{1}(x_{1})$ would be $\sum_{\text{all }(x_{2},x_{3})}f(x_{1},x_{2},x_{3})$ and

$f_{1,3}(x_{1},x_{3})$ would be $\sum_{\text{all }x_{2}}f(x_{1},x_{2},x_{3}).$
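As a quick numerical illustration of these sums, the following sketch (Python; the dictionary encodes the tabulated $f(x,y)$ from the first example) computes $f_{1}(x)$ and $f_{2}(y)$ by adding joint probabilities.

# Joint probability function from the first example, keyed by (x, y).
f = {(0, 1): .1, (1, 1): .2, (2, 1): .3,
     (0, 2): .2, (1, 2): .1, (2, 2): .1}

f1, f2 = {}, {}
for (x, y), p in f.items():
    f1[x] = f1.get(x, 0) + p       # add over all y with X = x
    f2[y] = f2.get(y, 0) + p       # add over all x with Y = y

print({x: round(p, 3) for x, p in f1.items()})   # {0: 0.3, 1: 0.3, 2: 0.4}
print({y: round(p, 3) for y, p in f2.items()})   # {1: 0.6, 2: 0.4}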

Independent Random Variables:

For events $A$ and $B$, we have defined $A$ and $B$ to be independent iff
$P(AB)=P(A)~~P(B)$. This definition can be extended to random variables $(X,Y)$

Definition

$X$ and $Y$ are independent random variables iff $f(x,y)=f_{1}(x)f_{2}(y)$ for all values $(x,y)$.

Definition

In general, $X_{1},X_{2},\dots,X_{n}$ are independent random variables iff
$f(x_{1},x_{2},\dots,x_{n})=f_{1}(x_{1})f_{2}(x_{2})\cdots f_{n}(x_{n})$ for all $(x_{1},x_{2},\dots,x_{n}).$

In our first example $X$ and $Y$ are not independent since $f(x,y)\neq f_{1}(x)f_{2}(y)$ for any of the 6 combinations of $(x,y)$ values; e.g., $f(1,1)=.2$ but $f_{1}(1)f_{2}(1)=(.3)(.6)=.18$. Be careful applying this definition. You can only conclude that $X$ and $Y$ are independent after checking all $(x,y)$ combinations. Even a single case where $f(x,y)\neq f_{1}(x)f_{2}(y)$ makes $X$ and $Y$ dependent.
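A mechanical check of the definition over every $(x,y)$ pair takes only a few lines; this sketch reuses the tabulated $f(x,y)$ and the marginals computed from it.

# Check independence: f(x,y) must equal f1(x) * f2(y) for EVERY (x, y).
f = {(0, 1): .1, (1, 1): .2, (2, 1): .3,
     (0, 2): .2, (1, 2): .1, (2, 2): .1}
f1 = {x: sum(p for (a, b), p in f.items() if a == x) for x in (0, 1, 2)}
f2 = {y: sum(p for (a, b), p in f.items() if b == y) for y in (1, 2)}

independent = all(abs(f[(x, y)] - f1[x] * f2[y]) < 1e-12 for (x, y) in f)
print(independent)        # False: e.g. f(1,1) = .2 but f1(1)*f2(1) = .18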

Conditional Probability Functions:

Again we can extend a definition from events to random variables. For events $A$ and $B$, recall that $P(A|B)=\frac{P(AB)}{P(B)}$. Since $P(X=x|Y=y)=\frac{P(X=x\text{ and }Y=y)}{P(Y=y)}=\frac{f(x,y)}{f_{2}(y)}$, we make the following definition.

Definition

The conditional probability function of $X$ given $Y=y$ is $f(x|y)=\frac{f(x,y)}{f_{2}(y)}$.
Similarly, $f(y|x)=\frac{f(x,y)}{f_{1}(x)}$ (provided, of course, the denominator is not zero).

In our first example let us find $f(x|Y=1)$.

$f(x|Y=1)=\frac{f(x,1)}{f_{2}(1)}=\frac{f(x,1)}{.6}.$ This gives:

$x$                       0                1                2
$f(x|Y=1)$   $\frac{.1}{.6}=\frac{1}{6}$   $\frac{.2}{.6}=\frac{2}{6}$   $\frac{.3}{.6}=\frac{3}{6}$

As you would expect, marginal and conditional probability functions are probability functions in that they are always $\geq0$ and their sum is 1.
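Numerically, the same conditioning is a one-line normalization; a sketch using the tabulated $f(x,y)$:

# Conditional p.f. of X given Y = 1:  f(x | Y=1) = f(x, 1) / f2(1).
f = {(0, 1): .1, (1, 1): .2, (2, 1): .3,
     (0, 2): .2, (1, 2): .1, (2, 2): .1}
f2_at_1 = sum(p for (x, y), p in f.items() if y == 1)    # f2(1) = 0.6

cond = {x: f[(x, 1)] / f2_at_1 for x in (0, 1, 2)}
print({x: round(p, 3) for x, p in cond.items()})   # {0: 0.167, 1: 0.333, 2: 0.5}
print(round(sum(cond.values()), 12))               # 1.0: conditional p.f.'s sum to one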

Functions of Variables:

In an example earlier, your final mark in a course might be a function of the 3 variables $X_{1},X_{2},X_{3}$ - assignment, midterm, and exam marks. Indeed, we often encounter problems where we need to find the probability distribution of a function of two or more r.v.'s. The most general method for finding the probability function for some function of random variables $X$ and $Y$ involves looking at every combination $(x,y)$ to see what value the function takes. For example, if we let $U=2(Y-X)$ in our example, the possible values of $U$ are seen by looking at the value of $u=2(y-x)$ for each $(x,y)$ in the range of $(X,Y)$.

 $u=2(y-x)$          $x$
              0      1      2
 $y$   1      2      0     -2
       2      4      2      0

MATH

The probability function of $U$ is thus

$u$ -2 0 2 4
$f(u)$ .3 .3 .2 .2

For some functions it is possible to approach the problem more systematically. One of the most common functions of this type is the total. Let $T=X+Y$. This gives:

 $t=x+y$             $x$
              0      1      2
 $y$   1      1      2      3
       2      2      3      4

Then $P(T=2)=f(0,2)+f(1,1)=.2+.2=.4$, for example. Continuing in this way, we get

$t$ 1 2 3 4
$f(t)$ .1 .4 .4 .1

(We are being a little sloppy with our notation by using "$f$" for both $f(t)$ and $f(x,y)$. No confusion arises here, but better notation would be to write $f_{T}(t)$ for $P(T=t)$.) In fact, to find $P(T=t)$ we are simply adding the probabilities for all $(x,y)$ combinations with $x+y=t$. This could be written as: MATH

However, if $x+y=t$, then $y=t-x$. To systematically pick out the right combinations of $(x,y)$, all we really need to do is sum over values of $x$ and then substitute $t-x$ for $y$. Then,
$f(t)=P(T=t)=\sum_{x}f(x,t-x).$

So $P(T=3)$ would be

$P(T=3)=\sum_{x}f(x,3-x)=f(0,3)+f(1,2)+f(2,1)=0+.1+.3=.4$

(note $f(0,3)=0$ since $Y$ can't be 3.)

We can summarize the method of finding the probability function for a function $U=g(X,Y)$ of two random variables $X$ and $Y$ as follows:

Let $f(x,y)=P(X=x,Y=y)$ be the probability function for $(X,Y)$. Then the probability function for $U$ is $f_{U}(u)=P[g(X,Y)=u]=\sum_{(x,y):\,g(x,y)=u}f(x,y).$ This can also be extended to functions of three or more r.v.'s $X_{1},X_{2},\dots,X_{n}$: $f_{U}(u)=P[g(X_{1},\dots,X_{n})=u]=\sum_{(x_{1},\dots,x_{n}):\,g(x_{1},\dots,x_{n})=u}f(x_{1},\dots,x_{n}).$ (Note: Do not get confused between the functions $f$ and $g$ in the above: $f(x,y)$ is the joint probability function of the r.v.'s $X,Y$ whereas $U=g(X,Y)$ defines the "new" random variable that is a function of $X$ and $Y$, and whose distribution we want to find.)
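In code, this recipe is just a loop that groups probabilities by the value of $g(x,y)$; a sketch for $U=2(Y-X)$ and $T=X+Y$ from the example above, reusing the same tabulated $f(x,y)$:

# Distribution of U = g(X, Y): add f(x,y) over all (x, y) with g(x, y) = u.
f = {(0, 1): .1, (1, 1): .2, (2, 1): .3,
     (0, 2): .2, (1, 2): .1, (2, 2): .1}

def pf_of_function(f, g):
    out = {}
    for (x, y), p in f.items():
        u = g(x, y)
        out[u] = round(out.get(u, 0) + p, 10)
    return out

print(pf_of_function(f, lambda x, y: 2 * (y - x)))   # U = 2(Y - X)
print(pf_of_function(f, lambda x, y: x + y))         # T = X + Y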

This completes the introduction of the basic ideas for multivariate distributions. As we look at harder problems that involve some algebra, refer back to these simpler examples if you find the ideas no longer making sense to you.

Example: Let $X$ and $Y$ be independent random variables having Poisson distributions with averages (means) of $\mu_{1}$ and $\mu_{2}$ respectively. Let $T=X+Y$. Find its probability function, $f(t)$.


Solution: We first need to find $f(x,y)$. Since $X$ and $Y$ are independent we know $f(x,y)=f_{1}(x)f_{2}(y).$ Using the Poisson probability function, $f(x,y)=\frac{\mu_{1}^{x}e^{-\mu_{1}}}{x!}\cdot\frac{\mu_{2}^{y}e^{-\mu_{2}}}{y!}$ where $x$ and $y$ can equal 0, 1, 2, $\dots$. Now, $P(T=t)=\sum_{x=0}^{t}f(x,t-x).$ Then
$f(t)=\sum_{x=0}^{t}\frac{\mu_{1}^{x}e^{-\mu_{1}}}{x!}\cdot\frac{\mu_{2}^{t-x}e^{-\mu_{2}}}{(t-x)!}.$

To evaluate this sum, factor out constant terms and try to regroup in some form which can be evaluated by one of our summation techniques.

MATH

If we had a $t!$ on the top inside the MATH, the sum would be of the form MATH. This is the right hand side of the binomial theorem. Multiply top and bottom by $t!$ to get: MATH

Take a common denominator of $\mu_{2}$ to get

$f(t)=\frac{e^{-(\mu_{1}+\mu_{2})}\,\mu_{2}^{t}}{t!}\left(\frac{\mu_{2}+\mu_{1}}{\mu_{2}}\right)^{t}=\frac{(\mu_{1}+\mu_{2})^{t}e^{-(\mu_{1}+\mu_{2})}}{t!}\qquad\text{for }t=0,1,2,\dots$

Note that we have just shown that the sum of 2 independent Poisson random variables also has a Poisson distribution.
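A quick numerical check of this result (an illustration, not a proof) is to convolve two Poisson probability functions and compare with the Poisson probability function having mean $\mu_{1}+\mu_{2}$; the means below are arbitrary choices.

from math import exp, factorial

def poisson_pf(mu, k):
    return exp(-mu) * mu ** k / factorial(k)

mu1, mu2 = 1.3, 2.2                        # arbitrary illustrative means
for t in range(6):
    # convolution: P(T = t) = sum over x of P(X = x) P(Y = t - x)
    conv = sum(poisson_pf(mu1, x) * poisson_pf(mu2, t - x) for x in range(t + 1))
    direct = poisson_pf(mu1 + mu2, t)      # Poisson(mu1 + mu2) p.f. at t
    print(t, round(conv, 10), round(direct, 10))   # the two columns agree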

Example: Three sprinters, $A,B$ and $C$, compete against each other in 10 independent 100 m. races. The probabilities of winning any single race are .5 for $A$, .4 for $B$, and .1 for $C$. Let $X_{1},X_{2}$ and $X_{3}$ be the number of races $A,B$ and $C$ win.

  1. Find the joint probability function, MATH

  2. Find the marginal probability function, $f_{1}(x_{1})$

  3. Find the conditional probability function, $f(x_{2}|x_{1})$

  4. Are $X_{1}$ and $X_{2}$ independent? Why?

  5. Let $T=X_{1}+X_{2}$. Find its probability function, $f(t)$.


Solution: Before starting, note that $X_{1}+X_{2}+X_{3}=10$ since there are 10 races in all. We really only have two variables since $X_{3}=10-X_{1}-X_{2}$. However it is convenient to use $x_{3}$ to save writing and preserve symmetry.

  1. The reasoning will be similar to the way we found the binomial distribution in Chapter 6 except that there are now 3 types of outcome. There are MATH different outcomes (i.e. results for races 1 to 10) in which there are $x_{1}$ wins by $A,x_{2}$ by $B$, and $x_{3}$ by $C$. Each of these arrangements has a probability of (.5) multiplied $x_{1}$ times, (.4) $x_{2}$ times, and (.1) $x_{3}$ times in some order;

    i.e., MATH

    MATH

    The range for $(X_{1},X_{2},X_{3})$ is triples $(x_{1},x_{2},x_{3})$ where each $x_{i}$ is an integer between 0 and 10, and where $x_{1}+x_{2}+x_{3}=10$.

  2. It would also be acceptable to drop $x_{3}$ as a variable and write down the probability function for $X_{1},X_{2}$ only; this is MATH because of the fact that $X_{3}$ must equal $10-X_{1}-X_{2}$. For this probability function MATH and $x_{1}+x_{2}\leq10$. This simplifies finding $f_{1}(x_{1})$ a little . We now have MATH. The limits of summation need care: $x_{2}$ could be as small as $0$, but since $x_{1}+x_{2}\leq10$, we also require $x_{2}\leq10-x_{1}$. (E.g., if $x_{1}=7 $ then $B$ can win $0,1,2$, or 3 races.) Thus,

    MATH

    (Hint: In MATH the 2 terms in the denominator add to the term in the numerator, if we ignore the ! sign.) Multiply top and bottom by MATH This givesMATH

    Here $f_{1}(x_{1})$ is defined for MATH.


Note: While this derivation is included as an example of how to find marginal distributions by summing a joint probability function, there is a much simpler method for this problem. Note that each race is either won by $A $ (``success'') or it is not won by $A$ (``failure''). Since the races are independent and $X_{1}$ is now just the number of ``success'' outcomes, $X_{1}$ must have a binomial distribution, with $n=10$ and $p=.5$.

Hence $f_{1}(x_{1})=\binom{10}{x_{1}}(.5)^{x_{1}}(.5)^{10-x_{1}}$ for $x_{1}=0,1,\dots,10$, as above.

  3. Remember that $f(x_{2}|x_{1})=\frac{f(x_{1},x_{2})}{f_{1}(x_{1})}$, so that $f(x_{2}|x_{1})=\frac{\frac{10!}{x_{1}!x_{2}!(10-x_{1}-x_{2})!}(.5)^{x_{1}}(.4)^{x_{2}}(.1)^{10-x_{1}-x_{2}}}{\binom{10}{x_{1}}(.5)^{x_{1}}(.5)^{10-x_{1}}}=\binom{10-x_{1}}{x_{2}}(.8)^{x_{2}}(.2)^{10-x_{1}-x_{2}}.$ For any given value of $x_{1}$, $x_{2}$ ranges through $0,1,\dots,10-x_{1}$. (So the range of $X_{2}$ depends on the value $x_{1}$, which makes sense: if $A$ wins $x_{1}$ races then the most $B$ can win is $10-x_{1}$.)


Note: As in (b), this result can be obtained more simply by general reasoning. Once we are given that $A$ wins $x_{1}$ races, the remaining $(10-x_{1})$ races are all won by either $B$ or $C$. For these races, $B$ wins $\frac{4}{5}$ of the time and $C$ $\frac{1}{5}$ of the time, because $P(B)=.4$ and $P(C)=.1$; i.e., $B$ wins 4 times as often as $C$. More formally, $P(B\text{ wins a given race}\mid A\text{ does not win it})=\frac{.4}{.5}=.8$, so $f(x_{2}|x_{1})=\binom{10-x_{1}}{x_{2}}(.8)^{x_{2}}(.2)^{10-x_{1}-x_{2}}$ from the binomial distribution.

  4. $X_{1}$ and $X_{2}$ are clearly not independent since the more races $A$ wins, the fewer races there are for $B$ to win. More formally, $f(x_{1},x_{2})\neq f_{1}(x_{1})f_{2}(x_{2})$. (In general, if the range for $X_{1}$ depends on the value of $X_{2}$, then $X_{1}$ and $X_{2}$ cannot be independent.)

  5. If $T=X_{1}+X_{2}$ then $f(t)=P(T=t)=\sum_{x_{1}=0}^{t}f(x_{1},t-x_{1}).$

    The upper limit on $x_{1}$ is $t$ because, for example, if $t=7$ then $A$ could not have won more than 7 races. Then MATH

    What do we need to multiply by on the top and bottom? Can you spot it before looking below? MATH

Exercise: Explain to yourself how this answer can be obtained from the binomial distribution, as we did in the notes following parts (b) and (c).

The following problem is similar to conditional probability problems that we solved in Chapter 4. Now we are dealing with events defined in terms of random variables. Earlier results give us things like MATH

Example: In an auto parts company an average of $\mu$ defective parts are produced per shift. The number, $X$, of defective parts produced has a Poisson distribution. An inspector checks all parts prior to shipping them, but there is a 10% chance that a defective part will slip by undetected. Let $Y$ be the number of defective parts the inspector finds on a shift. Find $f(x|y)$. (The company wants to know how many defective parts are produced, but can only know the number which were actually detected.)


Solution: Think of $X=x$ being event $A$ and $Y=y$ being event $B$; we want to find $P(A|B)$. To do this we'll use $P(A|B)=\frac{P(AB)}{P(B)}.$ We know $f_{1}(x)=P(X=x)=\frac{\mu^{x}e^{-\mu}}{x!}$ for $x=0,1,2,\dots$. Also, for a given number $x$ of defective items produced, the number, $Y$, detected has a binomial distribution with $n=x$ and $p=.9$, assuming each inspection takes place independently. Then $f(y|x)=\binom{x}{y}(.9)^{y}(.1)^{x-y}.$ Therefore $f(x,y)=f(y|x)f_{1}(x)=\binom{x}{y}(.9)^{y}(.1)^{x-y}\,\frac{\mu^{x}e^{-\mu}}{x!}.$ To get $f(x|y)$ we'll need $f_{2}(y)$. We have $f_{2}(y)=\sum_{x=y}^{\infty}f(x,y)$ ($x\geq y$ since the number of defective items produced can't be less than the number detected) $=\sum_{x=y}^{\infty}\binom{x}{y}(.9)^{y}(.1)^{x-y}\,\frac{\mu^{x}e^{-\mu}}{x!}=\frac{(.9)^{y}e^{-\mu}}{y!}\sum_{x=y}^{\infty}\frac{(.1)^{x-y}\mu^{x}}{(x-y)!}.$ We could fit this into the summation result $\sum_{n=0}^{\infty}\frac{t^{n}}{n!}=e^{t}$ by writing $\mu^{x}$ as $\mu^{x-y}\mu^{y}$. Then

$f_{2}(y)=\frac{(.9\mu)^{y}e^{-\mu}}{y!}\sum_{x=y}^{\infty}\frac{(.1\mu)^{x-y}}{(x-y)!}=\frac{(.9\mu)^{y}e^{-\mu}}{y!}\,e^{.1\mu}=\frac{(.9\mu)^{y}e^{-.9\mu}}{y!},\qquad\text{and so}\qquad f(x|y)=\frac{f(x,y)}{f_{2}(y)}=\frac{(.1\mu)^{x-y}e^{-.1\mu}}{(x-y)!}\quad\text{for }x=y,y+1,y+2,\dots
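The algebra can be double checked numerically; the sketch below (with an arbitrary choice of $\mu$) builds $f(x,y)=f(y|x)f_{1}(x)$, sums out $x$ to get $f_{2}(y)$, and compares the resulting $f(x|y)$ with the closed form derived above.

from math import exp, factorial, comb

mu = 2.5             # arbitrary average number of defectives per shift
p_detect = 0.9

def f_joint(x, y):
    # f(x, y) = f(y | x) f1(x): Poisson(mu) production, Binomial(x, .9) detection
    return comb(x, y) * p_detect**y * (1 - p_detect)**(x - y) \
           * exp(-mu) * mu**x / factorial(x)

y = 3
f2_y = sum(f_joint(x, y) for x in range(y, 100))        # marginal of Y (truncated sum)
for x in range(y, y + 4):
    numeric = f_joint(x, y) / f2_y
    formula = exp(-0.1 * mu) * (0.1 * mu)**(x - y) / factorial(x - y)
    print(x, round(numeric, 10), round(formula, 10))    # the two columns agree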

Problems:

  1. The joint probability function of $\left( X,Y\right) $ is:

      $f(x,y)$            $x$
                    0      1      2
             0     .09    .06    .15
      $y$    1     .15    .05    .20
             2     .06    .09    .15

    1. Are $X$ and $Y$ independent? Why?

    2. Tabulate the conditional probability function, MATH.

    3. Tabulate the probability function of $D=X-Y$.

  2. In problem 6.14, given that $x$ sales were made in a 1 hour period, find the probability function for $Y$, the number of calls made in that hour.

  3. $X$ and $Y$ are independent, with MATH and
    MATH. Let $T=X+Y$. Find the probability function, $f(t)$. You may use the result MATH.

Multinomial Distribution

The multinomial distribution defined below is the only multivariate model distribution introduced in this course, though many other multivariate distributions exist. It is very important: it is a generalization of the binomial model to the case where each trial has $k$ possible outcomes.

Physical Setup: This distribution is the same as the binomial except there are $k$ types of outcome rather than two. An experiment is repeated independently $n$ times with $k$ distinct types of outcome each time. Let the probabilities of these $k$ types be $p_{1},p_{2},\dots,p_{k}$ each time. Let $X_{1}$ be the number of times the $1^{\text{st}}$ type occurs, $X_{2}$ the number of times the $2^{\text{nd}}$ occurs, $\cdots$, $X_{k}$ the number of times the $k^{\text{th}}$ type occurs. Then $(X_{1},X_{2},\dots,X_{k})$ has a multinomial distribution.

Notes:

  1. $p_{1}+p_{2}+\cdots+p_{k}=1$

  2. $X_{1}+X_{2}+\cdots+X_{k}=n.$

If we wish we can drop one of the variables (say the last), and just note that $X_{k}$ equals $n-X_{1}-X_{2}-\cdots-X_{k-1}$.


Illustrations:

  1. In the example of Section 8.1 with sprinters A,B, and C running 10 races we had a multinomial distribution with $n=10$ and $k=3$.

  2. Suppose student marks are given in letter grades as A, B, C, D, or F. In a class of 80 students the number getting A, B, ..., F might have a multinomial distribution with $n=80$ and $k=5$.


Joint Probability Function: The joint probability function of $X_{1},\dots,X_{k}$ is given by extending the argument in the sprinters example from $k=3$ to general $k$. There are $\frac{n!}{x_{1}!x_{2}!\cdots x_{k}!}$ different outcomes of the $n$ trials in which $x_{1}$ are of the $1^{\text{st}}$ type, $x_{2}$ are of the $2^{\text{nd}}$ type, etc. Each of these arrangements has probability $p_{1}^{x_{1}}p_{2}^{x_{2}}\cdots p_{k}^{x_{k}}$ since $p_{1}$ is multiplied $x_{1}$ times in some order, etc. Therefore $f(x_{1},x_{2},\dots,x_{k})=\frac{n!}{x_{1}!x_{2}!\cdots x_{k}!}\,p_{1}^{x_{1}}p_{2}^{x_{2}}\cdots p_{k}^{x_{k}}.$ The restrictions on the $x_{i}$'s are $x_{i}=0,1,\cdots,n$ and $\sum_{i=1}^{k}x_{i}=n$.

As a check that $\sum f(x_{1},\dots,x_{k})=1$ we use the multinomial theorem to get $\sum\frac{n!}{x_{1}!\cdots x_{k}!}\,p_{1}^{x_{1}}\cdots p_{k}^{x_{k}}=(p_{1}+p_{2}+\cdots+p_{k})^{n}=1^{n}=1.$

We have already seen one example of the multinomial distribution in the sprinter example.

Here is another simple example.

Example: Every person is one of four blood types: A, B, AB and O. (This is important in determining, for example, who may give a blood transfusion to a person.) In a large population let the fraction that has type A, B, AB and O, respectively, be MATH. Then, if $n$ persons are randomly selected from the population, the numbers MATH of types A, B, AB, O have a multinomial distribution with $k=4$ (In Caucasian people the values of the $p_{i}$'s are approximately MATH)

Remark: We sometimes use the notation MATH to indicate that MATH have a multinomial distribution.

Remark: For some types of problems its helpful to write formulas in terms of MATH and MATH using the fact that MATH In this case we can write the joint p.f. as MATH but we must remember then that MATH satisfy the condition MATH.

The multinomial distribution can also arise in combination with other models, and students often have trouble recognizing it then.


Example: A potter is producing teapots one at a time. Assume that they are produced independently of each other and with probability $p$ the pot produced will be "satisfactory"; the rest are sold at a lower price. The number, $X$, of rejects before producing a satisfactory teapot is recorded. When 12 satisfactory teapots are produced, what is the probability the 12 values of $X$ will consist of six 0's, three 1's, two 2's and one value which is $\geq3$?


Solution: Each time a "satisfactory" pot is produced the value of $X$ falls in one of the four categories $X=0,X=1,X=2,X\geq3$. Under the assumptions given in this question, $X$ has a geometric distribution with $f(x)=p(1-p)^{x}$ for $x=0,1,2,\dots$, so we can find the probability for each of these categories. We have $P(X=x)=f(x)$ for $x=0,1,2,$ and we can obtain $P(X\geq3)$ in various ways:

  1. MATH

    since we have a geometric series.

  2. MATH With some re-arranging, this also gives $(1-p)^{3}$.

  3. The only way to have $X\geq3$ is to have the first 3 pots produced all being rejects. MATH (3 consecutive rejects) = MATH

Reiterating that each time a pot is successfully produced, the value of $X$ falls in one of 4 categories $(0,1,2,\text{ or }\geq3)$, we see that the probability asked for is given by a multinomial distribution, Mult$\left(12;\,p,\ p(1-p),\ p(1-p)^{2},\ (1-p)^{3}\right)$: $\frac{12!}{6!\,3!\,2!\,1!}\,p^{6}\left[p(1-p)\right]^{3}\left[p(1-p)^{2}\right]^{2}\left[(1-p)^{3}\right]^{1}.$
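As a numerical sketch of such a calculation, the code below evaluates this multinomial probability for an illustrative value of $p$ (here $p=.7$, my choice, not part of the example).

from math import factorial

def multinomial_pf(counts, probs):
    # n!/(x1! x2! ... xk!) * p1^x1 * ... * pk^xk
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    prob = coef
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob

p = 0.7                                       # illustrative chance of a satisfactory pot
cats = [p, p*(1-p), p*(1-p)**2, (1-p)**3]     # P(X=0), P(X=1), P(X=2), P(X>=3)
print(multinomial_pf([6, 3, 2, 1], cats))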

Problems:

  1. An insurance company classifies policy holders as class A,B,C, or D. The probabilities of a randomly selected policy holder being in these categories are .1, .4, .3 and .2, respectively. Give expressions for the probability that 25 randomly chosen policy holders will include

    1. 3A's, 11B's, 7C's, and 4D's.

    2. 3A's and 11B's.

    3. 3A's and 11B's, given that there are 4D's.

  2. Chocolate chip cookies are made from batter containing an average of 0.6 chips per c.c. Chips are distributed according to the conditions for a Poisson process. Each cookie uses 12 c.c. of batter. Give expressions for the probabilities that in a dozen cookies:

    1. 3 have fewer than 5 chips.

    2. 3 have fewer than 5 chips and 7 have more than 9.

    3. 3 have fewer than 5 chips, given that 7 have more than 9.

Markov Chains

Consider a sequence of (discrete) random variables $X_{1},X_{2},\ldots$ each of which takes integer values $1,2,\ldots N$ (called states). We assume that for a certain matrix $P$ (called the transition probability matrix), the conditional probabilities are given by corresponding elements of the matrix; i.e. $P(X_{n+1}=j\,|\,X_{n}=i)=P_{ij}$ for $i,j=1,2,\dots,N$, and furthermore that the chain only uses the last state occupied in determining its future; i.e. that $P(X_{n+1}=j\,|\,X_{n}=i,X_{n-1}=i_{1},\dots,X_{n-l+1}=i_{l-1})=P(X_{n+1}=j\,|\,X_{n}=i)$ for all $j,i,i_{1},\dots,i_{l-1}$ and $l=2,3,...$. Then the sequence of random variables $X_{n}$ is called a Markov chain. Markov chain models are the most common simple models for dependent variables, and are used to predict weather as well as movements of security prices. They allow the future of the process to depend on the present state of the process, but the past behaviour can influence the future only through the present state.

Example. Rain-No rain

Suppose that the probability that tomorrow is rainy given that today is not raining is $\alpha$ (and it does not otherwise depend on whether it rained in the past) and the probability that tomorrow is dry given that today is rainy is $\beta.$ If tomorrow's weather depends on the past only through whether today is wet or dry, we can define random variables $X_{n}=0$ if day $n$ is dry and $X_{n}=1$ if day $n$ is rainy (beginning at some arbitrary time origin, day $n=0$). Then the random variables $X_{n},n=0,1,2,...$ form a Markov chain with $N=2$ possible states and having probability transition matrix $P=\begin{pmatrix}1-\alpha & \alpha\\ \beta & 1-\beta\end{pmatrix},$ where the rows and columns are ordered (dry, rainy).

Properties of the Transition Matrix $P$

Note that $P_{ij}\geq0$ for all $i,j$ and $\sum_{j=1}^{N}P_{ij}=1$ for all $i.$ This last property holds because, given that $X_{n}=i,$ $X_{n+1}$ must occupy one of the states $j=1,2,...,N.$

The distribution of $X_{n}$

Suppose that the chain is started by randomly choosing a state for $X_{0}$ with distribution $P(X_{0}=i)=q_{i}$, $i=1,2,\dots,N$. Then the distribution of $X_{1}$ is given by $P(X_{1}=j)=\sum_{i=1}^{N}P(X_{1}=j|X_{0}=i)P(X_{0}=i)=\sum_{i=1}^{N}q_{i}P_{ij},$ and this is the $j$th element of the vector $\underline{q}^{\prime}P$ where $\underline{q}$ is the column vector of values $q_{i}$. To obtain the distribution at time $n=1,$ premultiply the transition matrix $P$ by the (row) vector representing the distribution at time $n=0.$ Similarly the distribution of $X_{2}$ is the vector $\underline{q}^{\prime}P^{2},$ where $P^{2}$ is the product of the matrix $P$ with itself, and the distribution of $X_{n}$ is $\underline{q}^{\prime}P^{n}.$ Under very general conditions, it can be shown that these probabilities converge because the matrix $P^{n}$ converges pointwise to a limiting matrix as $n\rightarrow\infty.$ In fact, in many such cases, the limit does not depend on the initial distribution $\underline{q}$ because the limiting matrix has all of its rows identical and equal to some vector of probabilities $\underline{\pi}.$ Identifying this vector $\underline{\pi}$ when convergence holds is reasonably easy.

Definition

A limiting distribution of a Markov chain is a vector ($\underline{\pi}$ say) of long run probabilities of the individual states, so $\pi_{j}=\lim_{n\rightarrow\infty}P(X_{n}=j).$ Now let us suppose that convergence to this distribution holds for a particular initial distribution $\underline{q}$, so we assume that $\underline{q}^{\prime}P^{n}\rightarrow\underline{\pi}^{\prime}$ as $n\rightarrow\infty.$ Then notice that $\underline{q}^{\prime}P^{n+1}=(\underline{q}^{\prime}P^{n})P\rightarrow\underline{\pi}^{\prime}P,$ but also $\underline{q}^{\prime}P^{n+1}\rightarrow\underline{\pi}^{\prime},$ so $\underline{\pi}$ must have the property that $\underline{\pi}^{\prime}P=\underline{\pi}^{\prime}.$ Any limiting distribution must have this property and this makes it easy in many examples to identify the limiting behaviour of the chain.

Definition

A stationary distribution of a Markov chain is the column vector ($\underline{\pi}$ say) of probabilities of the individual states such that $\underline{\pi}^{\prime}P=\underline{\pi}^{\prime}$.

Example: (weather continued)

Let us return to the weather example in which the transition probabilities are given by the matrix $P=\begin{pmatrix}1-\alpha & \alpha\\ \beta & 1-\beta\end{pmatrix}.$ What is the long-run proportion of rainy days? To determine this we need to solve the equations $\underline{\pi}^{\prime}P=\underline{\pi}^{\prime}$ subject to the conditions that the values $\pi_{0},\pi_{1}$ are both probabilities (non-negative) and add to one. It is easy to see that the solution is $\pi_{0}=\frac{\beta}{\alpha+\beta},\ \pi_{1}=\frac{\alpha}{\alpha+\beta}$ (state 0 being dry, state 1 rainy), which is intuitively reasonable in that it says that the long-run probability of the two states is proportional to the probability of a switch to that state from the other. So the long-run probability of a dry day is the limit $\lim_{n\rightarrow\infty}P(X_{n}=0)=\frac{\beta}{\alpha+\beta}.$ You might try verifying this by computing the powers of the matrix $P^{n}$ for $n=1,2,....$ and showing that $P^{n}$ approaches the matrix with both rows equal to $\left(\frac{\beta}{\alpha+\beta},\ \frac{\alpha}{\alpha+\beta}\right)$ as $n\rightarrow\infty.$ There are various mathematical conditions under which the limiting distribution of a Markov chain is unique and independent of the initial state of the chain, but roughly they assert that the chain is such that it forgets the more and more distant past.
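The convergence of $P^{n}$ is easy to see numerically; the sketch below uses illustrative switching probabilities $\alpha=.3$, $\beta=.5$ (my own choice), multiplies the matrix by itself repeatedly, and compares the limit with $\left(\frac{\beta}{\alpha+\beta},\frac{\alpha}{\alpha+\beta}\right)$.

alpha, beta = 0.3, 0.5          # illustrative switching probabilities
P = [[1 - alpha, alpha],        # row 0: today dry  -> (dry, rain) tomorrow
     [beta, 1 - beta]]          # row 1: today rain -> (dry, rain) tomorrow

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn = P
for _ in range(50):             # compute a high power of P
    Pn = matmul(Pn, P)

print([[round(v, 6) for v in row] for row in Pn])     # both rows are nearly equal
print(beta / (alpha + beta), alpha / (alpha + beta))  # stationary probabilities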

Example (Gene Model)

A simple form of inheritance of traits occurs when a trait is governed by a pair of genes $A$ and $a.$ An individual may have an $AA$ or an $Aa$ combination (in which case they are indistinguishable in appearance, or "$A$ dominates $a$"). Let us call an $AA$ individual dominant, an $aa$ individual recessive, and an $Aa$ individual hybrid. When two individuals mate, the offspring inherits one gene of the pair from each parent, and we assume that these genes are selected at random. Now let us suppose that two individuals of opposite sex selected at random mate, and then two of their offspring mate, etc. Here the state is determined by a pair of individuals, so the states of our process can be considered to be objects like $(AA,Aa)$, indicating that one of the pair is $AA$ and the other is $Aa$ (we do not distinguish the order of the pair, or male and female, assuming these genes do not depend on the sex of the individual).

Number State
1 $(AA,AA)$
2 $(AA,Aa)$
3 $(AA,aa)$
4 $(Aa,Aa)$
5 $(Aa,aa)$
6 $(aa,aa)$

For example, consider the calculation of MATH In this case each offspring has probability $1/2$ of being a dominant $AA$, and probability of $1/2$ of being a hybrid ($Aa$). If two offspring are selected independently from this distribution the possible pairs are MATH with probabilities $1/4,1/2,1/4$ respectively. So the transitions have probabilities below:MATH

and transition probability matrix MATH. What is the long-run behaviour in such a system? For example, the two-generation transition probabilities are given by $P^{2}=$ MATH, which seems to indicate a drift to one or other of the extreme states 1 or 6. To confirm the long-run behaviour, calculate a high power of the transition matrix: MATH, which shows that eventually the chain is absorbed in either state 1 or state 6, with the probability of absorption depending on the initial state. This chain, unlike the ones studied before, has more than one possible stationary distribution, for example, $\underline{\pi}^{\prime}=(1,0,0,0,0,0)$ and $\underline{\pi}^{\prime}=(0,0,0,0,0,1),$ and in these circumstances the chain does not have the same limiting distribution regardless of the initial state.

Extension of Expectation to Multivariate Distributions

It is easy to extend the definition of expectation to multiple variables. Generalizing $E[g(X)]=\sum_{\text{all }x}g(x)f(x)$ leads to the definition of expected value in the multivariate case

Definition

$E[g(X,Y)]=\sum_{\text{all }(x,y)}g(x,y)f(x,y)$ and $E[g(X_{1},X_{2},\dots,X_{n})]=\sum_{\text{all }(x_{1},x_{2},\dots,x_{n})}g(x_{1},\dots,x_{n})f(x_{1},\dots,x_{n}).$

As before, these represent the average value of $g(X,Y)$ and $g(X_{1},\dots,X_{n})$.

Example: Let the joint probability function, $f(x,y)$, be given by

 $f(x,y)$            $x$
              0      1      2
 $y$   1     .1     .2     .3
       2     .2     .1     .1

Find $E(XY)$ and $E(X)$.

Solution: $E(XY)=\sum_{\text{all }(x,y)}xy\,f(x,y)=(0)(1)(.1)+(1)(1)(.2)+(2)(1)(.3)+(0)(2)(.2)+(1)(2)(.1)+(2)(2)(.1)=1.4$

To find $E(X)$ we have a choice of methods. First, taking $g(x,y)=x$ we get $E(X)=\sum_{\text{all }(x,y)}x\,f(x,y)=(0)(.1)+(1)(.2)+(2)(.3)+(0)(.2)+(1)(.1)+(2)(.1)=1.1.$ Alternatively, since $E(X)$ only involves $X$, we could find $f_{1}(x)$ and use $E(X)=\sum_{x}x\,f_{1}(x)=(0)(.3)+(1)(.3)+(2)(.4)=1.1.$


Example: In the example of Section 8.1 with sprinters A, B, and C we had (using only $X_{1}$ and $X_{2}$ in our formulas) $f(x_{1},x_{2})=\frac{10!}{x_{1}!x_{2}!(10-x_{1}-x_{2})!}(.5)^{x_{1}}(.4)^{x_{2}}(.1)^{10-x_{1}-x_{2}},$ where A wins $x_{1}$ times and B wins $x_{2}$ times in 10 races. Find $E(X_{1}X_{2})$.



Solution: This will be similar to the way we derived the mean of the binomial distribution but, since this is a multinomial distribution, we'll be using the multinomial theorem to sum. MATH Let $y_{1}=x_{1}-1$ and $y_{2}=x_{2}-1$ in the sum and we obtain MATH


Property of Multivariate Expectation: It is easily proved (make sure you can do this) that $E\left[ag_{1}(X,Y)+bg_{2}(X,Y)\right]=aE\left[g_{1}(X,Y)\right]+bE\left[g_{2}(X,Y)\right].$ This can be extended beyond 2 functions $g_{1}$ and $g_{2}$, and beyond 2 variables $X$ and $Y$.

Relationships between Variables:

Independence is a "yes/no" way of defining a relationship between variables. We all know that there can be different types of relationships between variables which are dependent. For example, if $X$ is your height in inches and $Y$ your height in centimetres the relationship is one-to-one and linear. More generally, two random variables may be related (non-independent) in a probabilistic sense. For example, a person's weight $Y$ is not an exact linear function of their height $X$, but $Y$ and $X$ are nevertheless related. We'll look at two ways of measuring the strength of the relationship between two random variables. The first is called covariance.

Definition

The covariance of $X$ and $Y$, denoted Cov$(X,Y)$ or $\sigma_{XY}$, is $\text{Cov}(X,Y)=E\left[(X-\mu_{X})(Y-\mu_{Y})\right].$

For calculation purposes this definition is usually harder to use than the formula Cov$(X,Y)=E(XY)-E(X)E(Y)$, which is proved by noting that $E\left[(X-\mu_{X})(Y-\mu_{Y})\right]=E\left(XY-\mu_{X}Y-\mu_{Y}X+\mu_{X}\mu_{Y}\right)=E(XY)-\mu_{X}E(Y)-\mu_{Y}E(X)+\mu_{X}\mu_{Y}=E(XY)-E(X)E(Y).$

Example:

In the example with joint probability function MATH find Cov $(X,Y)$.


Solution: We previously calculated $E(XY)=1.4$ and $E(X)=1.1$. Similarly, $E(Y)=(1)(.6)+(2)(.4)=1.4.$ Therefore $\text{Cov}(X,Y)=E(XY)-E(X)E(Y)=1.4-(1.1)(1.4)=-.14.$
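For the record, the same arithmetic takes only a few lines of Python (reusing the tabulated joint probability function from above).

# Cov(X, Y) = E(XY) - E(X)E(Y) for the tabulated example.
f = {(0, 1): .1, (1, 1): .2, (2, 1): .3,
     (0, 2): .2, (1, 2): .1, (2, 2): .1}

E_XY = sum(x * y * p for (x, y), p in f.items())
E_X = sum(x * p for (x, y), p in f.items())
E_Y = sum(y * p for (x, y), p in f.items())
print(round(E_XY - E_X * E_Y, 4))       # -0.14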

Exercise: Calculate the covariance of $X_{1}$ and $X_{2}$ for the sprinter example. We have already found that $E(X_{1}X_{2})=18$. The marginal distributions of $X_{1}$ and of $X_{2}$ are models for which we've already derived the mean. If your solution takes more than a few lines you're missing an easier solution.


Interpretation of Covariance:

  1. Suppose large values of $X$ tend to occur with large values of $Y$ and small values of $X$ with small values of $Y$. Then $(X-\mu_{X})$ and $(Y-\mu_{Y})$ will tend to be of the same sign, whether positive or negative. Thus $(X-\mu_{X})(Y-\mu_{Y})$ will tend to be positive. Hence Cov $(X,Y)>0$. For example, in the figure below we see several hundred points plotted. Notice that the majority of the points are in the two quadrants (lower left and upper right) labelled with "+", so that for these $(x-\mu_{X})(y-\mu_{Y})>0.$ A minority of points are in the other two quadrants labelled "-", and for these $(x-\mu_{X})(y-\mu_{Y})<0$. Moreover the points in the latter two quadrants appear closer to the mean $(\mu_{X},\mu_{Y})$, indicating that on average, over all points generated, $(x-\mu_{X})(y-\mu_{Y})>0.$ Presumably this implies that over the joint distribution of $(X,Y),$ $E\left[(X-\mu_{X})(Y-\mu_{Y})\right]>0,$ or $Cov(X,Y)>0.$


[Figure: Random points $(X,Y)$ with covariance 0.5, variances 1.]

For example, if $X=$ a person's height and $Y=$ the same person's weight, then these two random variables will have positive covariance.

  2. Suppose large values of $X$ tend to occur with small values of $Y$ and small values of $X$ with large values of $Y$. Then $(X-\mu_{X})$ and $(Y-\mu_{Y})$ will tend to be of opposite signs. Thus $(X-\mu_{X})(Y-\mu_{Y})$ tends to be negative. Hence Cov $(X,Y)<0$. For example, see the figure below.


[Figure: Random points $(X,Y)$ with covariance $-0.5$, variances 1.]

    For example if $X=$thickness of attic insulation in a house and $Y=$heating cost for the house, then $Cov(X,Y)<0.$

Theorem

If $X$ and $Y$ are independent then Cov $(X,Y)=0$.



Proof: Recall $\text{Cov}(X,Y)=E(XY)-E(X)E(Y)$. Let $X$ and $Y$ be independent.
Then $f(x,y)=f_{1}(x)f_{2}(y)$, so $E(XY)=\sum_{x}\sum_{y}xy\,f_{1}(x)f_{2}(y)=\left[\sum_{x}x\,f_{1}(x)\right]\left[\sum_{y}y\,f_{2}(y)\right]=E(X)E(Y).$ Therefore $\text{Cov}(X,Y)=E(X)E(Y)-E(X)E(Y)=0.$

The following theorem gives a direct proof of the result above, and is useful in many other situations.

Theorem

Suppose random variables $X$ and $Y$ are independent. Then, if $g_{1}(X)$ and $g_{2}(Y)$ are any two functions, $E\left[g_{1}(X)g_{2}(Y)\right]=E\left[g_{1}(X)\right]E\left[g_{2}(Y)\right].$

Proof: Since $X$ and $Y$ are independent, $f(x,y)=f_{1}(x)f_{2}(y)$. Thus $E\left[g_{1}(X)g_{2}(Y)\right]=\sum_{x}\sum_{y}g_{1}(x)g_{2}(y)f_{1}(x)f_{2}(y)=\left[\sum_{x}g_{1}(x)f_{1}(x)\right]\left[\sum_{y}g_{2}(y)f_{2}(y)\right]=E\left[g_{1}(X)\right]E\left[g_{2}(Y)\right].$

To prove the result that Cov$(X,Y)=0$ when $X$ and $Y$ are independent, we just note that taking $g_{1}(X)=X$ and $g_{2}(Y)=Y$ in this theorem gives $E(XY)=E(X)E(Y)$, so Cov$(X,Y)=E(XY)-E(X)E(Y)=0.$

Caution: This result is not reversible. If Cov $(X,Y)=0$ we can not conclude that $X$ and $Y$ are independent. For example suppose that the random variable $Z$ is uniformly distributed on the values MATH and define $X=\sin(2\pi Z)$ and $Y=\cos(2\pi Z).$ It is easy to see that Cov$(X,Y)=0$ but the two random variables $X,Y$ are clearly related because the points $(X,Y)$ are always on a circle.
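To make the caution concrete, here is a small numerical sketch; the four-point uniform support for $Z$ is my own choice, since the point only needs some uniform $Z$ for which $X=\sin(2\pi Z)$ and $Y=\cos(2\pi Z)$ are clearly dependent yet uncorrelated.

from math import sin, cos, pi

z_values = [0, 0.25, 0.5, 0.75]          # assumed uniform support for Z
pts = [(sin(2 * pi * z), cos(2 * pi * z)) for z in z_values]

E_X = sum(x for x, y in pts) / 4
E_Y = sum(y for x, y in pts) / 4
E_XY = sum(x * y for x, y in pts) / 4
print(round(E_XY - E_X * E_Y, 12))                      # essentially 0: uncorrelated
print([(round(x, 3), round(y, 3)) for x, y in pts])     # but (X, Y) always lies on a circle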


Example: Let $(X,Y)$ have the joint probability function MATH; i.e. $(X,Y)$ only takes 3 values.

$x$ 0 1 2
$f_{1}(x)$ .2 .6 .2

and

$y$ 0 1
$f_{2}(y)$ .4 .6

are the marginal probability functions. Since $f(x,y)\neq f_{1}(x)f_{2}(y)$ for some $(x,y)$, $X$ and $Y$ are not independent. However, computing $E(XY)$, $E(X)$ and $E(Y)$ from these distributions gives $E(XY)-E(X)E(Y)=0.$ So $X$ and $Y$ have covariance 0 but are not independent. If Cov $(X,Y)=0$ we say that $X$ and $Y$ are uncorrelated, because of the definition of correlation given below.

  3. The actual numerical value of Cov $(X,Y)$ has no interpretation, so covariance is of limited use in measuring relationships.


Exercise:

  1. Look back at the example in which $f(x,y)$ was tabulated and Cov $(X,Y)=-.14$. Considering how covariance is interpreted, does it make sense that Cov $(X,Y)$ would be negative?

  2. Without looking at the actual covariance for the sprinter exercise, would you expect Cov$(X_{1},X_{2})$ to be positive or negative? (If A wins more of the 10 races, will B win more races or fewer races?)


We now consider a second, related way to measure the strength of relationship between $X$ and $Y$.

Definition

The correlation coefficient of $X$ and $Y$ is $\rho=\frac{\text{Cov}(X,Y)}{\sigma_{X}\sigma_{Y}}.$

The correlation coefficient measures the strength of the linear relationship between $X$ and $Y$ and is simply a rescaled version of the covariance, scaled to lie in the interval $[-1,1].$ You can attempt to guess the correlation between two variables based on a scatter diagram of values of these variables at the web page
http://statweb.calpoly.edu/chance/applets/guesscorrelation/GuessCorrelation.html
For example, in the figure below I guessed a correlation of -0.9 whereas the true correlation coefficient generating these data was $\rho=-0.91.$


[Figure: Guessing the correlation based on a scatter diagram of points.]

Properties of $\rho$:

  1. Since $\sigma_{X}$ and $\sigma_{Y}$, the standard deviations of $X$ and $Y$, are both positive, $\rho$ will have the same sign as Cov $(X,Y)$. Hence the interpretation of the sign of $\rho$ is the same as for Cov $(X,Y)$, and $\rho=0$ if $X$ and $Y$ are independent. When $\rho=0$ we say that $X$ and $Y$ are uncorrelated.

  2. $-1\leq\rho\leq1$ and as $\rho\rightarrow\pm1$ the relation between $X$ and $Y$ becomes one-to-one and linear.

Proof: Define a new random variable $S=X+tY$, where $t$ is some real number. We'll show that the fact that Var$(S)\geq0$ leads to 2) above. We have $\text{Var}(S)=\text{Var}(X)+t^{2}\text{Var}(Y)+2t\,\text{Cov}(X,Y)=\sigma_{X}^{2}+t^{2}\sigma_{Y}^{2}+2t\sigma_{XY}.$

Since $Var(S)\geq0$ for any real number $t,$ this quadratic in $t$ must have at most one real root (value of $t$ for which it is zero). Therefore its discriminant satisfies $(2\sigma_{XY})^{2}-4\sigma_{X}^{2}\sigma_{Y}^{2}\leq0,$ leading to the inequality $\left|\frac{\sigma_{XY}}{\sigma_{X}\sigma_{Y}}\right|=|\rho|\leq1.$ To see that $\rho=\pm1$ corresponds to a one-to-one linear relationship between $X$ and $Y$, note that $\rho=\pm1$ corresponds to a zero discriminant in the quadratic. This means that there exists one real number $t^{\ast}$ for which $\text{Var}(X+t^{\ast}Y)=0.$ But for Var$(X+t^{\ast}Y)$ to be zero, $X+t^{\ast}Y$ must equal a constant $c$. Thus $X$ and $Y$ satisfy a linear relationship.


Exercise: Calculate $\rho$ for the sprinter example. Does your answer make sense? (You should already have found Cov MATH in a previous exercise, so little additional work is needed.)


Problems:

    1. The joint probability function of $(X,Y)$ is:

      $f(x,y)$            $x$
                    0      1      2
      $y$    0     .06    .15    .09
             1     .14    .35    .21

      Calculate the correlation coefficient, $\rho$. What does it indicate about the relationship between $X$ and $Y$?

    2. Suppose that $X$ and $Y$ are random variables with joint probability function:

      $f(x,y)$            $x$
                    2      4      6
      $y$   -1     1/8    1/4    $p$
             1     1/4    1/8    $\frac{1}{4}-p$

      1. For what value of $p$ are $X$ and $Y$ uncorrelated?

      2. Show that there is no value of $p$ for which $X$ and $Y$ are independent.

    Mean and Variance of a Linear Combination of Random Variables

    Many problems require us to consider linear combinations of random variables; examples will be given below and in Chapter 9. Although writing down the formulas is somewhat tedious, we give here some important results about their means and variances.


    Results for Means:

    1. $E(aX+bY)=aE(X)+bE(Y)$, when $a$ and $b$ are constants. (This follows from the definition of expectation.) In particular, $E(X+Y)=E(X)+E(Y)$ and $E(X-Y)=E(X)-E(Y)$.

    2. Let $a_{i}$ be constants (real numbers) and $T=\sum_{i=1}^{n}a_{i}X_{i}$. Then $E(T)=\sum_{i=1}^{n}a_{i}E(X_{i})$. In particular, $E\left(\sum X_{i}\right)=\sum E(X_{i})$.

    3. Let $X_{1},X_{2},\dots,X_{n}$ be random variables which have mean $\mu$. (You can imagine these being some sample results from an experiment such as recording the number of occupants in cars travelling over a toll bridge.) The sample mean is $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$. Then $E(\bar{X})=\mu$.

    Proof: From (2), $E\left(\sum_{i=1}^{n}X_{i}\right)=\sum_{i=1}^{n}E(X_{i})=n\mu$. Thus

    $E(\bar{X})=E\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)=\frac{1}{n}E\left(\sum_{i=1}^{n}X_{i}\right)=\frac{n\mu}{n}=\mu.$


    Results for Covariance:

    1. Cov$(X,X)=\text{Var}(X)$

    2. Cov MATH where $a,b,c,$ and $d$ are constants.

    Proof:

    MATH This type of result can be generalized, but gets messy to write out.


    Results for Variance:

    1. $\text{Var}(aX+bY)=a^{2}\,\text{Var}(X)+b^{2}\,\text{Var}(Y)+2ab\,\text{Cov}(X,Y)$

    Proof:MATH


    Exercise: Try to prove this result by writing $\text{Var}(aX+bY)$ as Cov$(aX+bY,\,aX+bY)$ and using properties of covariance.

    2. Let $X$ and $Y$ be independent. Since Cov$(X,Y)=0$, result 1. gives $\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y);$ i.e., for independent variables, the variance of a sum is the sum of the variances. Also note $\text{Var}(X-Y)=\text{Var}(X)+(-1)^{2}\text{Var}(Y)=\text{Var}(X)+\text{Var}(Y);$ i.e., for independent variables, the variance of a difference is the sum of the variances.

    3. Let $a_{i}$ be constants and $T=\sum_{i=1}^{n}a_{i}X_{i}$. Then $\text{Var}(T)=\sum_{i=1}^{n}a_{i}^{2}\,\text{Var}(X_{i})+2\mathop{\sum\sum}_{i<j}a_{i}a_{j}\,\text{Cov}(X_{i},X_{j}).$ This is a generalization of result 1. and can be proved using either of the methods used for 1.

    4. Special cases of result 3 are:

      (a) If $X_{1},X_{2},\dots,X_{n}$ are independent then Cov$(X_{i},X_{j})=0$ for $i\neq j$, so that $\text{Var}\left(\sum_{i=1}^{n}X_{i}\right)=\sum_{i=1}^{n}\text{Var}(X_{i}).$

      (b) If $X_{1},X_{2},\dots,X_{n}$ are independent and all have the same variance $\sigma^{2}$, then $\text{Var}(\bar{X})=\frac{\sigma^{2}}{n}.$

    Proof of 4(b): $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$. From 4(a), $\text{Var}\left(\sum_{i=1}^{n}X_{i}\right)=\sum_{i=1}^{n}\text{Var}(X_{i})=n\sigma^{2}$. Using Var$(aX)=a^{2}\,$Var$(X)$, we get:
    $\text{Var}(\bar{X})=\frac{1}{n^{2}}\,\text{Var}\left(\sum_{i=1}^{n}X_{i}\right)=\frac{n\sigma^{2}}{n^{2}}=\frac{\sigma^{2}}{n}.$


    Remark: This result is a very important one in probability and statistics. To recap, it says that if $X_{1},\dots,X_{n}$ are independent r.v.'s with the same mean $\mu$ and the same variance $\sigma^{2}$, then the sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$ has $E(\bar{X})=\mu$ and $\text{Var}(\bar{X})=\frac{\sigma^{2}}{n}.$ This shows that the average $\bar{X}$ of $n$ random variables with the same distribution is less variable than any single observation $X_{i}$, and that the larger $n$ is the less variability there is. This explains mathematically why, for example, if we want to estimate the unknown mean height $\mu$ in a population of people, we are better to take the average height for a random sample of $n=10$ persons than to just take the height of one randomly selected person. A sample of $n=20$ persons would be better still. There are interesting applets at the urls http://users.ece.gatech.edu/users/gtz/java/samplemean/notes.html and http://www.ds.unifi.it/VL/VL_EN/applets/BinomialCoinExperiment.html which allow one to sample and explore the rate at which the sample mean approaches the expected value. In Chapter 9 we will see how to decide how large a sample we should take for a certain degree of precision. Also note that $\text{Var}(\bar{X})\rightarrow0$ as $n\rightarrow\infty$, which means that $\bar{X}$ becomes arbitrarily close to $\mu$. This is sometimes called the "law of averages". There is a formal theorem which supports the claim that for large sample sizes, sample means approach the expected value, called the "law of large numbers".
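    A simulation makes the $\sigma^{2}/n$ effect visible; the sketch below repeatedly averages $n$ independent copies of a random variable (a fair die roll, my choice) and compares the empirical variance of $\bar{X}$ with $\sigma^{2}/n$.

    import random

    def var(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    die = [1, 2, 3, 4, 5, 6]
    sigma2 = var(die)                         # population variance, 35/12
    for n in (1, 10, 20):
        means = [sum(random.choice(die) for _ in range(n)) / n for _ in range(20000)]
        print(n, round(var(means), 4), round(sigma2 / n, 4))   # empirical vs sigma^2 / n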

    Indicator Variables

    The results for linear combinations of random variables provide a way of breaking up more complicated problems, involving mean and variance, into simpler pieces using indicator variables; an indicator variable is just a binary variable (0 or 1) that indicates whether or not some event occurs. We'll illustrate this important method with 3 examples.

    Example: Mean and Variance of a Binomial R.V.

    Let $X\sim Bi(n,p)$ in a binomial process. Define new variables $X_{i}$ by:

    $X_{i}$ = 0 if the $i^{\text{th}}$ trial was a failure
    $X_{i}$ = 1 if the $i^{\text{th}}$ trial was a success.

    i.e. $X_{i}$ indicates whether the outcome "success" occurred on the $i^{\text{th}}$ trial. The trick we use is that the total number of successes, $X$, is the sum of the $X_{i}$'s: $X=\sum_{i=1}^{n}X_{i}.$

    We can find the mean and variance of $X_{i}$ and then use our results for the mean and variance of a sum to get the mean and variance of $X$. First, $E(X_{i})=0f(0)+1f(1)=f(1).$ But $f(1)=p$ since the probability of success is $p$ on each trial, so $E(X_{i})=p$. Since $X_{i}=0$ or 1, $X_{i}=X_{i}^{2}$, and therefore

    $E(X_{i}^{2})=E(X_{i})=p.$ Thus $\text{Var}(X_{i})=E(X_{i}^{2})-\left[E(X_{i})\right]^{2}=p-p^{2}=p(1-p).$

    In the binomial distribution the trials are independent so the $X_{i}$'s are also independent. Thus $E(X)=\sum_{i=1}^{n}E(X_{i})=np\quad\text{and}\quad\text{Var}(X)=\sum_{i=1}^{n}\text{Var}(X_{i})=np(1-p).$

    These, of course, are the same as we derived previously for the mean and variance of the binomial distribution. Note how simple the derivation here is!
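    The indicator-variable bookkeeping can be mirrored in a short simulation sketch: sum Bernoulli indicators and compare the sample mean and variance of the total with $np$ and $np(1-p)$.

    import random

    n, p, reps = 10, 0.3, 50000        # illustrative binomial parameters
    totals = []
    for _ in range(reps):
        x = sum(1 if random.random() < p else 0 for _ in range(n))   # sum of indicators X_i
        totals.append(x)

    mean = sum(totals) / reps
    variance = sum((t - mean) ** 2 for t in totals) / reps
    print(round(mean, 3), n * p)                  # close to np = 3.0
    print(round(variance, 3), n * p * (1 - p))    # close to np(1-p) = 2.1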


    Remark: If $X_{i}$ is a binary random variable with $P(X_{i}=1)=p=1-P(X_{i}=0)$ then $E(X_{i})=p$ and Var$(X_{i})=p(1-p)$, as shown above. (Note that $X_{i}\sim Bi(1,p)$ is actually a binomial r.v.) In some problems the $X_{i}$'s are not independent, and then we also need covariances.


    Example: Let $X$ have a hypergeometric distribution. Find the mean and variance of $X$.


    Solution: As above, let us think of the setting, which involves drawing $n$ items at random from a total of $N$, of which $r$ are "$S$" and $N-r$ are "$F$" items. Define $X_{i}=1$ if the $i^{\text{th}}$ item drawn is an $S$ and $X_{i}=0$ if it is an $F$.

    Then $X=\sum_{i=1}^{n}X_{i}$ as for the binomial example, but now the $X_{i}$'s are dependent. (For example, what we get on the first draw affects the probabilities of $S$ and $F$ for the second draw, and so on.) Therefore we need to find Cov$(X_{i},X_{j})$ for $i\neq j$ as well as $E(X_{i})$ and Var$(X_{i})$ in order to use our formula for the variance of a sum.

    We see first that $P(X_{i}=1)=r/N$ for each of $i=1,\dots,n$. (If the draws are random then the probability an $S$ occurs in draw $i$ is just equal to the probability position $i$ is an $S$ when we arrange $r$ $S$'s and $N-r$ $F$'s in a row.) This immediately gives $E(X_{i})=\frac{r}{N}$ and $\text{Var}(X_{i})=\frac{r}{N}\left(1-\frac{r}{N}\right)$ since $E(X_{i}^{2})=E(X_{i})=\frac{r}{N}.$ The covariance of $X_{i}$ and $X_{j}\ (i\neq j)$ is equal to $E(X_{i}X_{j})-E(X_{i})E(X_{j})$, so we need $E(X_{i}X_{j})=P(X_{i}=1\text{ and }X_{j}=1).$ The probability of an $S$ on both draws $i$ and $j$ is just $\frac{r(r-1)}{N(N-1)}.$ Thus, $\text{Cov}(X_{i},X_{j})=\frac{r(r-1)}{N(N-1)}-\left(\frac{r}{N}\right)^{2}=-\frac{r(N-r)}{N^{2}(N-1)}.$ (Does it make sense that Cov$(X_{i},X_{j})$ is negative? If you draw a success in draw $i$, are you more or less likely to have a success on draw $j$?) Now we find $E(X)$ and Var$(X)$. First, $E(X)=\sum_{i=1}^{n}E(X_{i})=\frac{nr}{N}.$ Before finding Var $(X)$, how many combinations $X_{i},X_{j}$ are there for which $i<j$? Each $i$ and $j$ takes values from $1,2,\cdots,n$, so there are $\binom{n}{2}$ different combinations of $(i,j)$ values with $i<j$. (E.g. if $i$ and $j$ each range over $1,2,3$, the combinations with $i<j$ are (1,2), (1,3) and (2,3), so there are $\binom{3}{2}=3$ different combinations.)

    Now we can find $\text{Var}(X)=\sum_{i=1}^{n}\text{Var}(X_{i})+2\mathop{\sum\sum}_{i<j}\text{Cov}(X_{i},X_{j})=n\,\frac{r}{N}\left(1-\frac{r}{N}\right)+2\binom{n}{2}\left[-\frac{r(N-r)}{N^{2}(N-1)}\right]=n\,\frac{r}{N}\left(1-\frac{r}{N}\right)\left(\frac{N-n}{N-1}\right).$


    In the last two examples, we know $f(x)$, and could have found $E(X)$ and Var$(X)$ without using indicator variables. In the next example $f(x)$ is not known and is hard to find, but we can still use indicator variables for obtaining $\mu$ and $\sigma^{2}$. The following example is a famous problem in probability.


    Example: We have $N$ letters to $N$ different people, and $N$ envelopes addressed to those $N$ people. One letter is put in each envelope at random. Find the mean and variance of the number of letters placed in the right envelope.


    Solution:

    Let $X_{i}=1$ if letter $i$ is placed in the correct envelope and $X_{i}=0$ otherwise. Then $X=\sum_{i=1}^{N}X_{i}$ is the number of correctly placed letters. Once again, the $X_{i}$'s are dependent (Why?).
    First, $E(X_{i})=P(X_{i}=1)=\frac{1}{N}$ (since there is 1 chance in $N$ that letter $i$ will be put in envelope $i$) and then, $E(X)=\sum_{i=1}^{N}E(X_{i})=N\cdot\frac{1}{N}=1.$


    Exercise: Before calculating Cov$(X_{i},X_{j})$, what sign do you expect it to have? (If letter $i$ is correctly placed does that make it more or less likely that letter $j$ will be placed correctly?)

    Next, $E(X_{i}X_{j})=1\cdot P(X_{i}=1\text{ and }X_{j}=1).$ (As in the last example, this is the only non-zero term in the sum.) Now, $P(X_{i}=1\text{ and }X_{j}=1)=P(X_{i}=1)P(X_{j}=1|X_{i}=1)=\frac{1}{N}\cdot\frac{1}{N-1}$ since once letter $i$ is correctly placed there is 1 chance in $N-1$ of letter $j$ going in envelope $j$. Also $\text{Var}(X_{i})=E(X_{i}^{2})-\left[E(X_{i})\right]^{2}=\frac{1}{N}\left(1-\frac{1}{N}\right).$ For the covariance, $\text{Cov}(X_{i},X_{j})=E(X_{i}X_{j})-E(X_{i})E(X_{j})=\frac{1}{N(N-1)}-\frac{1}{N^{2}}=\frac{1}{N^{2}(N-1)},$ and so $\text{Var}(X)=\sum_{i=1}^{N}\text{Var}(X_{i})+2\mathop{\sum\sum}_{i<j}\text{Cov}(X_{i},X_{j})=N\cdot\frac{1}{N}\left(1-\frac{1}{N}\right)+2\binom{N}{2}\frac{1}{N^{2}(N-1)}=\left(1-\frac{1}{N}\right)+\frac{1}{N}=1.$ (Common sense often helps in this course, but we have found no way of being able to say this result is obvious. On average 1 letter will be correctly placed and the variance will be 1, regardless of how many letters there are.)
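    Since $f(x)$ is awkward to write down here, a simulation is a natural sanity check; the sketch below places letters by a random permutation and confirms that both the mean and the variance of the number of matches are close to 1.

    import random

    N, reps = 8, 50000                 # illustrative number of letters, and repetitions
    counts = []
    for _ in range(reps):
        perm = list(range(N))
        random.shuffle(perm)           # envelope i receives letter perm[i]
        counts.append(sum(1 for i in range(N) if perm[i] == i))

    mean = sum(counts) / reps
    variance = sum((c - mean) ** 2 for c in counts) / reps
    print(round(mean, 3), round(variance, 3))    # both close to 1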

    Problems:

    1. The joint probability function of $(X,Y)$ is given by: MATH Calculate $E(X)$, Var $(X)$, Cov $(X,Y)$ and Var $(3X-2Y)$. You may use the fact that $E(Y)=.7$ and Var $(Y)$ = .21 without verifying these figures.

    2. In a row of 25 switches, each is considered to be "on" or "off". The probability of being on is .6 for each switch, independently of the other switches. Find the mean and variance of the number of unlike pairs among the 24 pairs of adjacent switches.

    3. Suppose Var $(X)=1.69$, Var $(Y)=4$, $\rho=0.5$; and let $U=2X-Y$. Find the standard deviation of $U$.

    4. Let MATH be uncorrelated random variables with mean 0 and variance $\sigma^{2}$. Let MATH. Find Cov MATH for $i=1,2,3,\cdots,n$ and Var MATH.

    5. A plastic fabricating company produces items in strips of 24, with the items connected by a thin piece of plastic:

      Item 1
      --
      Item 2
      -- ... --
      Item 24

      A cutting machine then cuts the connecting pieces to separate the items, with the 23 cuts made independently. There is a 10% chance the machine will fail to cut a connecting piece. Find the mean and standard deviation of the number of the 24 items which are completely separate after the cuts have been made. (Hint: Let $X_{i}=0$ if item $i$ is not completely separate, and $X_{i}=1$ if item $i$ is completely separate.)

    Multivariate Moment Generating Functions

    Suppose we have two possibly dependent random variables $(X,Y)$ and we wish to characterize their joint distribution using a moment generating function. Just as the probability function and the cumulative distribution function are, in this case, functions of two arguments, so is the moment generating function.

    Definition

    The joint moment generating function of $(X,Y)$ is $M(s,t)=E\left(e^{sX+tY}\right).$

    Recall that if $X,Y$ happen to be independent and $g_{1}(X)$ and $g_{2}(Y)$ are any two functions, then $E\left[g_{1}(X)g_{2}(Y)\right]=E\left[g_{1}(X)\right]E\left[g_{2}(Y)\right],$ and so with $g_{1}(X)=e^{sX}$ and $g_{2}(Y)=e^{tY}$ we obtain, for independent random variables $X,Y$, $M(s,t)=E\left(e^{sX}\right)E\left(e^{tY}\right)=M_{X}(s)M_{Y}(t),$ the product of the moment generating functions of $X$ and $Y$ respectively.

    There is another labour-saving property of moment generating functions for independent random variables. Suppose $X,Y$ are independent random variables with moment generating functions $M_{X}(t)$ and $M_{Y}(t)$, and suppose we want the moment generating function of the sum $Z=X+Y.$ One could attack this problem by first determining the probability function of $Z$, $f_{Z}(z)=\sum_{x}f(x,z-x),$ and then calculating $M_{Z}(t)=\sum_{z}e^{tz}f_{Z}(z).$ Evidently lots of work! On the other hand, recycling the product result above with $g_{1}(X)=e^{tX}$ and $g_{2}(Y)=e^{tY}$ gives $M_{Z}(t)=E\left(e^{t(X+Y)}\right)=E\left(e^{tX}e^{tY}\right)=E\left(e^{tX}\right)E\left(e^{tY}\right)=M_{X}(t)M_{Y}(t).$

    Theorem

    The moment generating function of the sum of independent random variables is the product of the individual moment generating functions.

    For example if both $X$ and $Y$ are independent with the same (Bernoulli) distribution $P(X=1)=p=1-P(X=0)$, then both have moment generating function $M_{X}(t)=M_{Y}(t)=(1-p)+pe^{t}$ and so the moment generating function of the sum $Z$ is $\left[(1-p)+pe^{t}\right]^{2}.$ Similarly if we add another independent Bernoulli the moment generating function is $(1-p+pe^{t})^{3}$ and in general the moment generating function of the sum of $n$ independent Bernoulli random variables is $(1-p+pe^{t})^{n},$ the moment generating function of a Binomial$(n,p)$ distribution. This confirms that the sum of $n$ independent Bernoulli random variables has a Binomial$(n,p)$ distribution.
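    As a quick numerical illustration (a spot check at a few values of $t$, not a proof), the product of $n$ Bernoulli moment generating functions matches the moment generating function computed directly from the Binomial$(n,p)$ probability function; the parameters below are arbitrary choices.

    from math import exp, comb

    p, n = 0.3, 5                      # illustrative Bernoulli / Binomial parameters

    def bernoulli_mgf(t):
        return (1 - p) + p * exp(t)

    def binomial_mgf(t):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) * exp(t * k) for k in range(n + 1))

    for t in (-1.0, 0.0, 0.5, 1.0):
        print(t, round(bernoulli_mgf(t) ** n, 8), round(binomial_mgf(t), 8))   # equal columns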

    Problems on Chapter 8

    1. The joint probability function of $(X,Y)$ is given by:

      $f(x,y)$            $x$
                    0      1      2
      $y$    0     .15    .1     .05
             1     .35    .2     .15

      1. Are $X$ and $Y$ independent? Why?

      2. Find MATH and MATH

    2. For a person whose car insurance and house insurance are with the same company, let $X$ and $Y$ represent the number of claims on the car and house policies, respectively, in a given year. Suppose that for a certain group of individuals, $X\sim$ Poisson (mean $=.10$) and $Y\sim$ Poisson (mean $=.05$).

      • If $X$ and $Y$ are independent, find $P(X+Y>1)$ and find the mean and variance of $X+Y$.

      • Suppose it was learned that $P(X=0,Y=0)$ was very close to $.94$. Show why $X$ and $Y$ cannot be independent in this case. What might explain the non-independence?

    3. Consider Problem 2.7 for Chapter 2, which concerned machine recognition of handwritten digits. Recall that $p(x,y)$ was the probability that the number actually written was $x$, and the number identified by the machine was $y$.

      • Are the random variables $X$ and $Y$ independent? Why?

      • What is $P(X=Y)$, that is, the probability that a random number is correctly identified?

      • What is the probability that the number 5 is incorrectly identified?

    4. Blood donors arrive at a clinic and are classified as type A, type O, or other types. Donors' blood types are independent with $P$ (type A) = $p$, $P$ (type O) = $q$, and $P$ (other type) = $1-p-q$. Consider the number, $X$, of type A and the number, $Y$, of type O donors arriving before the $10^{\QTR{rm}{th}}$ other type.

      1. Find the joint probability function, $f(x,y)$

      2. Find the conditional probability function, $f(y|x)$.

    5. Slot machine payouts. Suppose that in a slot machine there are $n+1$ possible outcomes $A_{1},A_{2},\dots,A_{n+1}$ for a single play. A single play costs $1. If outcome $A_{i}$ occurs, you win $\$a_{i}$, for $i=1,\dots,n$. If outcome $A_{n+1}$ occurs, you win nothing. In other words, if outcome $A_{i}$ ($i=1,\dots,n$) occurs your net profit is $a_{i}-1$; if $A_{n+1}$ occurs your net profit is $-1$.

      • Give a formula for your expected profit from a single play, if the probabilities of the $n+1$ outcomes are MATH.

      • The owner of the slot machine wants the player's expected profit to be negative. Suppose $n=4$, with MATH. If the slot machine is set to pay $3 when outcome $A_{1}$ occurs, and $5 when either of outcomes $A_{2},A_{3},A_{4}$ occur, determine the player's expected profit per play.

      • The slot machine owner wishes to pay $da_{i}$ dollars when outcome $A_{i}$ occurs, where MATH and $d$ is a number between 0 and 1. The owner also wishes his or her expected profit to be $.05 per play. (The player's expected profit is -.05 per play.) Find $d$ as a function of $n$ and $p_{n+1}$. What is the value of $d$ if $n=10$ and $p_{n+1}=.7$?

    6. Bacteria are distributed through river water according to a Poisson process with an average of 5 per 100 c.c. of water. What is the probability five 50 c.c. samples of water have 1 with no bacteria, 2 with one bacterium, and 2 with two or more?

    7. A box contains 5 yellow and 3 red balls, from which 4 balls are drawn at random without replacement. Let $X$ be the number of yellow balls on the first two draws and $Y$ the number of yellow balls on all 4 draws.

      1. Find the joint probability function, $f(x,y)$.

      2. Are $X$ and $Y$ independent? Justify your answer.

    8. In a quality control inspection items are classified as having a minor defect, a major defect, or as being acceptable. A carton of 10 items contains 2 with minor defects, 1 with a major defect, and 7 acceptable. Three items are chosen at random without replacement. Let $X$ be the number selected with minor defects and $Y$ be the number with major defects.

      1. Find the joint probability function of $X$ and $Y$.

      2. Find the marginal probability functions of $X$ and of $Y$.

      3. Evaluate numerically MATH and MATH.

    9. Let $X$ and $Y$ be discrete random variables with joint probability function MATH for $x=0,1,2,\cdots$ and $y=0,1,2,\cdots$, where $k$ is a positive constant.

      1. Derive the marginal probability function of $X$.

      2. Evaluate $k$.

      3. Are $X$ and $Y$ independent? Explain.

      4. Derive the probability function of $T=X+Y$.

    10. "Thinning" a Poisson process. Suppose that events are produced according to a Poisson process with an average of $\lambda$ events per minute. Each event has a probability $p$ of being a "Type A" event, independent of other events.

      1. Let the random variable $Y$ represent the number of Type A events that occur in a one-minute period. Prove that $Y$ has a Poisson distribution with mean $\lambda p$. (Hint: let $X$ be the total number of events in a 1 minute period and consider the formula just before the last example in Section 8.1).

      2. Lightning strikes in a large forest region occur over the summer according to a Poisson process with $\lambda=3$ strikes per day. Each strike has probability .05 of starting a fire. Find the probability that there are at least 5 fires over a 30 day period.

    11. In a breeding experiment involving horses the offspring are of four genetic types with probabilities:

      Type 1 2 3 4
      Probability 3/16 5/16 5/16 3/16

      A group of 40 independent offspring are observed. Give expressions for the following probabilities:

      1. There are 10 of each type.

      2. The total number of types 1 and 2 is 16.

      3. There are exactly 10 of type 1, given that the total number of types 1 and 2 is 16.

    12. In a particular city, let the random variable $X$ represent the number of children in a randomly selected household, and let $Y$ represent the number of female children. Assume that the probability a child is female is $0.5$, regardless of what size household they live in, and that the marginal distribution of $X$ is as follows:
      MATH

      • Determine $E(X)$.

      • Find the probability function for the number of girls $Y$ in a randomly chosen family. What is $E(Y)$?

    13. In a particular city, the probability a call to a fire department concerns various situations is as given below:

      1. fire in a detached home - $p_{1}=.10$
      2. fire in a semi detached home - $p_{2}=.05$
      3. fire in an apartment or multiple unit residence - $p_{3}=.05$
      4. fire in a non-residential building - $p_{4}=.15$
      5. non-fire-related emergency - $p_{5}=.15$
      6. false alarm - $p_{6}=.50$

      In a set of 10 calls, let $X_{1},...,X_{6}$ represent the numbers of calls of each of types $1,...,6$.

      • Give the joint probability function for $X_{1},...,X_{6}$.

      • What is the probability there is at least one apartment fire, given that there are 4 fire-related calls?

      • If the average costs of calls of types $1,...,6$ are (in $100 units) 5, 5, 7, 20, 4, 2 respectively, what is the expected total cost of the 10 calls?

    14. Suppose $X_{1},\dots,X_{n}$ have joint p.f. MATH. If MATH is a function such that for all MATH in the range of $f$,

      then show that MATH

    15. Let $X$ and $Y$ be random variables with Var $(X)=13$, Var$(Y)=34 $ and $\rho=-0.7$. Find VarMATH.

    16. Let $X$ and $Y$ have a trinomial distribution with joint probability function MATH and $x+y\leq n$. Let $T=X+Y$.

      1. What distribution does $T$ have? Either explain why or derive this result.

      2. For the distribution in (a), what is $E(T)$ and Var$(T)$?

      3. Using (b) find Cov$(X,Y)$, and explain why you expect it to have the sign it does.

    17. Jane and Jack each toss a fair coin twice. Let $X$ be the number of heads Jane obtains and $Y$ the number of heads Jack obtains. Define $U=X+Y $ and $V=X-Y$.

      1. Find the means and variances of $U$ and $V$.

      2. Find Cov $(U,V)$

      3. Are $U$ and $V$ independent? Why?

    18. A multiple choice exam has 100 questions, each with 5 possible answers. One mark is awarded for a correct answer and 1/4 mark is deducted for an incorrect answer. A particular student has probability $p_{i}$ of knowing the correct answer to the $i^{\QTR{rm}{th}}$ question, independently of other questions.

      1. Suppose that on a question where the student does not know the answer, he or she guesses randomly. Show that his or her total mark has mean $\sum p_{i}$ and variance MATH.

      2. Show that the total mark for a student who refrains from guessing also has mean $\sum p_{i}$, but with variance MATH. Compare the variances when all $p_{i}$'s equal (i) .9, (ii) .5.

    19. Let $X$ and $Y$ be independent random variables with $E(X)=E(Y)=0 $, Var$(X)=1$ and Var $(Y)=2$. Find CovMATH.

    20. An automobile driveshaft is assembled by placing parts A, B and C end to end in a straight line. The standard deviation in the lengths of parts A, B and C are 0.6, 0.8, and 0.7 respectively.

      1. Find the standard deviation of the length of the assembled driveshaft.

      2. What percent reduction would there be in the standard deviation of the assembled driveshaft if the standard deviation of the length of part B were cut in half?

    21. The inhabitants of the beautiful and ancient canal city of Pentapolis live on 5 islands separated from each other by water. Bridges cross from one island to another as shown.
      [Figure: map of the five islands of Pentapolis and the bridges joining them.]

      On any day, a bridge can be closed, with probability $p$, for restoration work. Assuming that the 8 bridges are closed independently, find the mean and variance of the number of islands which are completely cut off because of restoration work.

    22. A Markov chain has a doubly stochastic transition matrix if both the row sums and the column sums of the transition matrix $~P~$ are all $1$. Show that for such a Markov chain, the uniform distribution on $\{1,2,\ldots,N\}$ is a stationary distribution.

    23. A salesman sells in three cities A, B, and C. He never sells in the same city on successive weeks. If he sells in city A, then the next week he always sells in B. However if he sells in either B or C, then the next week he is twice as likely to sell in city A as in the other city. What is the long-run proportion of time he spends in each of the three cities?

    24. Find MATH where MATH

    25. Suppose $X$ and $Y$ are independent having Poisson distributions with parameters $\lambda_{1}$ and $\lambda_{2}$ respectively. Use moment generating functions to identify the distribution of the sum $X+Y.$

    26. Waterloo in January is blessed by many things, but not by good weather. There are never two nice days in a row. If there is a nice day, we are just as likely to have snow as rain the next day. If we have snow or rain, there is an even chance of having the same the next day. If there is a change from snow or rain, only half of the time is this a change to a nice day. Taking as states the kinds of weather R, N, and S, the transition probabilities $P$ are as follows: MATH If today is raining, find the probability of Rain, Nice, Snow three days from now. Find the probabilities of the three states in five days, given (i) today is raining (ii) today is nice (iii) today is snowing.

    27. (One-card Poker) A card game, which, for the purposes of this question we will call Metzler Poker, is played as follows. Each of 2 players bets an initial $1 and is dealt a card from a deck of 13 cards numbered 1-13. Upon looking at their card, each player then decides (unaware of the other's decision) whether or not to increase their bet by $5 (to a total stake of $6). If both increase the stake ("raise"), then the player with the higher card wins both stakes-i.e. they get their money back as well as the other player's $6. If one person increases and the other does not, then the player who increases automatically wins the pot (i.e. money back+$1). If neither person increases the stake, then it is considered a draw-each player receives their own $1 back. Suppose that Player A and B have similar strategies, based on threshold numbers {a,b} they have chosen between 1 and 13. A chooses to raise whenever their card is greater than or equal to a and B whenever B's card is greater than or equal to b.

      1. Suppose B always raises (so that b=1). What is the expected value of A's win or loss for the different possible values of a=1,2,...,13.

      2. Suppose a and b are arbitrary. Given that both players raise, what is the probability that A wins? What is the expected value of A's win or loss?

      3. Suppose you know that b=11. Find your expected win or loss for various values of a and determine the optimal value. How much do you expect to make or lose per game under this optimal strategy?

    28. (Searching a database) Suppose that we are given 3 records, $R_{1},R_{2},R_{3}$ initially stored in that order. The cost of accessing the j'th record in the list is j so we would like the more frequently accessed records near the front of the list. Whenever a request for record j is processed, the "move-to-front" heuristic stores $R_{j}$ at the front of the list and the others in the original order. For example if the first request is for record $2,$ then the records will be re-stored in the order $R_{2},R_{1},R_{3}.$ Assume that on each request, record $j$ is requested with probability $p_{j},$ for $j=1,2,3.$

      1. Show that if $X_{t}=$ the permutation that obtains after $t$ requests for records (e.g. $X_{2}=(2,1,3)$), then $X_{t}$ is a Markov chain.

      2. Find the stationary distribution of this Markov chain. (Hint: what is the probability that $X_{t}$ takes the form $(2,\ast,\ast)$?).

      3. Find the expected long-run cost per record accessed in the case MATH respectively.

      4. How does this expected long-run cost compare with keeping the records in random order, and with keeping them in order of decreasing values of $p_{j}$(only possible if we know $p_{j}).$