Many problems involve more than a single random variable. When there are
multiple random variables associated with an experiment or process we usually
denote them as $X, Y, Z, \ldots$ or as $X_1, X_2, \ldots, X_n$.
For example, your final mark in a course might involve
$X_1$ -- your assignment mark,
$X_2$ -- your midterm test mark, and
$X_3$ -- your exam mark. We need to extend the ideas introduced for single variables
to deal with multivariate problems. In this course we only consider discrete
multivariate problems, though continuous multivariate variables are also
common in daily life (e.g. consider a person's height $X$ and weight $Y$).
To introduce the ideas in a simple setting, we'll first consider an example in which there are only a few possible values of the variables. Later we'll apply these concepts to more complex examples. The ideas themselves are simple even though some applications can involve fairly messy algebra.
First, suppose there are two r.v.'s $X$ and $Y$, and define the function
$$f(x,y) = P(X = x \text{ and } Y = y).$$
We call $f(x,y)$ the joint probability function of $(X, Y)$.
In general,
$$f(x_1, x_2, \ldots, x_n) = P(X_1 = x_1 \text{ and } X_2 = x_2 \text{ and } \cdots \text{ and } X_n = x_n)$$
if there are $n$ r.v.'s $X_1, X_2, \ldots, X_n$.
The properties of a joint probability function are similar to those for a
single variable; for two r.v.'s we have $f(x,y) \geq 0$ for all $(x,y)$ and
$$\sum_{\text{all } (x,y)} f(x,y) = 1.$$
Example: Consider the following numerical example, where we
show $f(x,y)$ in a table.

| $f(x,y)$ | $x=0$ | $x=1$ | $x=2$ |
|----------|-------|-------|-------|
| $y=1$    | .1    | .2    | .3    |
| $y=2$    | .2    | .1    | .1    |
For example, the entry in the row $y=2$, column $x=0$ means
$$f(0,2) = P(X=0 \text{ and } Y=2) = .2,$$
and similarly $f(1,1) = P(X=1 \text{ and } Y=1) = .2.$
We can check that $f(x,y)$
is a proper joint probability function since $f(x,y) \geq 0$
for all 6 combinations of $(x,y)$,
and the sum of these 6 probabilities is 1. When there are only a few values
for $X$ and $Y$
it is often easier to tabulate $f(x,y)$
than to find a formula for it. We'll use this example below to illustrate
other definitions for multivariate distributions, but first we give a short
example where we need to find $f(x,y)$.
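A joint probability function with only a few values is easy to represent as a table in code. The following is a minimal sketch (the dictionary `joint` and all names are ours, not part of the notes) storing the example above and checking the two defining properties.

```python
# Joint probability function f(x, y) from the example, keyed by (x, y).
joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}

# Check the two properties of a joint probability function:
# f(x, y) >= 0 for every pair, and the probabilities sum to 1.
assert all(p >= 0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-12

print(joint[(0, 2)])   # P(X = 0 and Y = 2) = 0.2
```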
Example: Suppose a fair coin is tossed 3 times. Define the
r.v.'s $X$ = number of Heads and $Y = 1$
if $H$
occurs on the first toss ($Y = 0$ otherwise). Find the joint probability function for
$(X, Y)$.
Solution: First we should note the range for $(X, Y)$,
which is the set of possible values $(x,y)$
which can occur. Clearly $X$
can be 0, 1, 2, or 3 and $Y$
can be 0 or 1, but we'll see that not all 8 combinations $(x,y)$
are possible.
We can find $f(x,y) = P(X = x \text{ and } Y = y)$
by just writing down the sample space
$$S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$
that we have used before for this process. Then simple counting gives $f(x,y)$
as shown in the following table:

| $f(x,y)$ | $x=0$ | $x=1$ | $x=2$ | $x=3$ |
|----------|-------|-------|-------|-------|
| $y=0$    | 1/8   | 2/8   | 1/8   | 0     |
| $y=1$    | 0     | 1/8   | 2/8   | 1/8   |

For example, $f(0,0) = \frac18$ since $(X,Y)=(0,0)$
iff the outcome is $TTT$, while $f(1,0) = \frac28$ since $(X,Y)=(1,0)$
iff the outcome is either $THT$ or $TTH$.
Note that the range or joint p.f. for $(X,Y)$
is a little awkward to write down here in formulas, so we just use the
table.
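The table can also be produced by brute force: enumerate the 8 equally likely outcomes and count. A short sketch (function and variable names are ours):

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

# Enumerate all 2^3 equally likely outcomes of three fair coin tosses.
joint = defaultdict(Fraction)
for outcome in product("HT", repeat=3):
    x = outcome.count("H")             # X = number of Heads
    y = 1 if outcome[0] == "H" else 0  # Y = 1 if the first toss is a Head
    joint[(x, y)] += Fraction(1, 8)

for (x, y), p in sorted(joint.items()):
    print(f"f({x},{y}) = {p}")
# e.g. f(1,0) = 1/4 since only THT and TTH give X = 1, Y = 0
```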
We may be given a joint probability function involving more variables than
we're interested in using. How can we eliminate any which are not of interest?
Look at the first example above. If we're only interested in $X$,
and don't care what value $Y$
takes, we can see that
$$P(X=0) = P(X=0 \text{ and } Y=1) + P(X=0 \text{ and } Y=2) = .1 + .2 = .3,$$
so $P(X=0) = .3$. Similarly $P(X=1) = .2 + .1 = .3$ and $P(X=2) = .3 + .1 = .4$.
The distribution of $X$
obtained in this way from the joint distribution is called the marginal
probability function of $X$:

| $x$    | 0  | 1  | 2  |
|--------|----|----|----|
| $f(x)$ | .3 | .3 | .4 |

In the same way, if we were only interested in $Y$,
we obtain
$$P(Y=1) = f(0,1) + f(1,1) + f(2,1) = .1 + .2 + .3 = .6$$
since $X$
can be 0, 1, or 2 when $Y = 1$.
The marginal probability function of $Y$
would be:

| $y$    | 1  | 2  |
|--------|----|----|
| $f(y)$ | .6 | .4 |
Our notation for marginal probability functions is still inadequate. What is
$f(1)$?
As soon as we substitute a number for $x$ or $y$,
we don't know which variable we're referring to. For this reason, we generally
put a subscript on the $f$
to indicate whether it is the marginal probability function for the first or
second variable. So $f_1(1)$
would be $P(X=1) = .3$,
while $f_2(1)$
would be $P(Y=1) = .6$.
In general, to find $f_1(x)$
we add over all values of $y$
where $X = x$,
and to find $f_2(y)$
we add over all values of $x$
with $Y = y$.
Then
$$f_1(x) = \sum_{\text{all } y} f(x,y) \qquad \text{and} \qquad f_2(y) = \sum_{\text{all } x} f(x,y).$$
This reasoning can be extended beyond two variables. For example, with 3
variables $(X_1, X_2, X_3)$,
$f_1(x_1)$ would be
$$f_1(x_1) = \sum_{\text{all } (x_2, x_3)} f(x_1, x_2, x_3)$$
and $f_{1,3}(x_1, x_3)$ would be
$$f_{1,3}(x_1, x_3) = \sum_{\text{all } x_2} f(x_1, x_2, x_3).$$
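Marginalizing is just summing the joint table over the variable being removed. A minimal sketch using the dictionary from the first example (all names are ours):

```python
from collections import defaultdict

joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}

# f1(x) = sum over y of f(x, y);  f2(y) = sum over x of f(x, y)
f1, f2 = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    f1[x] += p
    f2[y] += p

print(dict(f1))   # {0: .3, 1: .3, 2: .4} up to floating-point rounding
print(dict(f2))   # {1: .6, 2: .4}
```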
For events $A$ and $B$,
we have defined $A$ and $B$
to be independent iff $P(AB) = P(A)P(B)$.
This definition can be extended to random variables: $X$ and $Y$
are independent random variables iff
$$f(x,y) = f_1(x)\, f_2(y) \quad \text{for all values } (x,y).$$
In general, $X_1, X_2, \ldots, X_n$ are independent random variables iff
$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n) \quad \text{for all } (x_1, x_2, \ldots, x_n).$$
In our first example $X$ and $Y$
are not independent since $f_1(x) f_2(y) \neq f(x,y)$
for any of the 6 combinations of $(x,y)$
values; e.g., $f_1(0) f_2(1) = (.3)(.6) = .18$
but $f(0,1) = .1$.
Be careful applying this definition. You can only conclude that $X$ and $Y$
are independent after checking all $(x,y)$
combinations. Even a single case where $f_1(x) f_2(y) \neq f(x,y)$
makes $X$ and $Y$
dependent.
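Checking independence mechanically means comparing f(x,y) with f1(x)·f2(y) in every cell. A self-contained sketch (names ours):

```python
from collections import defaultdict

joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}
f1, f2 = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    f1[x] += p
    f2[y] += p

# X and Y are independent iff f(x, y) == f1(x) * f2(y) in EVERY cell.
independent = all(abs(joint[(x, y)] - f1[x] * f2[y]) < 1e-9 for (x, y) in joint)
print(independent)   # False: e.g. f(0,1) = .1 but f1(0) * f2(1) = .18
```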
Again we can extend a definition from events to random variables. For events
$A$ and $B$,
recall that $P(A \mid B) = \frac{P(AB)}{P(B)}$.
Since
$$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)},$$
we make the following definition.

The conditional probability function of $X$ given $Y = y$ is
$$f(x \mid y) = \frac{f(x,y)}{f_2(y)}.$$
Similarly, $f(y \mid x) = \frac{f(x,y)}{f_1(x)}$
(provided, of course, the denominator is not zero).

In our first example let us find $f(x \mid Y = 1)$.
This gives:
$$f(x \mid Y=1) = \frac{f(x,1)}{f_2(1)} = \frac{f(x,1)}{.6}$$

| $x$              | 0   | 1   | 2   |
|------------------|-----|-----|-----|
| $f(x \mid Y=1)$  | 1/6 | 2/6 | 3/6 |

As you would expect, marginal and conditional probability functions are
probability functions in that they are always $\geq 0$
and their sum is 1.
In an example earlier, your final mark in a course might be a function of the
3 variables $X_1, X_2, X_3$ - assignment, midterm, and exam
marks Note_1 . Indeed, we often
encounter problems where we need to find the probability distribution of a
function of two or more r.v.'s. The most general method for finding the
probability function for some function of random variables $X$ and $Y$
involves looking at every combination $(x,y)$
to see what value the function takes. For example, if we let $U = 2(Y - X)$
in our example, the possible values of $U$
are seen by looking at the value of $u = 2(y - x)$
for each $(x,y)$
in the range of $(X,Y)$.

| $u = 2(y-x)$ | $x=0$ | $x=1$ | $x=2$ |
|--------------|-------|-------|-------|
| $y=1$        | 2     | 0     | -2    |
| $y=2$        | 4     | 2     | 0     |

The probability function of $U$ is thus

| $u$    | -2 | 0  | 2  | 4  |
|--------|----|----|----|----|
| $f(u)$ | .3 | .3 | .2 | .2 |
For some functions it is possible to approach the problem more systematically.
One of the most common functions of this type is the total. Let $T = X + Y$.
This gives:

| $t = x+y$ | $x=0$ | $x=1$ | $x=2$ |
|-----------|-------|-------|-------|
| $y=1$     | 1     | 2     | 3     |
| $y=2$     | 2     | 3     | 4     |

Then $f(2) = P(T = 2) = f(1,1) + f(0,2) = .2 + .2 = .4$,
for example. Continuing in this way, we get

| $t$    | 1  | 2  | 3  | 4  |
|--------|----|----|----|----|
| $f(t)$ | .1 | .4 | .4 | .1 |
(We are being a little sloppy with our notation by using
"$f$" for both $f(x,y)$ and $f(t)$.
No confusion arises here, but better notation would be to write $f_T(t)$
for $P(T = t)$.)
In fact, to find $f(t)$
we are simply adding the probabilities for all $(x,y)$
combinations with $x + y = t$.
This could be written as:
$$f(t) = \sum_{(x,y):\, x+y=t} f(x,y).$$
However, if $x + y = t$,
then $y = t - x$.
To systematically pick out the right combinations of $(x,y)$,
all we really need to do is sum over values of $x$
and then substitute $t - x$
for $y$.
Then,
$$f(t) = \sum_{x} f(x,\, t-x).$$
So $f(3)$ would be
$$f(3) = \sum_{x} f(x,\, 3-x) = f(0,3) + f(1,2) + f(2,1) = 0 + .1 + .3 = .4$$
(note $f(0,3) = 0$
since $Y$ can't be 3.)

We can summarize the method of finding the probability function for a function
of two random variables $X$ and $Y$
as follows:

Let $f(x,y)$
be the probability function for $(X,Y)$ and let $U = g(X,Y)$.
Then the probability function for $U$ is
$$f_U(u) = P(U = u) = \sum_{(x,y):\, g(x,y)=u} f(x,y).$$
This can also be extended to functions of three or more r.v.'s $X_1, X_2, \ldots, X_n$:
$$f_U(u) = P(U = u) = \sum_{(x_1,\ldots,x_n):\, g(x_1,\ldots,x_n)=u} f(x_1, \ldots, x_n).$$
(Note: Do not get confused between the functions $f$ and $g$
in the above: $f(x,y)$
is the joint probability function of the r.v.'s $X$ and $Y$,
whereas $g(x,y)$
defines the "new" random variable that is a function of $X$ and $Y$,
and whose distribution we want to find.)
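Both U and T above are instances of this same recipe: group the cells of the joint table by the value of g(x, y) and add the probabilities. A minimal sketch (function and variable names are ours):

```python
from collections import defaultdict

def pf_of_function(joint, g):
    """Probability function of U = g(X, Y), from a tabulated joint p.f."""
    pf = defaultdict(float)
    for (x, y), p in joint.items():
        pf[g(x, y)] += p
    return dict(pf)

joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}
print(pf_of_function(joint, lambda x, y: 2 * (y - x)))  # U = 2(Y - X)
print(pf_of_function(joint, lambda x, y: x + y))        # T = X + Y
```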
This completes the introduction of the basic ideas for multivariate distributions. As we look at harder problems that involve some algebra, refer back to these simpler examples if you find the ideas no longer making sense to you.
Example: Let $X$ and $Y$
be independent random variables having Poisson distributions with averages
(means) of $\mu_1$ and $\mu_2$
respectively. Let $T = X + Y$.
Find its probability function, $f_T(t)$.

Solution: We first need to find $f(x,y)$.
Since $X$ and $Y$
are independent we know
$$f(x,y) = f_1(x)\, f_2(y).$$
Using the Poisson probability function,
$$f(x,y) = \frac{\mu_1^x e^{-\mu_1}}{x!} \cdot \frac{\mu_2^y e^{-\mu_2}}{y!}$$
where $x$ and $y$
can equal 0, 1, 2, $\ldots$.
Now,
$$P(T = t) = \sum_{x=0}^{t} f(x,\, t-x).$$
Then
$$f_T(t) = \sum_{x=0}^{t} \frac{\mu_1^x e^{-\mu_1}}{x!} \cdot \frac{\mu_2^{\,t-x} e^{-\mu_2}}{(t-x)!}.$$
To evaluate this sum, factor out constant terms and try to regroup in some form which can be evaluated by one of our summation techniques:
$$f_T(t) = e^{-(\mu_1+\mu_2)}\, \mu_2^t \sum_{x=0}^{t} \frac{1}{x!\,(t-x)!} \left(\frac{\mu_1}{\mu_2}\right)^x.$$
If we had a $t!$
on the top inside the $\sum$,
the sum would be of the form $\sum_{x=0}^{t} \binom{t}{x}\left(\frac{\mu_1}{\mu_2}\right)^x$.
This is the right hand side of the binomial theorem. Multiply top and bottom
by $t!$ to get:
$$f_T(t) = \frac{e^{-(\mu_1+\mu_2)}\, \mu_2^t}{t!} \sum_{x=0}^{t} \binom{t}{x}\left(\frac{\mu_1}{\mu_2}\right)^x = \frac{e^{-(\mu_1+\mu_2)}\, \mu_2^t}{t!}\left(1 + \frac{\mu_1}{\mu_2}\right)^t.$$
Take a common denominator of $\mu_2$ to get
$$f_T(t) = \frac{e^{-(\mu_1+\mu_2)}\, \mu_2^t}{t!} \cdot \frac{(\mu_1+\mu_2)^t}{\mu_2^t} = \frac{(\mu_1+\mu_2)^t\, e^{-(\mu_1+\mu_2)}}{t!}, \qquad t = 0, 1, 2, \ldots$$
Note that we have just shown that the sum of 2 independent Poisson random variables also has a Poisson distribution (with mean $\mu_1 + \mu_2$).
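The result can be checked numerically: convolving two Poisson probability functions should reproduce a Poisson probability function with the summed mean. A sketch, with arbitrary illustrative values of mu1 and mu2 (ours):

```python
from math import exp, factorial

def poisson_pf(mu, k):
    return mu**k * exp(-mu) / factorial(k)

mu1, mu2 = 1.5, 2.3
for t in range(6):
    # P(T = t) by summing f(x, t - x) over x, using independence
    conv = sum(poisson_pf(mu1, x) * poisson_pf(mu2, t - x) for x in range(t + 1))
    print(t, round(conv, 6), round(poisson_pf(mu1 + mu2, t), 6))  # the two columns agree
```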
Example: Three sprinters, $A$, $B$ and $C$,
compete against each other in 10 independent 100 m. races. The probabilities
of winning any single race are .5 for $A$,
.4 for $B$,
and .1 for $C$.
Let $X$ and $Y$
be the number of races $A$ and $B$
win, respectively.

(a) Find the joint probability function, $f(x,y)$.
(b) Find the marginal probability function, $f_1(x)$.
(c) Find the conditional probability function, $f(y \mid x)$.
(d) Are $X$ and $Y$ independent? Why?
(e) Let $T = X + Y$. Find its probability function, $f_T(t)$.
Solution: Before starting, note that $X + Y \leq 10$
since there are 10 races in all. We really only have two variables since the
number of races $C$ wins is $Z = 10 - X - Y$.
However it is convenient to use $Z$
to save writing and preserve symmetry.

(a) The reasoning will be similar to the way we found the binomial distribution in
Chapter 6 except that there are now 3 types of outcome. There are $\frac{10!}{x!\,y!\,z!}$
different outcomes (i.e. results for races 1 to 10) in which there are $x$
wins by $A$, $y$ by $B$,
and $z$ by $C$.
Each of these arrangements has a probability of (.5) multiplied $x$
times, (.4) $y$
times, and (.1) $z$
times in some order;
i.e.,
$$f(x,y,z) = \frac{10!}{x!\,y!\,z!}\,(.5)^x(.4)^y(.1)^z.$$
The range for $f(x,y,z)$
is triples $(x,y,z)$
where each of $x, y, z$
is an integer between 0 and 10, and where $x + y + z = 10$.
It would also be acceptable to drop $Z$
as a variable and write down the probability function for $(X,Y)$
only; this is
$$f(x,y) = \frac{10!}{x!\,y!\,(10-x-y)!}\,(.5)^x(.4)^y(.1)^{10-x-y}$$
because of the fact that $Z$
must equal $10 - X - Y$.
For this probability function $x = 0, 1, \ldots, 10$, $y = 0, 1, \ldots, 10$,
and $x + y \leq 10$.

(b) This simplifies finding $f_1(x)$ a little. We now have
$$f_1(x) = \sum_{y} f(x,y) = \sum_{y} \frac{10!}{x!\,y!\,(10-x-y)!}\,(.5)^x(.4)^y(.1)^{10-x-y}.$$
The limits of summation need care: $y$
could be as small as $0$,
but since $x + y \leq 10$,
we also require $y \leq 10 - x$.
(E.g., if $x = 7$
then $B$ can win $0, 1, 2$,
or 3 races.) Thus,
$$f_1(x) = \frac{10!\,(.5)^x}{x!} \sum_{y=0}^{10-x} \frac{(.4)^y(.1)^{10-x-y}}{y!\,(10-x-y)!}.$$
(Hint: In $\frac{(10-x)!}{y!\,(10-x-y)!}$
the 2 terms in the denominator add to the term in the numerator, if we ignore
the ! sign.) Multiply top and bottom by $(10-x)!$. This gives
$$f_1(x) = \frac{10!\,(.5)^x}{x!\,(10-x)!} \sum_{y=0}^{10-x} \binom{10-x}{y}(.4)^y(.1)^{10-x-y} = \binom{10}{x}(.5)^x(.4+.1)^{10-x} = \binom{10}{x}(.5)^x(.5)^{10-x}.$$
Here $f_1(x)$
is defined for $x = 0, 1, 2, \ldots, 10$.
Note: While this derivation is included as an example of how
to find marginal distributions by summing a joint probability function, there
is a much simpler method for this problem. Note that each race is either won
by $A$ (``success'') or it is not won by $A$
(``failure''). Since the races are independent and $X$
is now just the number of ``success'' outcomes, $X$
must have a binomial distribution, with $n = 10$
and $p = .5$.
Hence
$$f_1(x) = \binom{10}{x}(.5)^x(.5)^{10-x}$$
for $x = 0, 1, \ldots, 10$,
as above.
(c) Remember that $f(y \mid x) = \frac{f(x,y)}{f_1(x)}$,
so that
$$f(y \mid x) = \frac{\frac{10!}{x!\,y!\,(10-x-y)!}(.5)^x(.4)^y(.1)^{10-x-y}}{\binom{10}{x}(.5)^x(.5)^{10-x}} = \binom{10-x}{y}(.8)^y(.2)^{10-x-y}.$$
For any given value of $x$, $y$
ranges through $0, 1, \ldots, 10-x$.
(So the range of $Y$
depends on the value $x$,
which makes sense: if $A$
wins $x$
races then the most $B$
can win is $10 - x$.)

Note: As in (b), this result can be obtained more simply
by general reasoning. Once we are given that $A$
wins $x$
races, the remaining $10 - x$
races are all won by either $B$ or $C$.
For these races, $B$
wins $.8$ of the time and $C$ wins $.2$
of the time, because $P(B \text{ wins}) = .4$
and $P(C \text{ wins}) = .1$;
i.e., $B$
wins 4 times as often as $C$.
More formally,
$$f(y \mid x) = \binom{10-x}{y}(.8)^y(.2)^{10-x-y}$$
from the binomial distribution.
(d) $X$ and $Y$
are clearly not independent since the more races $A$
wins, the fewer races there are for $B$
to win. More formally,
$$f_1(x)\, f_2(y) \neq f(x,y) \quad \text{in general.}$$
(In general, if the range for $Y$
depends on the value of $X$,
then $X$ and $Y$
cannot be independent.)
(e) If $T = X + Y$,
then
$$f_T(t) = P(T = t) = \sum_{x=0}^{t} f(x,\, t-x) = \sum_{x=0}^{t} \frac{10!}{x!\,(t-x)!\,(10-t)!}\,(.5)^x(.4)^{t-x}(.1)^{10-t}.$$
The upper limit on $x$
is $t$
because, for example, if $t = 7$
then $A$
could not have won more than 7 races. Then
$$f_T(t) = \frac{10!\,(.1)^{10-t}}{(10-t)!} \sum_{x=0}^{t} \frac{(.5)^x(.4)^{t-x}}{x!\,(t-x)!}.$$
What do we need to multiply by on the top and bottom? Can you spot it before
looking below? The answer is $t!$, which gives
$$f_T(t) = \frac{10!\,(.1)^{10-t}}{t!\,(10-t)!} \sum_{x=0}^{t} \binom{t}{x}(.5)^x(.4)^{t-x} = \binom{10}{t}(.9)^t(.1)^{10-t}, \qquad t = 0, 1, \ldots, 10.$$
Exercise: Explain to yourself how this answer can be obtained from the binomial distribution, as we did in the notes following parts (b) and (c).
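The joint, marginal and conditional results above can all be spot-checked numerically. A minimal sketch (helper names are ours) that evaluates the trinomial joint p.f. and confirms the marginal of X is Binomial(10, .5):

```python
from math import comb, factorial

def trinomial_pf(x, y, n=10, pA=0.5, pB=0.4, pC=0.1):
    z = n - x - y
    if z < 0:
        return 0.0
    return factorial(n) / (factorial(x) * factorial(y) * factorial(z)) * pA**x * pB**y * pC**z

# Marginal of X obtained by summing out y, compared with Binomial(10, .5)
for x in range(11):
    marginal = sum(trinomial_pf(x, y) for y in range(11 - x))
    binomial = comb(10, x) * 0.5**x * 0.5**(10 - x)
    assert abs(marginal - binomial) < 1e-12
print("marginal of X matches Binomial(10, 0.5)")
```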
The following problem is
similar to conditional probability problems that we solved in Chapter 4. Now
we are dealing with events defined in terms of random variables. Earlier
results give us things like
$$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)} = \frac{P(Y = y \mid X = x)\,P(X = x)}{P(Y = y)}.$$

Example: In an auto parts company an average of $\mu$
defective parts are produced per shift. The number, $X$,
of defective parts produced has a Poisson distribution. An inspector checks
all parts prior to shipping them, but there is a 10% chance that a defective
part will slip by undetected. Let $Y$
be the number of defective parts the inspector finds on a shift. Find
$P(X = x \mid Y = y)$.
(The company wants to know how many defective parts are produced, but can only
know the number which were actually detected.)
Solution: Think of $\{X = x\}$
being event $A$
and $\{Y = y\}$
being event $B$;
we want to find $P(A \mid B)$.
To do this we'll use
$$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}.$$
We know
$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}.$$
Also, for a given number $x$
of defective items produced, the number, $Y$,
detected has a binomial distribution with $n = x$
and $p = .9$,
assuming each inspection takes place independently. Then
$$P(Y = y \mid X = x) = \binom{x}{y}(.9)^y(.1)^{x-y}.$$
Therefore
$$P(X = x \text{ and } Y = y) = P(Y = y \mid X = x)\,P(X = x) = \binom{x}{y}(.9)^y(.1)^{x-y}\,\frac{\mu^x e^{-\mu}}{x!}.$$
To get $P(X = x \mid Y = y)$
we'll need $P(Y = y)$.
We have
$$P(Y = y) = \sum_{x=y}^{\infty} P(X = x \text{ and } Y = y)$$
($x \geq y$
since the number of defective items produced can't be less than the number
detected.)
We could fit this into the summation result $\sum_{n=0}^{\infty}\frac{t^n}{n!} = e^t$
by writing $(.1)^{x-y}\mu^x$
as $(.1\mu)^{x-y}\mu^y$.
Then
$$P(Y = y) = \frac{(.9\mu)^y e^{-\mu}}{y!}\sum_{x=y}^{\infty}\frac{(.1\mu)^{x-y}}{(x-y)!} = \frac{(.9\mu)^y e^{-\mu}}{y!}\,e^{.1\mu} = \frac{(.9\mu)^y e^{-.9\mu}}{y!},$$
and therefore
$$P(X = x \mid Y = y) = \frac{\binom{x}{y}(.9)^y(.1)^{x-y}\,\frac{\mu^x e^{-\mu}}{x!}}{\frac{(.9\mu)^y e^{-.9\mu}}{y!}} = \frac{(.1\mu)^{x-y} e^{-.1\mu}}{(x-y)!}, \qquad x = y, y+1, y+2, \ldots$$
The joint probability function of $(X,Y)$ is:

|       | $x=0$ | $x=1$ | $x=2$ |
|-------|-------|-------|-------|
| $y=0$ | .09   | .06   | .15   |
| $y=1$ | .15   | .05   | .20   |
| $y=2$ | .06   | .09   | .15   |
Are
and
independent? Why?
Tabulate the conditional probability function,
.
Tabulate the probability function of
.
In problem 6.14, given that
sales were made in a 1 hour period, find the probability function for
,
the number of calls made in that hour.
and
are independent, with
and
.
Let
.
Find the probability function,
.
You may use the result
.
There is only one multivariate model distribution introduced in this
course, though other multivariate distributions exist. The multinomial
distribution defined below is very important. It is a generalization of the
binomial model to the case where each trial has $k$
possible outcomes.
Physical Setup: This distribution is the same as the binomial
except there are $k$
types of outcome rather than two. An experiment is repeated independently $n$
times with $k$
distinct types of outcome each time. Let the probabilities of these $k$
types be $p_1, p_2, \ldots, p_k$
each time. Let $X_1$
be the number of times the $1^{\text{st}}$
type occurs, $X_2$
the number of times the $2^{\text{nd}}$
occurs, $\ldots$, $X_k$
the number of times the $k^{\text{th}}$
type occurs. Then $(X_1, X_2, \ldots, X_k)$
has a multinomial distribution.

Notes:
$p_1 + p_2 + \cdots + p_k = 1$, and $X_1 + X_2 + \cdots + X_k = n$.
If we wish we can drop one of the variables (say the last), and just note that
$X_k$ equals $n - X_1 - X_2 - \cdots - X_{k-1}$.
Illustrations:
In the example of Section 8.1 with sprinters A, B, and C running 10 races we
had a multinomial distribution with $n = 10$
and $k = 3$.
Suppose student marks are given in letter grades as A, B, C, D, or F. In a
class of 80 students the number getting A, B, ..., F might have a multinomial
distribution with $n = 80$
and $k = 5$.
Joint Probability Function: The joint probability function of $X_1, \ldots, X_k$
is given by extending the argument in the sprinters example from $k = 3$
to general $k$.
There are $\frac{n!}{x_1!\,x_2!\cdots x_k!}$
different outcomes of the $n$
trials in which $x_1$
are of the $1^{\text{st}}$
type, $x_2$
are of the $2^{\text{nd}}$
type, etc. Each of these arrangements has probability $p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}$
since $p_1$
is multiplied $x_1$
times in some order, etc. Therefore
$$f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\; p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}.$$
The restrictions on the $x_i$'s
are $x_i = 0, 1, \ldots, n$
and $\sum_{i=1}^{k} x_i = n$.
As a check that $\sum f(x_1, \ldots, x_k) = 1$
we use the multinomial theorem to get
$$\sum \frac{n!}{x_1!\cdots x_k!}\, p_1^{x_1}\cdots p_k^{x_k} = (p_1 + p_2 + \cdots + p_k)^n = 1.$$
We have already seen one example of the multinomial distribution in the sprinter example.
Here is another simple example.
Example: Every person is one of four blood types: A, B, AB
and O. (This is important in determining, for example, who may give a blood
transfusion to a person.) In a large population let the fraction that has type
A, B, AB and O, respectively, be $p_1, p_2, p_3, p_4$.
Then, if $n$
persons are randomly selected from the population, the numbers $X_1, X_2, X_3, X_4$
of types A, B, AB, O have a multinomial distribution with $k = 4$.
(In Caucasian people the values of the $p_i$'s
are approximately $p_1 = .40$, $p_2 = .11$, $p_3 = .04$, $p_4 = .45$.)
Remark: We sometimes use the notation $(X_1, \ldots, X_k) \sim \text{Mult}(n;\, p_1, \ldots, p_k)$
to indicate that $(X_1, \ldots, X_k)$ have a multinomial
distribution.
Remark: For some types of problems it's helpful to write
formulas in terms of $x_1, \ldots, x_{k-1}$
and $p_1, \ldots, p_{k-1}$
using the fact that $x_k = n - x_1 - \cdots - x_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$.
In this case we can write the joint p.f. as $f(x_1, \ldots, x_{k-1})$,
but we must remember then that $x_1, \ldots, x_{k-1}$
satisfy the condition $x_1 + \cdots + x_{k-1} \leq n$.
The multinomial distribution can also arise in combination with other models,
and students often have trouble recognizing it then.
Example: A potter is producing teapots one at a time. Assume
that they are produced independently of each other and with probability $p$
the pot produced will be "satisfactory"; the rest are sold at a lower price.
The number, $X$,
of rejects before producing a satisfactory teapot is recorded. When 12
satisfactory teapots are produced, what is the probability the 12 values of $X$
will consist of six 0's, three 1's, two 2's and one value which is $\geq 3$?
Solution: Each time a "satisfactory" pot is produced the
value of $X$
falls in one of the four categories $X = 0,\ X = 1,\ X = 2,\ X \geq 3$.
Under the assumptions given in this question, $X$
has a geometric distribution with
$$f(x) = p(1-p)^x, \qquad x = 0, 1, 2, \ldots,$$
so we can find the probability for each of these categories. We have $P(X = x) = p(1-p)^x$
for $x = 0, 1, 2$
and we can obtain $P(X \geq 3)$
in various ways:
$$P(X \geq 3) = \sum_{x=3}^{\infty} p(1-p)^x = \frac{p(1-p)^3}{1 - (1-p)} = (1-p)^3$$
since we have a geometric series.
With some re-arranging, this also gives $P(X \geq 3) = 1 - P(X=0) - P(X=1) - P(X=2)$.
The only way to have $X \geq 3$
is to have the first 3 pots produced all being rejects, so
$P$(3 consecutive rejects) = $(1-p)^3$.
Reiterating that each time a pot is successfully produced, the value of $X$
falls in one of 4 categories $(X=0,\ X=1,\ X=2,\ X \geq 3)$,
we see that the probability asked for is given by a multinomial distribution,
Mult$\left(12;\ p,\ p(1-p),\ p(1-p)^2,\ (1-p)^3\right)$:
$$f(6, 3, 2, 1) = \frac{12!}{6!\,3!\,2!\,1!}\; p^6\,\left[p(1-p)\right]^3\left[p(1-p)^2\right]^2\left[(1-p)^3\right]^1.$$
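A quick numeric check of this multinomial calculation, using an arbitrary illustrative value of p (0.7 here; the value is ours, the notes leave p symbolic):

```python
from math import factorial

p = 0.7                                       # illustrative probability a pot is satisfactory
probs = [p, p*(1-p), p*(1-p)**2, (1-p)**3]    # P(X=0), P(X=1), P(X=2), P(X>=3): these sum to 1
counts = [6, 3, 2, 1]                         # six 0's, three 1's, two 2's, one value >= 3

coef = factorial(12)
for c in counts:
    coef //= factorial(c)
answer = coef
for q, c in zip(probs, counts):
    answer *= q**c
print(answer)
```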
Problems:
An insurance company classifies policy holders as class A,B,C, or D. The probabilities of a randomly selected policy holder being in these categories are .1, .4, .3 and .2, respectively. Give expressions for the probability that 25 randomly chosen policy holders will include
3A's, 11B's, 7C's, and 4D's.
3A's and 11B's.
3A's and 11B's, given that there are 4D's.
Chocolate chip cookies are made from batter containing an average of 0.6 chips per c.c. Chips are distributed according to the conditions for a Poisson process. Each cookie uses 12 c.c. of batter. Give expressions for the probabilities that in a dozen cookies:
3 have fewer than 5 chips.
3 have fewer than 5 chips and 7 have more than 9.
3 have fewer than 5 chips, given that 7 have more than 9.
Consider a sequence of (discrete) random variables $X_1, X_2, X_3, \ldots$
each of which takes integer values $1, 2, \ldots, N$
(called states). We assume that for a certain matrix $P$
(called the transition probability matrix), the conditional
probabilities are given by corresponding elements of the matrix; i.e.
$$P(X_{n+1} = j \mid X_n = i) = P_{ij}, \qquad i, j = 1, \ldots, N,$$
and furthermore that the chain only uses the last state occupied in
determining its future; i.e. that
$$P(X_{n+1} = j \mid X_n = i,\ X_{n-1} = i_{n-1}, \ldots, X_1 = i_1) = P(X_{n+1} = j \mid X_n = i)$$
for all $j, i, i_{n-1}, \ldots, i_1$
and $n$.
Then the sequence of random variables $X_n$
is called a Markov Note_2
Chain. Markov Chain models are the most common simple models for
dependent variables, and are used to predict weather as well as movements of
security prices. They allow the future of the process to depend on the present
state of the process, but the past behaviour can influence the future only
through the present state.
Suppose that the probability that tomorrow is rainy given that today is not
raining is $\alpha$
(and it does not otherwise depend on whether it rained in the past) and the
probability that tomorrow is dry given that today is rainy is $\beta$.
If tomorrow's weather depends on the past only through whether today is wet or
dry, we can define random variables
$$X_n = \begin{cases} 1 & \text{if day } n \text{ is rainy}\\ 2 & \text{if day } n \text{ is dry}\end{cases}$$
(beginning at some arbitrary time origin, day $0$). Then the random variables $X_0, X_1, X_2, \ldots$
form a Markov chain with $N = 2$
possible states and having probability transition matrix
$$P = \begin{pmatrix} 1-\beta & \beta\\ \alpha & 1-\alpha \end{pmatrix}.$$
Note that $P_{ij} \geq 0$
for all $i, j$
and $\sum_{j} P_{ij} = 1$
for all $i$.
This last property holds because given that $X_n = i$, $X_{n+1}$
must occupy one of the states $1, 2, \ldots, N$.
Suppose that the chain is started by randomly choosing a state for $X_0$
with distribution $P(X_0 = i) = q_i$, $i = 1, 2, \ldots, N$.
Then the distribution of $X_1$
is given by
$$P(X_1 = j) = \sum_{i=1}^{N} P(X_1 = j \mid X_0 = i)\,P(X_0 = i) = \sum_{i=1}^{N} q_i\, P_{ij},$$
and this is the $j$'th
element of the vector $q^T P$,
where $q$
is the column vector of values $q_i$.
To obtain the distribution at time $n+1$,
premultiply the transition matrix $P$
by a vector representing the distribution at time $n$.
Similarly the distribution of $X_2$
is the vector $q^T P^2$,
where $P^2$
is the product of the matrix $P$
with itself, and the distribution of $X_n$
is $q^T P^n$.
Under very general conditions, it can be shown that these probabilities
converge because the matrix $P^n$
converges pointwise to a limiting matrix as $n \to \infty$.
In fact, in many such cases, the limit does not depend on the initial
distribution $q$
because the limiting matrix has all of its rows identical and equal to some
vector of probabilities $\pi$.
Identifying this vector $\pi$
when convergence holds is reasonably easy.
A limiting distribution of a Markov chain is a vector
($\pi$
say) of long run probabilities of the individual states, so
$$\pi_i = \lim_{n \to \infty} P(X_n = i).$$
Now let us suppose that convergence to this distribution holds for a
particular initial distribution $q$,
so we assume that
$$q^T P^n \to \pi^T \quad \text{as } n \to \infty.$$
Then notice that
$$q^T P^{n+1} \to \pi^T$$
but also
$$q^T P^{n+1} = (q^T P^n)\,P \to \pi^T P,$$
so $\pi$
must have the property that
$$\pi^T P = \pi^T.$$
Any limiting distribution must have this property and this makes it easy in
many examples to identify the limiting behaviour of the chain.
A stationary distribution of a Markov chain is the column vector
($\pi$
say) of probabilities of the individual states such that $\pi^T P = \pi^T$.
Let us return to the weather example in which the transition probabilities are
given by the matrix
$$P = \begin{pmatrix} 1-\beta & \beta\\ \alpha & 1-\alpha \end{pmatrix}.$$
What is the long-run proportion of rainy days? To determine this we need to
solve the equations
$$\pi^T P = \pi^T, \qquad \text{i.e.} \quad (\pi_1\ \ \pi_2)\begin{pmatrix} 1-\beta & \beta\\ \alpha & 1-\alpha \end{pmatrix} = (\pi_1\ \ \pi_2),$$
subject to the conditions that the values $\pi_1, \pi_2$
are both probabilities (non-negative) and add to one. It is easy to see that
the solution is
$$\pi_1 = \frac{\alpha}{\alpha+\beta}, \qquad \pi_2 = \frac{\beta}{\alpha+\beta},$$
which is intuitively reasonable in that it says that the long-run probability
of the two states is proportional to the probability of a switch to that state
from the other. So the long-run probability of a dry day is the limit
$$\lim_{n\to\infty} P(X_n = 2) = \frac{\beta}{\alpha+\beta}.$$
You might try verifying this by computing the powers of the matrix $P^n$
for $n = 1, 2, \ldots$
and showing that $P^n$
approaches the matrix
$$\begin{pmatrix} \frac{\alpha}{\alpha+\beta} & \frac{\beta}{\alpha+\beta}\\[2pt] \frac{\alpha}{\alpha+\beta} & \frac{\beta}{\alpha+\beta} \end{pmatrix}$$
as $n \to \infty$.
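A short numerical check of this convergence, with arbitrary illustrative values of alpha and beta (ours, not from the notes):

```python
import numpy as np

alpha, beta = 0.3, 0.5            # illustrative switch probabilities
P = np.array([[1 - beta, beta],   # state 1 = rainy, state 2 = dry
              [alpha, 1 - alpha]])

Pn = np.linalg.matrix_power(P, 50)
print(Pn)                                              # both rows approach the same vector
print(alpha / (alpha + beta), beta / (alpha + beta))   # the stationary probabilities
```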
There are various mathematical conditions under which the limiting
distribution of a Markov chain is unique and independent of the initial state of
the chain, but roughly they assert that the chain is such that it forgets the
more and more distant past.
A simple form of inheritance of traits occurs when a trait is governed by a
pair of genes $A$ and $a$.
An individual may have an $AA$, an $Aa$, or an $aa$
combination ($AA$ and $Aa$ individuals are indistinguishable in appearance, or
"$A$ dominates $a$").
Let us call an $AA$ individual dominant, $aa$
recessive and $Aa$
hybrid. When two individuals mate, the offspring
inherits one gene of the pair from each parent, and we assume that these genes
are selected at random. Now let us suppose that two individuals of opposite
sex selected at random mate, and then two of their offspring mate, etc. Here
the state is determined by a pair of individuals, so the states of our process
can be considered to be objects like $(AA, Aa)$,
indicating that one of the pair is $AA$
and the other is $Aa$
(we do not distinguish the order of the pair, or male and female, assuming
these genes do not depend on the sex of the individual).
| Number | State      |
|--------|------------|
| 1      | $(AA, AA)$ |
| 2      | $(AA, Aa)$ |
| 3      | $(AA, aa)$ |
| 4      | $(Aa, Aa)$ |
| 5      | $(Aa, aa)$ |
| 6      | $(aa, aa)$ |
For example, consider the calculation of the transition probabilities out of
state 4, $(Aa, Aa)$.
In this case each offspring has probability $\frac14$
of being a dominant ($AA$),
probability $\frac12$
of being a hybrid ($Aa$), and probability $\frac14$ of being a recessive ($aa$).
If two offspring are selected independently from this distribution the
possible pairs are
$$(AA,AA),\ (AA,Aa),\ (AA,aa),\ (Aa,Aa),\ (Aa,aa),\ (aa,aa)$$
with probabilities
$$\tfrac{1}{16},\ \tfrac14,\ \tfrac18,\ \tfrac14,\ \tfrac14,\ \tfrac{1}{16}$$
respectively. So the transitions out of state 4 have the probabilities
above, and the full transition probability matrix is
$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
\frac14 & \frac12 & 0 & \frac14 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
\frac{1}{16} & \frac14 & \frac18 & \frac14 & \frac14 & \frac{1}{16}\\
0 & 0 & 0 & \frac14 & \frac12 & \frac14\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
What is the long-run behaviour in such a system? For example, the
two-generation transition probabilities are given by the matrix $P^2$,
which seems to indicate a drift to one or other of the extreme states 1 or 6.
To confirm the long-run behaviour calculate higher powers of $P$ (e.g. $P^{4}$ and $P^{8}$),
which show that eventually the chain is absorbed in either of state 1 or
state 6, with the probability of absorption depending on the initial state.
This chain, unlike the ones studied before, has more than one possible
stationary distribution, for example,
$\pi^T = (1, 0, 0, 0, 0, 0)$
and $\pi^T = (0, 0, 0, 0, 0, 1)$,
and
in these circumstances the chain does not have the same limiting distribution
regardless of the initial state.
It is easy to extend the definition of expectation to multiple variables.
Generalizing $E[g(X)] = \sum_{x} g(x) f(x)$
leads to the definition of expected value in the multivariate case
$$E[g(X,Y)] = \sum_{\text{all }(x,y)} g(x,y)\, f(x,y)$$
and
$$E[g(X_1, \ldots, X_n)] = \sum_{\text{all }(x_1,\ldots,x_n)} g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n).$$
As before, these represent
the average value of $g(X,Y)$
and $g(X_1, \ldots, X_n)$.
Example: Let the joint probability function, $f(x,y)$,
be given by

| $f(x,y)$ | $x=0$ | $x=1$ | $x=2$ |
|----------|-------|-------|-------|
| $y=1$    | .1    | .2    | .3    |
| $y=2$    | .2    | .1    | .1    |

Find $E(XY)$
and $E(X)$.

Solution:
$$E(XY) = \sum_{\text{all }(x,y)} xy\, f(x,y) = (0)(1)(.1) + (1)(1)(.2) + (2)(1)(.3) + (0)(2)(.2) + (1)(2)(.1) + (2)(2)(.1) = 1.4$$
To find $E(X)$
we have a choice of methods. First, taking $g(x,y) = x$,
we get
$$E(X) = \sum_{\text{all }(x,y)} x\, f(x,y) = (0)(.1) + (1)(.2) + (2)(.3) + (0)(.2) + (1)(.1) + (2)(.1) = 1.1.$$
Alternatively, since $E(X)$
only involves $X$,
we could find $f_1(x)$
and use
$$E(X) = \sum_{x} x\, f_1(x) = (0)(.3) + (1)(.3) + (2)(.4) = 1.1.$$
Example: In the example of Section 8.1 with sprinters A, B,
and C we had (using only $X$
and $Y$
in our formulas)
$$f(x,y) = \frac{10!}{x!\,y!\,(10-x-y)!}\,(.5)^x(.4)^y(.1)^{10-x-y},$$
where A wins $x$
times and B wins $y$
times in 10 races. Find $E(XY)$.

Solution: This will be similar to the way
we derived the mean of the binomial distribution but, since this is a
multinomial distribution, we'll be using the multinomial theorem to sum.
$$E(XY) = \sum_{(x,y)} xy\,\frac{10!}{x!\,y!\,(10-x-y)!}(.5)^x(.4)^y(.1)^{10-x-y} = \sum_{x\ge 1,\,y\ge 1} \frac{10!}{(x-1)!\,(y-1)!\,(10-x-y)!}(.5)^x(.4)^y(.1)^{10-x-y}.$$
Let $u = x - 1$
and $v = y - 1$
in the sum and we obtain
$$E(XY) = (10)(9)(.5)(.4)\sum_{(u,v)} \frac{8!}{u!\,v!\,(8-u-v)!}(.5)^u(.4)^v(.1)^{8-u-v} = 90(.5)(.4)(.5+.4+.1)^8 = 18.$$
Property of Multivariate
Expectation: It is easily proved (make sure you can do this) that
$$E\left[a\,g_1(X,Y) + b\,g_2(X,Y)\right] = a\,E[g_1(X,Y)] + b\,E[g_2(X,Y)].$$
This can be extended beyond 2 functions $g_1$ and $g_2$,
and beyond 2 variables $X$ and $Y$.
Independence is a "yes/no" way of defining a relationship between variables.
We all know that there can be different types of relationships between
variables which are dependent. For example, if $X$
is your height in inches and $Y$
your height in centimetres the relationship is one-to-one and linear. More
generally, two random variables may be related (non-independent) in a
probabilistic sense. For example, a person's weight $Y$
is not an exact linear function of their height $X$,
but $Y$ and $X$
are nevertheless related. We'll look at two ways of measuring the strength of
the relationship between two random variables. The first is called covariance.
The covariance of $X$ and $Y$,
denoted Cov$(X,Y)$
or $\sigma_{XY}$,
is
$$\text{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right].$$
For calculation purposes
this definition is usually harder to use than the formula which follows, which
is proved noting that
$$\text{Cov}(X,Y) = E\left[XY - \mu_X Y - \mu_Y X + \mu_X\mu_Y\right] = E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X\mu_Y = E(XY) - E(X)E(Y).$$
Example:
In the example above with joint probability function $f(x,y)$ tabulated earlier,
find Cov$(X,Y)$.

Solution: We previously calculated $E(XY) = 1.4$
and $E(X) = 1.1$.
Similarly, $E(Y) = (1)(.6) + (2)(.4) = 1.4$, so
$$\text{Cov}(X,Y) = E(XY) - E(X)E(Y) = 1.4 - (1.1)(1.4) = -.14.$$
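A direct computation of Cov(X, Y) from the tabulated joint p.f. (a minimal sketch; names are ours):

```python
joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}

E_XY = sum(x * y * p for (x, y), p in joint.items())
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
print(E_XY - E_X * E_Y)   # Cov(X, Y) = 1.4 - (1.1)(1.4) = -0.14
```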
Exercise: Calculate the covariance of $X$
and $Y$
for the sprinter example. We have already found that $E(XY)$
= 18. The marginal distributions of $X$
and of $Y$
are models for which we've already derived the mean. If your solution takes
more than a few lines you're missing an easier solution.
Interpretation of Covariance:

1. Suppose large values of $X$
tend to occur with large values of $Y$
and small values of $X$
with small values of $Y$.
Then $(X - \mu_X)$
and $(Y - \mu_Y)$
will tend to be of the same sign, whether positive or negative. Thus $(X-\mu_X)(Y-\mu_Y)$
will tend to be positive. Hence Cov$(X,Y) > 0$.
For example in Figure bivariatenormal we
see several hundred points plotted. Notice that the majority of the points
are in the two quadrants (lower left and upper right) labelled with "+" so
that for these $(x - \mu_X)(y - \mu_Y) > 0$.
A minority of points are in the other two quadrants labelled "-" and for
these $(x - \mu_X)(y - \mu_Y) < 0$.
Moreover the points in the latter two quadrants appear closer to the mean,
indicating that on average, over all points generated,
$(x - \mu_X)(y - \mu_Y) > 0$.
Presumably this implies that over the joint distribution of $(X,Y)$,
$E[(X-\mu_X)(Y-\mu_Y)] > 0$, or Cov$(X,Y) > 0$.

[Figure bivariatenormal: Random points $(X,Y)$ with covariance 0.5, variances 1.]

For example, if $X$ is a
person's height and $Y$ is the same
person's weight, then these two random variables will have positive covariance.
2. Suppose large values of $X$
tend to occur with small values of $Y$
and small values of $X$
with large values of $Y$.
Then $(X-\mu_X)$
and $(Y-\mu_Y)$
will tend to be of opposite signs. Thus $(X-\mu_X)(Y-\mu_Y)$
tends to be negative. Hence Cov$(X,Y) < 0$.
For example see Figure bivariatenormal2.

[Figure bivariatenormal2: Covariance = -0.5, variances = 1.]

For example if $X =$ thickness
of attic insulation in a house and $Y =$ heating
cost for the house, then Cov$(X,Y) < 0$.
3. If $X$ and $Y$
are independent then Cov$(X,Y) = 0$.

Proof: Recall Cov$(X,Y) = E(XY) - \mu_X\mu_Y$.
Let $X$ and $Y$
be independent.
Then $E(XY) = \mu_X\mu_Y$, so Cov$(X,Y) = 0$.
The following theorem gives a direct proof of the result above, and is useful in many other situations.
Suppose random variables $X$ and $Y$
are independent. Then, if $g_1(X)$
and $g_2(Y)$
are any two functions,
$$E\left[g_1(X)\,g_2(Y)\right] = E[g_1(X)]\;E[g_2(Y)].$$
Proof: Since $X$ and $Y$
are independent, $f(x,y) = f_1(x)\,f_2(y)$.
Thus
$$E\left[g_1(X)\,g_2(Y)\right] = \sum_{(x,y)} g_1(x)\,g_2(y)\,f_1(x)\,f_2(y) = \left[\sum_{x} g_1(x)f_1(x)\right]\left[\sum_{y} g_2(y)f_2(y)\right] = E[g_1(X)]\;E[g_2(Y)].$$
\framebox[0.10in]{}
To prove result (3) above, we just note that if $X$
and $Y$
are independent then
$$E(XY) = E(X)\,E(Y) = \mu_X\mu_Y.$$
Caution: This result is not reversible. If Cov$(X,Y) = 0$
we cannot conclude that $X$
and $Y$
are independent. For example suppose that the random variable $\Theta$
is uniformly distributed on the values $\{0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}\}$
and define $X = \cos\Theta$
and $Y = \sin\Theta$.
It is easy to see that
Cov$(X,Y) = 0$,
but the two random variables $(X,Y)$
are clearly related because the points $(X,Y)$
are always on a circle.
Example: Let $(X,Y)$
have a joint probability function under which $X$
only takes 3 values, with marginal probability functions:

| $x$      | 0  | 1  | 2  |
|----------|----|----|----|
| $f_1(x)$ | .2 | .6 | .2 |

and

| $y$      | 0  | 1  |
|----------|----|----|
| $f_2(y)$ | .4 | .6 |
The actual numerical value of Cov$(X,Y)$
has no direct interpretation, so covariance is of limited use in measuring
relationships.
Exercise:
Look back at the example in which $f(x,y)$
was tabulated and Cov$(X,Y) = -.14$.
Considering how covariance is interpreted, does it make sense that Cov$(X,Y)$
would be negative?
Without looking at the actual covariance for the sprinter exercise, would you
expect Cov$(X,Y)$
to be positive or negative? (If A wins more of the 10 races, will B win more
races or fewer races?)
We now consider a second, related way to measure the strength of relationship
between $X$ and $Y$.
The correlation coefficient of $X$ and $Y$
is
$$\rho = \frac{\text{Cov}(X,Y)}{\sigma_X\,\sigma_Y}.$$
The correlation coefficient measures the strength of the linear relationship
between $X$ and $Y$
and is simply a rescaled version of the covariance, scaled to lie in the
interval $[-1, 1]$.
You can attempt to guess the correlation between two variables based on a
scatter diagram of values of these variables at the web
page
http://statweb.calpoly.edu/chance/applets/guesscorrelation/GuessCorrelation.html
For
example in Figure guesscorrelation I
guessed a correlation of -0.9 whereas the true correlation coefficient
generating these data was
[Figure guesscorrelation: Guessing the correlation based on a scatter diagram of points.]
Properties of $\rho$:

1. Since $\sigma_X$
and $\sigma_Y$,
the standard deviations of $X$
and $Y$,
are both positive, $\rho$
will have the same sign as Cov$(X,Y)$.
Hence the interpretation of the sign of $\rho$
is the same as for Cov$(X,Y)$,
and $\rho = 0$
if $X$
and $Y$
are independent. When $\rho = 0$
we say that $X$
and $Y$
are uncorrelated.

2. $-1 \leq \rho \leq 1$, and as $\rho \to \pm 1$
the relation between $X$
and $Y$
becomes one-to-one and linear.
Proof: Define a new random variable $S = X + tY$,
where $t$
is some real number. We'll show that the fact that
Var$(S) \geq 0$
leads to 2) above. We
have
$$\text{Var}(S) = \text{Var}(X + tY) = \text{Var}(X) + t^2\,\text{Var}(Y) + 2t\,\text{Cov}(X,Y) \geq 0.$$
Since $\text{Var}(S) \geq 0$
for any real number $t$,
this quadratic (in $t$) must have at most one real root (value of $t$
for which it is zero). Therefore
$$\left(2\,\text{Cov}(X,Y)\right)^2 - 4\,\text{Var}(X)\,\text{Var}(Y) \leq 0,$$
leading to the inequality
$$\left|\frac{\text{Cov}(X,Y)}{\sigma_X\,\sigma_Y}\right| \leq 1.$$
To see that $\rho = \pm 1$
corresponds to a one-to-one linear relationship between $X$
and $Y$,
note that $\rho = \pm 1$
corresponds to a zero discriminant in the quadratic. This means that
there exists one real number $t^*$
for which
$$\text{Var}(S) = \text{Var}(X + t^* Y) = 0.$$
But for
Var$(X + t^* Y)$
to be zero, $X + t^* Y$
must equal a constant $c$.
Thus $X$
and $Y$
satisfy a linear relationship.
Exercise: Calculate $\rho$
for the sprinter example. Does your answer make sense? (You should already
have found Cov$(X,Y)$
in a previous exercise, so little additional work is
needed.)
Problems:
The joint probability function of $(X,Y)$ is:

|       | $x=0$ | $x=1$ | $x=2$ |
|-------|-------|-------|-------|
| $y=0$ | .06   | .15   | .09   |
| $y=1$ | .14   | .35   | .21   |

Calculate the correlation coefficient, $\rho$.
What does it indicate about the relationship between $X$
and $Y$?
Suppose that $X$
and $Y$
are random variables with joint probability function:

|        | $x=2$ | $x=4$ | $x=6$          |
|--------|-------|-------|----------------|
| $y=-1$ | 1/8   | 1/4   | $p$            |
| $y=1$  | 1/4   | 1/8   | $\frac14 - p$  |

For what value of $p$
are $X$
and $Y$
uncorrelated?
Show that there is no value of $p$
for which $X$
and $Y$
are independent.
Many problems require us to consider linear combinations of random variables;
examples will be given below and in Chapter 9. Although writing down the
formulas is somewhat tedious, we give here some important results about their
means and variances.

Results for Means:

1. $E(aX + bY) = aE(X) + bE(Y)$,
when $a$
and $b$
are constants. (This follows from the definition of expectation.) In
particular, $E(X + Y) = E(X) + E(Y)$
and $E(X - Y) = E(X) - E(Y)$.

2. Let $a_1, a_2, \ldots, a_n$
be constants (real numbers) and $E(X_i) = \mu_i$.
Then $E\left(\sum a_i X_i\right) = \sum a_i \mu_i$.
In particular, $E\left(\sum X_i\right) = \sum E(X_i)$.

3. Let $X_1, X_2, \ldots, X_n$
be random variables which have mean $\mu$.
(You can imagine these being some sample results from an experiment such as
recording the number of occupants in cars travelling over a toll bridge.) The
sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Then $E(\bar{X}) = \mu$.

Proof: From (2), $E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = n\mu$.
Thus
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\,n\mu = \mu.$$
Results for Covariance:

1. Cov$(X, X) = $ Var$(X)$.

2. Cov$(aX + bY,\ cU + dV) = ac\,$Cov$(X,U) + ad\,$Cov$(X,V) + bc\,$Cov$(Y,U) + bd\,$Cov$(Y,V)$,
where $a, b, c$
and $d$
are constants.

Proof:
$$\text{Cov}(aX+bY,\ cU+dV) = E\Big[\big(a(X-\mu_X)+b(Y-\mu_Y)\big)\big(c(U-\mu_U)+d(V-\mu_V)\big)\Big]$$
$$= ac\,\text{Cov}(X,U) + ad\,\text{Cov}(X,V) + bc\,\text{Cov}(Y,U) + bd\,\text{Cov}(Y,V).$$
This type of result can be generalized, but gets messy to write
out.
Results for Variance:

1. $\text{Var}(aX + bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)$.

Proof:
$$\text{Var}(aX+bY) = E\Big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\Big] = a^2 E\big[(X-\mu_X)^2\big] + b^2 E\big[(Y-\mu_Y)^2\big] + 2ab\,E\big[(X-\mu_X)(Y-\mu_Y)\big],$$
which equals $a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)$.

Exercise: Try to prove this result by writing Var$(aX+bY)$
as Cov$(aX+bY,\ aX+bY)$
and using properties of covariance.

2. Let $X$
and $Y$
be independent. Since Cov$(X,Y) = 0$,
result 1. gives
$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y);$$
i.e., for independent variables, the variance of a sum
is the sum of the variances. Also note
$$\text{Var}(X - Y) = \text{Var}(X) + (-1)^2\,\text{Var}(Y) = \text{Var}(X) + \text{Var}(Y);$$
i.e., for independent variables, the variance of a difference is the
sum of the variances.

3. Let $a_1, a_2, \ldots, a_n$
be constants and Var$(X_i) = \sigma_i^2$.
Then
$$\text{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\sigma_i^2 + 2\sum_{i<j} a_i a_j\,\text{Cov}(X_i, X_j).$$
This is a generalization of result 1. and can be proved using either of the
methods used for 1.

4. Special cases of result 3. are:
(a) If $X_1, X_2, \ldots, X_n$
are independent then Cov$(X_i, X_j) = 0$ for $i \neq j$,
so that
$$\text{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\sigma_i^2.$$
(b) If $X_1, X_2, \ldots, X_n$
are independent and all have the same variance $\sigma^2$,
then
$$\text{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$

Proof of 4 (b): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
From 4(a), Var$\left(\sum X_i\right) = \sum \text{Var}(X_i) = n\sigma^2$.
Using Var$(aX) = a^2\,\text{Var}(X)$,
we get:
$$\text{Var}(\bar{X}) = \frac{1}{n^2}\,\text{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Remark: This result is a very important one in probability
and statistics. To recap, it says that if $X_1, \ldots, X_n$
are independent r.v.'s with the same mean $\mu$
and some variance $\sigma^2$,
then the sample mean $\bar{X}$
has
$$E(\bar{X}) = \mu \qquad \text{and} \qquad \text{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
This shows that the average $\bar{X}$
of $n$
random variables with the same distribution is less variable than any single
observation $X_i$,
and that the larger $n$
is the less variability there is. This explains mathematically why, for
example, if we want to estimate the unknown mean height $\mu$
in a population of people, we are better to take the average height for a
random sample of $n$ persons than to just take the height of one randomly selected person. A larger sample
would be better still. There are interesting applets at the url
http://users.ece.gatech.edu/users/gtz/java/samplemean/notes.html
and
http://www.ds.unifi.it/VL/VL_EN/applets/BinomialCoinExperiment.html which
allow one to sample and explore the rate at which the sample mean approaches
the expected value. In Chapter 9 we will see how to decide how large a sample
we should take for a certain degree of precision. Also note that as $n \to \infty$,
Var$(\bar{X}) \to 0$,
which means that $\bar{X}$
becomes arbitrarily close to $\mu$.
This is sometimes called the "law of averages". There is a formal theorem
which supports the claim that for large sample sizes, sample means approach
the expected value, called the "law of large numbers".
The results for linear combinations of random variables provide a way of breaking up more complicated problems, involving mean and variance, into simpler pieces using indicator variables; an indicator variable is just a binary variable (0 or 1) that indicates whether or not some event occurs. We'll illustrate this important method with 3 examples.
Example: Mean and Variance of a Binomial R.V.
Let $X \sim \text{Binomial}(n, p)$
in a binomial process. Define new variables $X_i$
by:
$$X_i = \begin{cases} 0 & \text{if the } i^{\text{th}} \text{ trial was a failure}\\ 1 & \text{if the } i^{\text{th}} \text{ trial was a success;}\end{cases}$$
i.e. $X_i$
indicates whether the outcome "success" occurred on the $i^{\text{th}}$
trial. The trick we use is that the total number of successes, $X$,
is the sum of the $X_i$'s:
$$X = \sum_{i=1}^{n} X_i.$$
We can find the mean and variance of $X_i$
and then use our results for the mean and variance of a sum to get the mean
and variance of $X$.
First,
$$E(X_i) = \sum_{x_i=0}^{1} x_i\, f(x_i) = 0\cdot P(X_i = 0) + 1\cdot P(X_i = 1) = P(X_i = 1).$$
But $P(X_i = 1) = p$
since the probability of success is $p$
on each trial, so $E(X_i) = p$.
Since $X_i = 0$
or 1, $X_i^2 = X_i$,
and therefore $E(X_i^2) = E(X_i) = p$.
Thus
$$\text{Var}(X_i) = E(X_i^2) - \left[E(X_i)\right]^2 = p - p^2 = p(1-p).$$
In the binomial distribution the trials are independent so the $X_i$'s
are also independent. Thus
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = np, \qquad \text{Var}(X) = \sum_{i=1}^{n} \text{Var}(X_i) = np(1-p).$$
These, of course, are the same as we derived previously for the mean and
variance of the binomial distribution. Note how simple the derivation here
is!
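A quick simulation check of the indicator-variable argument (the parameter values are arbitrary choices of ours):

```python
import random

n, p, reps = 10, 0.3, 200_000
random.seed(1)

# X is the sum of n independent indicator variables X_i with P(X_i = 1) = p.
samples = [sum(1 for _ in range(n) if random.random() < p) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean, n * p)            # close to np = 3
print(var, n * p * (1 - p))   # close to np(1-p) = 2.1
```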
Remark: If $X_i$
is a binary random variable with $P(X_i = 1) = p = 1 - P(X_i = 0)$,
then $E(X_i) = p$
and
Var$(X_i) = p(1-p)$,
as shown above. (Note that $X_i$
is actually a binomial r.v. with $n = 1$.) In some problems the $X_i$'s
are not independent, and then we also need covariances.
Example: Let $X$
have a hypergeometric distribution. Find the mean and variance of $X$.

Solution: As above, let us think of the setting, which
involves drawing $n$
items at random from a total of $N$,
of which $r$
are
"$S$" (success) items and $N - r$
are
"$F$" (failure) items. Define
$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ draw is an } S \text{ item}\\ 0 & \text{if the } i^{\text{th}} \text{ draw is an } F \text{ item.}\end{cases}$$
Then $X = \sum_{i=1}^{n} X_i$
as for the binomial example, but now the $X_i$'s
are dependent. (For example, what we get on the first draw affects the
probabilities of $S$
and $F$
for the second draw, and so on.) Therefore we need to find
Cov$(X_i, X_j)$
for $i \neq j$
as well as $E(X_i)$
and
Var$(X_i)$
in order to use our formula for the variance of a sum.
We see first that $P(X_i = 1) = \frac{r}{N}$
for each of $i = 1, \ldots, n$.
(If the draws are random then the probability an $S$
occurs in draw $i$
is just equal to the probability position $i$
is an $S$
when we arrange $r$ $S$'s
and $N - r$ $F$'s
in a row.) This immediately gives
$$E(X_i) = \frac{r}{N}, \qquad \text{Var}(X_i) = \frac{r}{N}\left(1 - \frac{r}{N}\right),$$
since $X_i^2 = X_i$, so $E(X_i^2) = E(X_i) = \frac{r}{N}$.
The covariance of $X_i$
and $X_j$ ($i \neq j$)
is equal to $E(X_i X_j) - E(X_i)E(X_j)$,
so we need
$$E(X_i X_j) = P(X_i = 1 \text{ and } X_j = 1).$$
The probability of an $S$
on both draws $i$
and $j$
is just
$$P(X_i = 1,\ X_j = 1) = \frac{r(r-1)}{N(N-1)}.$$
Thus,
$$\text{Cov}(X_i, X_j) = \frac{r(r-1)}{N(N-1)} - \left(\frac{r}{N}\right)^2 = -\frac{r(N-r)}{N^2(N-1)}.$$
(Does it make sense that Cov$(X_i, X_j)$
is negative? If you draw a success in draw $i$,
are you more or less likely to have a success on draw $j$?)
Now we find $E(X)$
and
Var$(X)$.
First,
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \frac{nr}{N}.$$
Before finding Var$(X)$,
how many combinations $(i,j)$
are there for which $i < j$?
Each $i$
and $j$
takes values from $1, 2, \ldots, n$,
so there are $n^2$
different combinations of $(i,j)$
values, of which $n$ have $i = j$. Each of the remaining unordered pairs can only be written in 1 way to make $i < j$.
(E.g. if $n = 3$,
the combinations with $i < j$
are (1,2), (1,3) and (2,3).) So there are $\frac{n^2 - n}{2} = \binom{n}{2}$
different combinations with $i < j$.
Now we can find
$$\text{Var}(X) = \sum_{i=1}^{n}\text{Var}(X_i) + 2\sum_{i<j}\text{Cov}(X_i,X_j) = n\,\frac{r}{N}\left(1-\frac{r}{N}\right) - 2\binom{n}{2}\frac{r(N-r)}{N^2(N-1)} = n\,\frac{r}{N}\left(1-\frac{r}{N}\right)\left(\frac{N-n}{N-1}\right).$$
In the last two examples, we know $f(x)$,
and could have found $E(X)$
and
Var$(X)$
without using indicator variables. In the next example $f(x)$
is not known and is hard to find, but we can still use indicator variables for
obtaining $E(X)$
and Var$(X)$.
The following example is a famous problem in probability.
Example: We have $n$
letters to $n$
different people, and $n$
envelopes addressed to those $n$
people. One letter is put in each envelope at random. Find the mean and
variance of the number of letters placed in the right envelope.

Solution: Define
$$X_i = \begin{cases} 1 & \text{if letter } i \text{ is placed in the right envelope}\\ 0 & \text{otherwise.}\end{cases}$$
Then $X = \sum_{i=1}^{n} X_i$
is the number of correctly placed letters. Once again, the $X_i$'s
are dependent (Why?).
First, $E(X_i) = P(X_i = 1) = \frac{1}{n}$
(since there is 1 chance in $n$
that letter $i$
will be put in envelope $i$)
and then, as before,
$$\text{Var}(X_i) = \frac{1}{n}\left(1 - \frac{1}{n}\right).$$
Exercise: Before calculating cov$(X_i, X_j)$,
what sign do you expect it to have? (If letter $i$
is correctly placed does that make it more or less likely that letter $j$
will be placed correctly?)
Next, $E(X_i X_j) = P(X_i = 1 \text{ and } X_j = 1)$.
(As in the last example, this is the only non-zero term in the sum.) Now,
$$P(X_i = 1 \text{ and } X_j = 1) = \frac{1}{n}\cdot\frac{1}{n-1}$$
since once letter $i$
is correctly placed there is 1 chance in $n-1$
of letter $j$
going in envelope $j$.
For the covariance,
$$\text{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \frac{1}{n(n-1)} - \frac{1}{n^2} = \frac{1}{n^2(n-1)}.$$
Then
$$E(X) = \sum_{i=1}^{n} E(X_i) = n\cdot\frac{1}{n} = 1$$
and
$$\text{Var}(X) = \sum_{i=1}^{n}\text{Var}(X_i) + 2\sum_{i<j}\text{Cov}(X_i,X_j) = n\cdot\frac{1}{n}\left(1-\frac{1}{n}\right) + 2\binom{n}{2}\frac{1}{n^2(n-1)} = \left(1 - \frac{1}{n}\right) + \frac{1}{n} = 1.$$
(Common sense often helps in this course, but we have found no way of being
able to say this result is obvious. On average 1 letter will be correctly
placed and the variance will be 1, regardless of how many letters there are.)
The joint probability function of
is given by:
Calculate
,
Var
,
Cov
and Var
.
You may use the fact that
and Var
= .21 without verifying these figures.
In a row of 25 switches, each is considered to be "on" or "off". The probability of being on is .6 for each switch, independently of the other switches. Find the mean and variance of the number of unlike pairs among the 24 pairs of adjacent switches.
Suppose Var
,
Var
,
;
and let
.
Find the standard deviation of
.
Let
be uncorrelated random variables with mean 0 and variance
.
Let
.
Find Cov
for
and Var
.
A plastic fabricating company produces items in strips of 24, with the items connected by a thin piece of plastic:
Suppose we have two possibly dependent random variables $(X, Y)$
and we wish to characterize their joint distribution using a moment generating
function. Just as the probability function and the cumulative distribution
function are, in this case, functions of two arguments, so is the moment
generating function.
The joint moment generating function of $(X, Y)$
is
$$M(s, t) = E\left(e^{sX + tY}\right).$$
Recall that if $X, Y$
happen to be independent,
and $g_1$
and $g_2$
are any two functions,
$$E\left[g_1(X)\,g_2(Y)\right] = E[g_1(X)]\;E[g_2(Y)],$$
and so with $g_1(X) = e^{sX}$
and $g_2(Y) = e^{tY}$
we obtain, for independent random variables $X$ and $Y$,
$$M(s,t) = M_X(s)\,M_Y(t),$$
the product of the moment generating functions of $X$
and $Y$
respectively.
There is another labour-saving property of moment generating functions for
independent random variables. Suppose $X$ and $Y$
are independent random variables with moment generating functions $M_X(t)$
and $M_Y(t)$.
Suppose you wish the moment generating function of the sum $Z = X + Y$.
One could attack this problem by first determining the probability function of
$Z$
and then calculating $E(e^{tZ}) = \sum_{z} e^{tz} f_Z(z)$.
Evidently lots of work! On the other hand recycling
(Eg1g2) with $g_1(X) = e^{tX}$, $g_2(Y) = e^{tY}$
gives
$$M_Z(t) = E\left(e^{tX + tY}\right) = E\left(e^{tX}\right)E\left(e^{tY}\right) = M_X(t)\,M_Y(t).$$
The moment generating function of the sum of independent random variables is the product of the individual moment generating functions.
For example if both $X$
and $Y$
are independent with the same (Bernoulli) distribution
$$P(X = 1) = p = 1 - P(X = 0),$$
then both have moment generating function
$$M_X(t) = M_Y(t) = 1 - p + pe^t$$
and so the moment generating function of the sum $X + Y$
is $(1 - p + pe^t)^2$.
Similarly if we add another independent Bernoulli the moment generating
function is $(1 - p + pe^t)^3$,
and in general the moment generating function of the sum of $n$
independent Bernoulli random variables is $(1 - p + pe^t)^n$,
the moment generating function of a
Binomial$(n, p)$
distribution. This confirms that the sum of $n$ independent Bernoulli random
variables has a
Binomial$(n, p)$
distribution.
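A numerical spot check of this product rule, using the Bernoulli/Binomial case (the values of n, p and t are arbitrary choices of ours):

```python
from math import comb, exp

n, p, t = 4, 0.3, 0.7
bernoulli_mgf = 1 - p + p * exp(t)

# MGF of a Binomial(n, p) computed directly from its probability function
binomial_mgf = sum(comb(n, k) * p**k * (1 - p)**(n - k) * exp(t * k) for k in range(n + 1))

print(bernoulli_mgf ** n, binomial_mgf)   # the two values agree
```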
The joint probability function of $(X,Y)$ is given by:

|       | $x=0$ | $x=1$ | $x=2$ |
|-------|-------|-------|-------|
| $y=0$ | .15   | .1    | .05   |
| $y=1$ | .35   | .2    | .15   |

Are $X$
and $Y$
independent? Why?
Find
and
For a person whose car insurance and house insurance are with the same
company, let
and
represent the number of claims on the car and house policies, respectively, in
a given year. Suppose that for a certain group of individuals,
Poisson (mean
)
and
Poisson (mean
).
If
and
are independent, find
and find the mean and variance of
.
Suppose it was learned that
was very close to
.
Show why
and
cannot be independent in this case. What might explain the non-independence?
Consider Problem 2.7 for Chapter 2, which concerned machine recognition of
handwritten digits. Recall that
was the probability that the number actually written was
,
and the number identified by the machine was
.
Are the random variables
and
independent? Why?
What is
,
that is, the probability that a random number is correctly identified?
What is the probability that the number 5 is incorrectly identified?
Blood donors arrive at a clinic and are classified as type A, type O, or other
types. Donors' blood types are independent with
(type A) =
,
(type O) =
,
and
(other type) =
.
Consider the number,
,
of type A and the number,
,
of type O donors arriving before the
other type.
Find the joint probability function,
Find the conditional probability function,
.
Slot machine payouts. Suppose that in a slot machine there are
possible outcomes
for a single play. A single play costs $1. If outcome
occurs, you win
,
for
.
If outcome
occurs, you win nothing. In other words, if outcome
occurs your net profit is
;
if
occurs your net profit is - 1.
Give a formula for your expected profit from a single play, if the
probabilities of the
outcomes are
.
The owner of the slot machine wants the player's expected profit to be
negative. Suppose
,
with
.
If the slot machine is set to pay $3 when outcome
occurs, and $5 when either of outcomes
occur, determine the player's expected profit per play.
The slot machine owner wishes to pay
dollars when outcome
occurs, where
and
is a number between 0 and 1. The owner also wishes his or her expected profit
to be $.05 per play. (The player's expected profit is -.05 per play.) Find
as a function of
and
.
What is the value of
if
and
?
Bacteria are distributed through river water according to a Poisson process with an average of 5 per 100 c.c. of water. What is the probability five 50 c.c. samples of water have 1 with no bacteria, 2 with one bacterium, and 2 with two or more?
A box contains 5 yellow and 3 red balls, from which 4 balls are drawn at
random without replacement. Let
be the number of yellow balls on the first two draws and
the number of yellow balls on all 4 draws.
Find the joint probability function,
.
Are
and
independent? Justify your answer.
In a quality control inspection items are classified as having a minor defect,
a major defect, or as being acceptable. A carton of 10 items contains 2 with
minor defects, 1 with a major defect, and 7 acceptable. Three items are chosen
at random without replacement. Let
be the number selected with minor defects and
be the number with major defects.
Find the joint probability function of
and
.
Find the marginal probability functions of
and of
.
Evaluate numerically
and
.
Let
and
be discrete random variables with joint probability function
for
and
,
where
is a positive constant.
Derive the marginal probability function of
.
Evaluate
.
Are
and
independent? Explain.
Derive the probability function of
.
"Thinning" a Poisson process. Suppose that events are
produced according to a Poisson process with an average of
events per minute. Each event has a probability
of being a "Type A" event, independent of other events.
Let the random variable
represent the number of Type A events that occur in a one-minute period. Prove
that
has a Poisson distribution with mean
.
(Hint: let
be the total number of events in a 1 minute period and consider the formula
just before the last example in Section 8.1).
Lightning strikes in a large forest region occur over the summer according to a
Poisson process with
strikes per day. Each strike has probability .05 of starting a fire. Find the
probability that there are at least 5 fires over a 30 day period.
In a breeding experiment involving horses the offspring are of four genetic types with probabilities:
Type | 1 | 2 | 3 | 4 |
Probability | 3/16 | 5/16 | 5/16 | 3/16 |
A group of 40 independent offspring are observed. Give expressions for the following probabilities:
There are 10 of each type.
The total number of types 1 and 2 is 16.
There are exactly 10 of type 1, given that the total number of types 1 and 2 is 16.
In a particular city, let the random variable
represent the number of children in a randomly selected household, and let
represent the number of female children. Assume that the probability a child
is female is
,
regardless of what size household they live in, and that the marginal
distribution of
is as follows:
Determine
.
Find the probability function for the number of girls
in a randomly chosen family. What is
?
In a particular city, the probability a call to a fire department concerns various situations is as given below:
1. fire in a detached home
2. fire in a semi detached home
3. fire in an apartment or multiple unit residence
4. fire in a non-residential building
5. non-fire-related emergency
6. false alarm
In a set of 10 calls, let
represent the numbers of calls of each of types
.
Give the joint probability function for
.
What is the probability there is at least one apartment fire, given that there are 4 fire-related calls?
If the average costs of calls of types
are (in $100 units) 5, 5, 7, 20, 4, 2 respectively, what is the expected total
cost of the 10 calls?
Suppose
have joint p.f.
.
If
is a function such that for all
in the range of
,
then show that
Let
and
be random variables with Var
,
Var
and
.
Find
Var
.
Let
and
have a trinomial distribution with joint probability function
and
.
Let
.
What distribution does
have? Either explain why or derive this result.
For the distribution in (a), what is
and
Var
?
Using (b) find
Cov,
and explain why you expect it to have the sign it does.
Jane and Jack each toss a fair coin twice. Let
be the number of heads Jane obtains and
the number of heads Jack obtains. Define
and
.
Find the means and variances of
and
.
Find Cov
Are
and
independent? Why?
A multiple choice exam has 100 questions, each with 5 possible answers. One
mark is awarded for a correct answer and 1/4 mark is deducted for an incorrect
answer. A particular student has probability
of knowing the correct answer to the
question, independently of other questions.
Suppose that on a question where the student does not know the answer, he or
she guesses randomly. Show that his or her total mark has mean
and variance
.
Show that the total mark for a student who refrains from guessing also has
mean
,
but with variance
.
Compare the variances when all
's
equal (i) .9, (ii) .5.
Let
and
be independent random variables with
,
Var
and Var
.
Find
Cov
.
An automobile driveshaft is assembled by placing parts A, B and C end to end in a straight line. The standard deviation in the lengths of parts A, B and C are 0.6, 0.8, and 0.7 respectively.
Find the standard deviation of the length of the assembled driveshaft.
What percent reduction would there be in the standard deviation of the assembled driveshaft if the standard deviation of the length of part B were cut in half?
The inhabitants of the beautiful and ancient canal city of Pentapolis live on
5 islands separated from each other by water. Bridges cross from one island to
another as shown.
On any day, a bridge can be closed, with probability
,
for restoration work. Assuming that the 8 bridges are closed independently,
find the mean and variance of the number of islands which are completely cut
off because of restoration work.
A Markov chain has a doubly stochastic transition matrix if both the
row sums and the column sums of the transition matrix $P$
are all 1.
Show that for such a Markov chain, the uniform distribution on the states $\{1, 2, \ldots, N\}$
is a stationary distribution.
A salesman sells in three cities A,B, and C. He never sells in the same city on successive weeks. If he sells in city A, then the next week he always sells in B. However if he sells in either A or B, then the next week he is twice as likely to sell in city A as in the other city. What is the long-run proportion of time he spends in each of the three cities?
Find
where
Suppose $X$
and $Y$
are independent having Poisson distributions with parameters $\lambda_1$
and $\lambda_2$
respectively. Use moment generating functions to identify the distribution of
the sum $X + Y$.
Waterloo in January is blessed by many things, but not by good weather. There
are never two nice days in a row. If there is a nice day, we are just as
likely to have snow as rain the next day. If we have snow or rain, there is an
even chance of having the same the next day. If there is change from snow or
rain, only half of the time is this a change to a nice day. Taking as states
the kinds of weather R, N, and S. the transition probabilities
are as
follows
If today is raining, find the probability of Rain, Nice, Snow three days from
now. Find the probabilities of the three states in five days, given (1) today
is raining (ii) today is nice (iii) today is snowing.
(One-card Poker) A card game, which, for the purposes of this question we will call Metzler Poker, is played as follows. Each of 2 players bets an initial $1 and is dealt a card from a deck of 13 cards numbered 1-13. Upon looking at their card, each player then decides (unaware of the other's decision) whether or not to increase their bet by $5 (to a total stake of $6). If both increase the stake ("raise"), then the player with the higher card wins both stakes-i.e. they get their money back as well as the other player's $6. If one person increases and the other does not, then the player who increases automatically wins the pot (i.e. money back+$1). If neither person increases the stake, then it is considered a draw-each player receives their own $1 back. Suppose that Player A and B have similar strategies, based on threshold numbers {a,b} they have chosen between 1 and 13. A chooses to raise whenever their card is greater than or equal to a and B whenever B's card is greater than or equal to b.
Suppose B always raises (so that b=1). What is the expected value of A's win or loss for the different possible values of a=1,2,...,13.
Suppose a and b are arbitrary. Given that both players raise, what is the probability that A wins? What is the expected value of A's win or loss?
Suppose you know that b=11. Find your expected win or loss for various values of a and determine the optimal value. How much do you expect to make or lose per game under this optimal strategy?
(Searching a database) Suppose that we are given
3 records,
initially stored in that order. The cost of accessing the
j'th record in the list is j
so we would like the more frequently accessed records near the front of the
list. Whenever a request for record j is processed,
the "move-to-front" heuristic stores
at the front of the list and the others in the original order. For example if
the first request is for record
then the records will be re-stored in the order
Assume that on each request, record
is requested with probability
for
Show that if
the
permutation that obtains after
requests for records (e.g.
),
then
is a Markov chain.
Find the stationary distribution of this Markov chain. (Hint: what is the
probability that
takes the form
?).
Find the expected long-run cost per record accessed in the case
respectively.
How does this expected long-run cost compare with keeping the records in
random order, and with keeping them in order of decreasing values of
(only
possible if we know