Many problems involve more than a single random variable. When there are
multiple random variables associated with an experiment or process we usually
denote them as $X, Y, Z, \ldots$ or as $X_1, X_2, \ldots, X_n$.
For example, your final mark in a course might involve
$X_1$ -- your assignment mark,
$X_2$ -- your midterm test mark, and
$X_3$ -- your exam mark. We need to extend the ideas introduced for single variables
to deal with multivariate problems. In this course we only consider discrete
multivariate problems, though continuous multivariate variables are also
common in daily life (e.g. consider a person's height $X$ and weight $Y$).
To introduce the ideas in a simple setting, we'll first consider an example in which there are only a few possible values of the variables. Later we'll apply these concepts to more complex examples. The ideas themselves are simple even though some applications can involve fairly messy algebra.
First, suppose there are two r.v.'s $X$ and $Y$, and define the function
$$f(x,y) = P(X = x \text{ and } Y = y).$$
We call $f(x,y)$ the joint probability function of $(X, Y)$.
In general,
$$f(x_1, x_2, \ldots, x_n) = P(X_1 = x_1 \text{ and } X_2 = x_2 \text{ and } \cdots \text{ and } X_n = x_n)$$
if there are $n$ r.v.'s $X_1, X_2, \ldots, X_n$.
The properties of a joint probability function are similar to those for a
single variable; for two r.v.'s we have $f(x,y) \geq 0$ for all $(x,y)$ and
$$\sum_{\text{all } (x,y)} f(x,y) = 1.$$
Example: Consider the following numerical example, where we
show $f(x,y)$ in a table.

| $f(x,y)$ | $x=0$ | $x=1$ | $x=2$ |
|----------|-------|-------|-------|
| $y=1$    | .1    | .2    | .3    |
| $y=2$    | .2    | .1    | .1    |
For example, the entry in the row $y=2$, column $x=0$ means
$$f(0,2) = P(X=0 \text{ and } Y=2) = .2,$$
and similarly $f(1,1) = P(X=1 \text{ and } Y=1) = .2.$
We can check that $f(x,y)$
is a proper joint probability function since $f(x,y) \geq 0$
for all 6 combinations of $(x,y)$,
and the sum of these 6 probabilities is 1. When there are only a few values
for $X$ and $Y$
it is often easier to tabulate $f(x,y)$
than to find a formula for it. We'll use this example below to illustrate
other definitions for multivariate distributions, but first we give a short
example where we need to find $f(x,y)$.
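A joint probability function with only a few values is easy to represent as a table in code. The following is a minimal sketch (the dictionary `joint` and all names are ours, not part of the notes) storing the example above and checking the two defining properties.

```python
# Joint probability function f(x, y) from the example, keyed by (x, y).
joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}

# Check the two properties of a joint probability function:
# f(x, y) >= 0 for every pair, and the probabilities sum to 1.
assert all(p >= 0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-12

print(joint[(0, 2)])   # P(X = 0 and Y = 2) = 0.2
```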
Example: Suppose a fair coin is tossed 3 times. Define the
r.v.'s $X$ = number of Heads and $Y = 1$
if $H$
occurs on the first toss ($Y = 0$ otherwise). Find the joint probability function for
$(X, Y)$.
Solution: First we should note the range for $(X, Y)$,
which is the set of possible values $(x,y)$
which can occur. Clearly $X$
can be 0, 1, 2, or 3 and $Y$
can be 0 or 1, but we'll see that not all 8 combinations $(x,y)$
are possible.
We can find $f(x,y) = P(X = x \text{ and } Y = y)$
by just writing down the sample space
$$S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$
that we have used before for this process. Then simple counting gives $f(x,y)$
as shown in the following table:

| $f(x,y)$ | $x=0$ | $x=1$ | $x=2$ | $x=3$ |
|----------|-------|-------|-------|-------|
| $y=0$    | 1/8   | 2/8   | 1/8   | 0     |
| $y=1$    | 0     | 1/8   | 2/8   | 1/8   |

For example, $f(0,0) = \frac18$ since $(X,Y)=(0,0)$
iff the outcome is $TTT$, while $f(1,0) = \frac28$ since $(X,Y)=(1,0)$
iff the outcome is either $THT$ or $TTH$.
Note that the range or joint p.f. for $(X,Y)$
is a little awkward to write down here in formulas, so we just use the
table.
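The table can also be produced by brute force: enumerate the 8 equally likely outcomes and count. A short sketch (function and variable names are ours):

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

# Enumerate all 2^3 equally likely outcomes of three fair coin tosses.
joint = defaultdict(Fraction)
for outcome in product("HT", repeat=3):
    x = outcome.count("H")             # X = number of Heads
    y = 1 if outcome[0] == "H" else 0  # Y = 1 if the first toss is a Head
    joint[(x, y)] += Fraction(1, 8)

for (x, y), p in sorted(joint.items()):
    print(f"f({x},{y}) = {p}")
# e.g. f(1,0) = 1/4 since only THT and TTH give X = 1, Y = 0
```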
We may be given a joint probability function involving more variables than
we're interested in using. How can we eliminate any which are not of interest?
Look at the first example above. If we're only interested in $X$,
and don't care what value $Y$
takes, we can see that
$$P(X=0) = P(X=0 \text{ and } Y=1) + P(X=0 \text{ and } Y=2) = .1 + .2 = .3,$$
so $P(X=0) = .3$. Similarly $P(X=1) = .2 + .1 = .3$ and $P(X=2) = .3 + .1 = .4$.
The distribution of $X$
obtained in this way from the joint distribution is called the marginal
probability function of $X$:

| $x$    | 0  | 1  | 2  |
|--------|----|----|----|
| $f(x)$ | .3 | .3 | .4 |

In the same way, if we were only interested in $Y$,
we obtain
$$P(Y=1) = f(0,1) + f(1,1) + f(2,1) = .1 + .2 + .3 = .6$$
since $X$
can be 0, 1, or 2 when $Y = 1$.
The marginal probability function of $Y$
would be:

| $y$    | 1  | 2  |
|--------|----|----|
| $f(y)$ | .6 | .4 |
Our notation for marginal probability functions is still inadequate. What is
$f(1)$?
As soon as we substitute a number for $x$ or $y$,
we don't know which variable we're referring to. For this reason, we generally
put a subscript on the $f$
to indicate whether it is the marginal probability function for the first or
second variable. So $f_1(1)$
would be $P(X=1) = .3$,
while $f_2(1)$
would be $P(Y=1) = .6$.
In general, to find $f_1(x)$
we add over all values of $y$
where $X = x$,
and to find $f_2(y)$
we add over all values of $x$
with $Y = y$.
Then
$$f_1(x) = \sum_{\text{all } y} f(x,y) \qquad \text{and} \qquad f_2(y) = \sum_{\text{all } x} f(x,y).$$
This reasoning can be extended beyond two variables. For example, with 3
variables $(X_1, X_2, X_3)$,
$f_1(x_1)$ would be
$$f_1(x_1) = \sum_{\text{all } (x_2, x_3)} f(x_1, x_2, x_3)$$
and $f_{1,3}(x_1, x_3)$ would be
$$f_{1,3}(x_1, x_3) = \sum_{\text{all } x_2} f(x_1, x_2, x_3).$$
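Marginalizing is just summing the joint table over the variable being removed. A minimal sketch using the dictionary from the first example (all names are ours):

```python
from collections import defaultdict

joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}

# f1(x) = sum over y of f(x, y);  f2(y) = sum over x of f(x, y)
f1, f2 = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    f1[x] += p
    f2[y] += p

print(dict(f1))   # {0: .3, 1: .3, 2: .4} up to floating-point rounding
print(dict(f2))   # {1: .6, 2: .4}
```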
For events $A$ and $B$,
we have defined $A$ and $B$
to be independent iff $P(AB) = P(A)P(B)$.
This definition can be extended to random variables: $X$ and $Y$
are independent random variables iff
$$f(x,y) = f_1(x)\, f_2(y) \quad \text{for all values } (x,y).$$
In general, $X_1, X_2, \ldots, X_n$ are independent random variables iff
$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n) \quad \text{for all } (x_1, x_2, \ldots, x_n).$$
In our first example $X$ and $Y$
are not independent since $f_1(x) f_2(y) \neq f(x,y)$
for any of the 6 combinations of $(x,y)$
values; e.g., $f_1(0) f_2(1) = (.3)(.6) = .18$
but $f(0,1) = .1$.
Be careful applying this definition. You can only conclude that $X$ and $Y$
are independent after checking all $(x,y)$
combinations. Even a single case where $f_1(x) f_2(y) \neq f(x,y)$
makes $X$ and $Y$
dependent.
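Checking independence mechanically means comparing f(x,y) with f1(x)·f2(y) in every cell. A self-contained sketch (names ours):

```python
from collections import defaultdict

joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}
f1, f2 = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    f1[x] += p
    f2[y] += p

# X and Y are independent iff f(x, y) == f1(x) * f2(y) in EVERY cell.
independent = all(abs(joint[(x, y)] - f1[x] * f2[y]) < 1e-9 for (x, y) in joint)
print(independent)   # False: e.g. f(0,1) = .1 but f1(0) * f2(1) = .18
```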
Again we can extend a definition from events to random variables. For events
$A$ and $B$,
recall that $P(A \mid B) = \frac{P(AB)}{P(B)}$.
Since
$$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)},$$
we make the following definition.

The conditional probability function of $X$ given $Y = y$ is
$$f(x \mid y) = \frac{f(x,y)}{f_2(y)}.$$
Similarly, $f(y \mid x) = \frac{f(x,y)}{f_1(x)}$
(provided, of course, the denominator is not zero).

In our first example let us find $f(x \mid Y = 1)$.
This gives:
$$f(x \mid Y=1) = \frac{f(x,1)}{f_2(1)} = \frac{f(x,1)}{.6}$$

| $x$              | 0   | 1   | 2   |
|------------------|-----|-----|-----|
| $f(x \mid Y=1)$  | 1/6 | 2/6 | 3/6 |

As you would expect, marginal and conditional probability functions are
probability functions in that they are always $\geq 0$
and their sum is 1.
In an example earlier, your final mark in a course might be a function of the
3 variables $X_1, X_2, X_3$ - assignment, midterm, and exam
marks Note_1 . Indeed, we often
encounter problems where we need to find the probability distribution of a
function of two or more r.v.'s. The most general method for finding the
probability function for some function of random variables $X$ and $Y$
involves looking at every combination $(x,y)$
to see what value the function takes. For example, if we let $U = 2(Y - X)$
in our example, the possible values of $U$
are seen by looking at the value of $u = 2(y - x)$
for each $(x,y)$
in the range of $(X,Y)$.

| $u = 2(y-x)$ | $x=0$ | $x=1$ | $x=2$ |
|--------------|-------|-------|-------|
| $y=1$        | 2     | 0     | -2    |
| $y=2$        | 4     | 2     | 0     |

The probability function of $U$ is thus

| $u$    | -2 | 0  | 2  | 4  |
|--------|----|----|----|----|
| $f(u)$ | .3 | .3 | .2 | .2 |
For some functions it is possible to approach the problem more systematically.
One of the most common functions of this type is the total. Let $T = X + Y$.
This gives:

| $t = x+y$ | $x=0$ | $x=1$ | $x=2$ |
|-----------|-------|-------|-------|
| $y=1$     | 1     | 2     | 3     |
| $y=2$     | 2     | 3     | 4     |

Then $f(2) = P(T = 2) = f(1,1) + f(0,2) = .2 + .2 = .4$,
for example. Continuing in this way, we get

| $t$    | 1  | 2  | 3  | 4  |
|--------|----|----|----|----|
| $f(t)$ | .1 | .4 | .4 | .1 |
(We are being a little sloppy with our notation by using
"$f$" for both $f(x,y)$ and $f(t)$.
No confusion arises here, but better notation would be to write $f_T(t)$
for $P(T = t)$.)
In fact, to find $f(t)$
we are simply adding the probabilities for all $(x,y)$
combinations with $x + y = t$.
This could be written as:
$$f(t) = \sum_{(x,y):\, x+y=t} f(x,y).$$
However, if $x + y = t$,
then $y = t - x$.
To systematically pick out the right combinations of $(x,y)$,
all we really need to do is sum over values of $x$
and then substitute $t - x$
for $y$.
Then,
$$f(t) = \sum_{x} f(x,\, t-x).$$
So $f(3)$ would be
$$f(3) = \sum_{x} f(x,\, 3-x) = f(0,3) + f(1,2) + f(2,1) = 0 + .1 + .3 = .4$$
(note $f(0,3) = 0$
since $Y$ can't be 3.)

We can summarize the method of finding the probability function for a function
of two random variables $X$ and $Y$
as follows:

Let $f(x,y)$
be the probability function for $(X,Y)$ and let $U = g(X,Y)$.
Then the probability function for $U$ is
$$f_U(u) = P(U = u) = \sum_{(x,y):\, g(x,y)=u} f(x,y).$$
This can also be extended to functions of three or more r.v.'s $X_1, X_2, \ldots, X_n$:
$$f_U(u) = P(U = u) = \sum_{(x_1,\ldots,x_n):\, g(x_1,\ldots,x_n)=u} f(x_1, \ldots, x_n).$$
(Note: Do not get confused between the functions $f$ and $g$
in the above: $f(x,y)$
is the joint probability function of the r.v.'s $X$ and $Y$,
whereas $g(x,y)$
defines the "new" random variable that is a function of $X$ and $Y$,
and whose distribution we want to find.)
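Both U and T above are instances of this same recipe: group the cells of the joint table by the value of g(x, y) and add the probabilities. A minimal sketch (function and variable names are ours):

```python
from collections import defaultdict

def pf_of_function(joint, g):
    """Probability function of U = g(X, Y), from a tabulated joint p.f."""
    pf = defaultdict(float)
    for (x, y), p in joint.items():
        pf[g(x, y)] += p
    return dict(pf)

joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}
print(pf_of_function(joint, lambda x, y: 2 * (y - x)))  # U = 2(Y - X)
print(pf_of_function(joint, lambda x, y: x + y))        # T = X + Y
```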
This completes the introduction of the basic ideas for multivariate distributions. As we look at harder problems that involve some algebra, refer back to these simpler examples if you find the ideas no longer making sense to you.
Example: Let $X$ and $Y$
be independent random variables having Poisson distributions with averages
(means) of $\mu_1$ and $\mu_2$
respectively. Let $T = X + Y$.
Find its probability function, $f_T(t)$.

Solution: We first need to find $f(x,y)$.
Since $X$ and $Y$
are independent we know
$$f(x,y) = f_1(x)\, f_2(y).$$
Using the Poisson probability function,
$$f(x,y) = \frac{\mu_1^x e^{-\mu_1}}{x!} \cdot \frac{\mu_2^y e^{-\mu_2}}{y!}$$
where $x$ and $y$
can equal 0, 1, 2, $\ldots$.
Now,
$$P(T = t) = \sum_{x=0}^{t} f(x,\, t-x).$$
Then
$$f_T(t) = \sum_{x=0}^{t} \frac{\mu_1^x e^{-\mu_1}}{x!} \cdot \frac{\mu_2^{\,t-x} e^{-\mu_2}}{(t-x)!}.$$
To evaluate this sum, factor out constant terms and try to regroup in some form which can be evaluated by one of our summation techniques:
$$f_T(t) = e^{-(\mu_1+\mu_2)}\, \mu_2^t \sum_{x=0}^{t} \frac{1}{x!\,(t-x)!} \left(\frac{\mu_1}{\mu_2}\right)^x.$$
If we had a $t!$
on the top inside the $\sum$,
the sum would be of the form $\sum_{x=0}^{t} \binom{t}{x}\left(\frac{\mu_1}{\mu_2}\right)^x$.
This is the right hand side of the binomial theorem. Multiply top and bottom
by $t!$ to get:
$$f_T(t) = \frac{e^{-(\mu_1+\mu_2)}\, \mu_2^t}{t!} \sum_{x=0}^{t} \binom{t}{x}\left(\frac{\mu_1}{\mu_2}\right)^x = \frac{e^{-(\mu_1+\mu_2)}\, \mu_2^t}{t!}\left(1 + \frac{\mu_1}{\mu_2}\right)^t.$$
Take a common denominator of $\mu_2$ to get
$$f_T(t) = \frac{e^{-(\mu_1+\mu_2)}\, \mu_2^t}{t!} \cdot \frac{(\mu_1+\mu_2)^t}{\mu_2^t} = \frac{(\mu_1+\mu_2)^t\, e^{-(\mu_1+\mu_2)}}{t!}, \qquad t = 0, 1, 2, \ldots$$
Note that we have just shown that the sum of 2 independent Poisson random variables also has a Poisson distribution (with mean $\mu_1 + \mu_2$).
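The result can be checked numerically: convolving two Poisson probability functions should reproduce a Poisson probability function with the summed mean. A sketch, with arbitrary illustrative values of mu1 and mu2 (ours):

```python
from math import exp, factorial

def poisson_pf(mu, k):
    return mu**k * exp(-mu) / factorial(k)

mu1, mu2 = 1.5, 2.3
for t in range(6):
    # P(T = t) by summing f(x, t - x) over x, using independence
    conv = sum(poisson_pf(mu1, x) * poisson_pf(mu2, t - x) for x in range(t + 1))
    print(t, round(conv, 6), round(poisson_pf(mu1 + mu2, t), 6))  # the two columns agree
```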
Example: Three sprinters, $A$, $B$ and $C$,
compete against each other in 10 independent 100 m. races. The probabilities
of winning any single race are .5 for $A$,
.4 for $B$,
and .1 for $C$.
Let $X$ and $Y$
be the number of races $A$ and $B$
win, respectively.

(a) Find the joint probability function, $f(x,y)$.
(b) Find the marginal probability function, $f_1(x)$.
(c) Find the conditional probability function, $f(y \mid x)$.
(d) Are $X$ and $Y$ independent? Why?
(e) Let $T = X + Y$. Find its probability function, $f_T(t)$.
Solution: Before starting, note that $X + Y \leq 10$
since there are 10 races in all. We really only have two variables since the
number of races $C$ wins is $Z = 10 - X - Y$.
However it is convenient to use $Z$
to save writing and preserve symmetry.

(a) The reasoning will be similar to the way we found the binomial distribution in
Chapter 6 except that there are now 3 types of outcome. There are $\frac{10!}{x!\,y!\,z!}$
different outcomes (i.e. results for races 1 to 10) in which there are $x$
wins by $A$, $y$ by $B$,
and $z$ by $C$.
Each of these arrangements has a probability of (.5) multiplied $x$
times, (.4) $y$
times, and (.1) $z$
times in some order;
i.e.,
$$f(x,y,z) = \frac{10!}{x!\,y!\,z!}\,(.5)^x(.4)^y(.1)^z.$$
The range for $f(x,y,z)$
is triples $(x,y,z)$
where each of $x, y, z$
is an integer between 0 and 10, and where $x + y + z = 10$.
It would also be acceptable to drop $Z$
as a variable and write down the probability function for $(X,Y)$
only; this is
$$f(x,y) = \frac{10!}{x!\,y!\,(10-x-y)!}\,(.5)^x(.4)^y(.1)^{10-x-y}$$
because of the fact that $Z$
must equal $10 - X - Y$.
For this probability function $x = 0, 1, \ldots, 10$, $y = 0, 1, \ldots, 10$,
and $x + y \leq 10$.

(b) This simplifies finding $f_1(x)$ a little. We now have
$$f_1(x) = \sum_{y} f(x,y) = \sum_{y} \frac{10!}{x!\,y!\,(10-x-y)!}\,(.5)^x(.4)^y(.1)^{10-x-y}.$$
The limits of summation need care: $y$
could be as small as $0$,
but since $x + y \leq 10$,
we also require $y \leq 10 - x$.
(E.g., if $x = 7$
then $B$ can win $0, 1, 2$,
or 3 races.) Thus,
$$f_1(x) = \frac{10!\,(.5)^x}{x!} \sum_{y=0}^{10-x} \frac{(.4)^y(.1)^{10-x-y}}{y!\,(10-x-y)!}.$$
(Hint: In $\frac{(10-x)!}{y!\,(10-x-y)!}$
the 2 terms in the denominator add to the term in the numerator, if we ignore
the ! sign.) Multiply top and bottom by $(10-x)!$. This gives
$$f_1(x) = \frac{10!\,(.5)^x}{x!\,(10-x)!} \sum_{y=0}^{10-x} \binom{10-x}{y}(.4)^y(.1)^{10-x-y} = \binom{10}{x}(.5)^x(.4+.1)^{10-x} = \binom{10}{x}(.5)^x(.5)^{10-x}.$$
Here $f_1(x)$
is defined for $x = 0, 1, 2, \ldots, 10$.
Note: While this derivation is included as an example of how
to find marginal distributions by summing a joint probability function, there
is a much simpler method for this problem. Note that each race is either won
by $A$ (``success'') or it is not won by $A$
(``failure''). Since the races are independent and $X$
is now just the number of ``success'' outcomes, $X$
must have a binomial distribution, with $n = 10$
and $p = .5$.
Hence
$$f_1(x) = \binom{10}{x}(.5)^x(.5)^{10-x}$$
for $x = 0, 1, \ldots, 10$,
as above.
(c) Remember that $f(y \mid x) = \frac{f(x,y)}{f_1(x)}$,
so that
$$f(y \mid x) = \frac{\frac{10!}{x!\,y!\,(10-x-y)!}(.5)^x(.4)^y(.1)^{10-x-y}}{\binom{10}{x}(.5)^x(.5)^{10-x}} = \binom{10-x}{y}(.8)^y(.2)^{10-x-y}.$$
For any given value of $x$, $y$
ranges through $0, 1, \ldots, 10-x$.
(So the range of $Y$
depends on the value $x$,
which makes sense: if $A$
wins $x$
races then the most $B$
can win is $10 - x$.)

Note: As in (b), this result can be obtained more simply
by general reasoning. Once we are given that $A$
wins $x$
races, the remaining $10 - x$
races are all won by either $B$ or $C$.
For these races, $B$
wins $.8$ of the time and $C$ wins $.2$
of the time, because $P(B \text{ wins}) = .4$
and $P(C \text{ wins}) = .1$;
i.e., $B$
wins 4 times as often as $C$.
More formally,
$$f(y \mid x) = \binom{10-x}{y}(.8)^y(.2)^{10-x-y}$$
from the binomial distribution.
(d) $X$ and $Y$
are clearly not independent since the more races $A$
wins, the fewer races there are for $B$
to win. More formally,
$$f_1(x)\, f_2(y) \neq f(x,y) \quad \text{in general.}$$
(In general, if the range for $Y$
depends on the value of $X$,
then $X$ and $Y$
cannot be independent.)
(e) If $T = X + Y$,
then
$$f_T(t) = P(T = t) = \sum_{x=0}^{t} f(x,\, t-x) = \sum_{x=0}^{t} \frac{10!}{x!\,(t-x)!\,(10-t)!}\,(.5)^x(.4)^{t-x}(.1)^{10-t}.$$
The upper limit on $x$
is $t$
because, for example, if $t = 7$
then $A$
could not have won more than 7 races. Then
$$f_T(t) = \frac{10!\,(.1)^{10-t}}{(10-t)!} \sum_{x=0}^{t} \frac{(.5)^x(.4)^{t-x}}{x!\,(t-x)!}.$$
What do we need to multiply by on the top and bottom? Can you spot it before
looking below? The answer is $t!$, which gives
$$f_T(t) = \frac{10!\,(.1)^{10-t}}{t!\,(10-t)!} \sum_{x=0}^{t} \binom{t}{x}(.5)^x(.4)^{t-x} = \binom{10}{t}(.9)^t(.1)^{10-t}, \qquad t = 0, 1, \ldots, 10.$$
Exercise: Explain to yourself how this answer can be obtained from the binomial distribution, as we did in the notes following parts (b) and (c).
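The joint, marginal and conditional results above can all be spot-checked numerically. A minimal sketch (helper names are ours) that evaluates the trinomial joint p.f. and confirms the marginal of X is Binomial(10, .5):

```python
from math import comb, factorial

def trinomial_pf(x, y, n=10, pA=0.5, pB=0.4, pC=0.1):
    z = n - x - y
    if z < 0:
        return 0.0
    return factorial(n) / (factorial(x) * factorial(y) * factorial(z)) * pA**x * pB**y * pC**z

# Marginal of X obtained by summing out y, compared with Binomial(10, .5)
for x in range(11):
    marginal = sum(trinomial_pf(x, y) for y in range(11 - x))
    binomial = comb(10, x) * 0.5**x * 0.5**(10 - x)
    assert abs(marginal - binomial) < 1e-12
print("marginal of X matches Binomial(10, 0.5)")
```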
The following problem is
similar to conditional probability problems that we solved in Chapter 4. Now
we are dealing with events defined in terms of random variables. Earlier
results give us things like
$$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)} = \frac{P(Y = y \mid X = x)\,P(X = x)}{P(Y = y)}.$$

Example: In an auto parts company an average of $\mu$
defective parts are produced per shift. The number, $X$,
of defective parts produced has a Poisson distribution. An inspector checks
all parts prior to shipping them, but there is a 10% chance that a defective
part will slip by undetected. Let $Y$
be the number of defective parts the inspector finds on a shift. Find
$P(X = x \mid Y = y)$.
(The company wants to know how many defective parts are produced, but can only
know the number which were actually detected.)
Solution: Think of $\{X = x\}$
being event $A$
and $\{Y = y\}$
being event $B$;
we want to find $P(A \mid B)$.
To do this we'll use
$$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}.$$
We know
$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}.$$
Also, for a given number $x$
of defective items produced, the number, $Y$,
detected has a binomial distribution with $n = x$
and $p = .9$,
assuming each inspection takes place independently. Then
$$P(Y = y \mid X = x) = \binom{x}{y}(.9)^y(.1)^{x-y}.$$
Therefore
$$P(X = x \text{ and } Y = y) = P(Y = y \mid X = x)\,P(X = x) = \binom{x}{y}(.9)^y(.1)^{x-y}\,\frac{\mu^x e^{-\mu}}{x!}.$$
To get $P(X = x \mid Y = y)$
we'll need $P(Y = y)$.
We have
$$P(Y = y) = \sum_{x=y}^{\infty} P(X = x \text{ and } Y = y)$$
($x \geq y$
since the number of defective items produced can't be less than the number
detected.)
We could fit this into the summation result $\sum_{n=0}^{\infty}\frac{t^n}{n!} = e^t$
by writing $(.1)^{x-y}\mu^x$
as $(.1\mu)^{x-y}\mu^y$.
Then
$$P(Y = y) = \frac{(.9\mu)^y e^{-\mu}}{y!}\sum_{x=y}^{\infty}\frac{(.1\mu)^{x-y}}{(x-y)!} = \frac{(.9\mu)^y e^{-\mu}}{y!}\,e^{.1\mu} = \frac{(.9\mu)^y e^{-.9\mu}}{y!},$$
and therefore
$$P(X = x \mid Y = y) = \frac{\binom{x}{y}(.9)^y(.1)^{x-y}\,\frac{\mu^x e^{-\mu}}{x!}}{\frac{(.9\mu)^y e^{-.9\mu}}{y!}} = \frac{(.1\mu)^{x-y} e^{-.1\mu}}{(x-y)!}, \qquad x = y, y+1, y+2, \ldots$$
The joint probability function of $(X,Y)$ is:

|       | $x=0$ | $x=1$ | $x=2$ |
|-------|-------|-------|-------|
| $y=0$ | .09   | .06   | .15   |
| $y=1$ | .15   | .05   | .20   |
| $y=2$ | .06   | .09   | .15   |
Are
and
independent? Why?
Tabulate the conditional probability function,
.
Tabulate the probability function of
.
In problem 6.14, given that
sales were made in a 1 hour period, find the probability function for
,
the number of calls made in that hour.
and
are independent, with
and
.
Let
.
Find the probability function,
.
You may use the result
.
There is only one multivariate model distribution introduced in this
course, though other multivariate distributions exist. The multinomial
distribution defined below is very important. It is a generalization of the
binomial model to the case where each trial has $k$
possible outcomes.
Physical Setup: This distribution is the same as the binomial
except there are $k$
types of outcome rather than two. An experiment is repeated independently $n$
times with $k$
distinct types of outcome each time. Let the probabilities of these $k$
types be $p_1, p_2, \ldots, p_k$
each time. Let $X_1$
be the number of times the $1^{\text{st}}$
type occurs, $X_2$
the number of times the $2^{\text{nd}}$
occurs, $\ldots$, $X_k$
the number of times the $k^{\text{th}}$
type occurs. Then $(X_1, X_2, \ldots, X_k)$
has a multinomial distribution.

Notes:
$p_1 + p_2 + \cdots + p_k = 1$, and $X_1 + X_2 + \cdots + X_k = n$.
If we wish we can drop one of the variables (say the last), and just note that
$X_k$ equals $n - X_1 - X_2 - \cdots - X_{k-1}$.
Illustrations:
In the example of Section 8.1 with sprinters A, B, and C running 10 races we
had a multinomial distribution with $n = 10$
and $k = 3$.
Suppose student marks are given in letter grades as A, B, C, D, or F. In a
class of 80 students the number getting A, B, ..., F might have a multinomial
distribution with $n = 80$
and $k = 5$.
Joint Probability Function: The joint probability function of $X_1, \ldots, X_k$
is given by extending the argument in the sprinters example from $k = 3$
to general $k$.
There are $\frac{n!}{x_1!\,x_2!\cdots x_k!}$
different outcomes of the $n$
trials in which $x_1$
are of the $1^{\text{st}}$
type, $x_2$
are of the $2^{\text{nd}}$
type, etc. Each of these arrangements has probability $p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}$
since $p_1$
is multiplied $x_1$
times in some order, etc. Therefore
$$f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\; p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}.$$
The restrictions on the $x_i$'s
are $x_i = 0, 1, \ldots, n$
and $\sum_{i=1}^{k} x_i = n$.
As a check that $\sum f(x_1, \ldots, x_k) = 1$
we use the multinomial theorem to get
$$\sum \frac{n!}{x_1!\cdots x_k!}\, p_1^{x_1}\cdots p_k^{x_k} = (p_1 + p_2 + \cdots + p_k)^n = 1.$$
We have already seen one example of the multinomial distribution in the sprinter example.
Here is another simple example.
Example: Every person is one of four blood types: A, B, AB
and O. (This is important in determining, for example, who may give a blood
transfusion to a person.) In a large population let the fraction that has type
A, B, AB and O, respectively, be $p_1, p_2, p_3, p_4$.
Then, if $n$
persons are randomly selected from the population, the numbers $X_1, X_2, X_3, X_4$
of types A, B, AB, O have a multinomial distribution with $k = 4$.
(In Caucasian people the values of the $p_i$'s
are approximately $p_1 = .40$, $p_2 = .11$, $p_3 = .04$, $p_4 = .45$.)
Remark: We sometimes use the notation $(X_1, \ldots, X_k) \sim \text{Mult}(n;\, p_1, \ldots, p_k)$
to indicate that $(X_1, \ldots, X_k)$ have a multinomial
distribution.
Remark: For some types of problems it's helpful to write
formulas in terms of $x_1, \ldots, x_{k-1}$
and $p_1, \ldots, p_{k-1}$
using the fact that $x_k = n - x_1 - \cdots - x_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$.
In this case we can write the joint p.f. as $f(x_1, \ldots, x_{k-1})$,
but we must remember then that $x_1, \ldots, x_{k-1}$
satisfy the condition $x_1 + \cdots + x_{k-1} \leq n$.
The multinomial distribution can also arise in combination with other models,
and students often have trouble recognizing it then.
Example: A potter is producing teapots one at a time. Assume
that they are produced independently of each other and with probability $p$
the pot produced will be "satisfactory"; the rest are sold at a lower price.
The number, $X$,
of rejects before producing a satisfactory teapot is recorded. When 12
satisfactory teapots are produced, what is the probability the 12 values of $X$
will consist of six 0's, three 1's, two 2's and one value which is $\geq 3$?
Solution: Each time a "satisfactory" pot is produced the
value of $X$
falls in one of the four categories $X = 0,\ X = 1,\ X = 2,\ X \geq 3$.
Under the assumptions given in this question, $X$
has a geometric distribution with
$$f(x) = p(1-p)^x, \qquad x = 0, 1, 2, \ldots,$$
so we can find the probability for each of these categories. We have $P(X = x) = p(1-p)^x$
for $x = 0, 1, 2$
and we can obtain $P(X \geq 3)$
in various ways:
$$P(X \geq 3) = \sum_{x=3}^{\infty} p(1-p)^x = \frac{p(1-p)^3}{1 - (1-p)} = (1-p)^3$$
since we have a geometric series.
With some re-arranging, this also gives $P(X \geq 3) = 1 - P(X=0) - P(X=1) - P(X=2)$.
The only way to have $X \geq 3$
is to have the first 3 pots produced all being rejects, so
$P$(3 consecutive rejects) = $(1-p)^3$.
Reiterating that each time a pot is successfully produced, the value of $X$
falls in one of 4 categories $(X=0,\ X=1,\ X=2,\ X \geq 3)$,
we see that the probability asked for is given by a multinomial distribution,
Mult$\left(12;\ p,\ p(1-p),\ p(1-p)^2,\ (1-p)^3\right)$:
$$f(6, 3, 2, 1) = \frac{12!}{6!\,3!\,2!\,1!}\; p^6\,\left[p(1-p)\right]^3\left[p(1-p)^2\right]^2\left[(1-p)^3\right]^1.$$
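A quick numeric check of this multinomial calculation, using an arbitrary illustrative value of p (0.7 here; the value is ours, the notes leave p symbolic):

```python
from math import factorial

p = 0.7                                       # illustrative probability a pot is satisfactory
probs = [p, p*(1-p), p*(1-p)**2, (1-p)**3]    # P(X=0), P(X=1), P(X=2), P(X>=3): these sum to 1
counts = [6, 3, 2, 1]                         # six 0's, three 1's, two 2's, one value >= 3

coef = factorial(12)
for c in counts:
    coef //= factorial(c)
answer = coef
for q, c in zip(probs, counts):
    answer *= q**c
print(answer)
```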
Problems:
An insurance company classifies policy holders as class A,B,C, or D. The probabilities of a randomly selected policy holder being in these categories are .1, .4, .3 and .2, respectively. Give expressions for the probability that 25 randomly chosen policy holders will include
3A's, 11B's, 7C's, and 4D's.
3A's and 11B's.
3A's and 11B's, given that there are 4D's.
Chocolate chip cookies are made from batter containing an average of 0.6 chips per c.c. Chips are distributed according to the conditions for a Poisson process. Each cookie uses 12 c.c. of batter. Give expressions for the probabilities that in a dozen cookies:
3 have fewer than 5 chips.
3 have fewer than 5 chips and 7 have more than 9.
3 have fewer than 5 chips, given that 7 have more than 9.
Consider a sequence of (discrete) random variables $X_1, X_2, X_3, \ldots$
each of which takes integer values $1, 2, \ldots, N$
(called states). We assume that for a certain matrix $P$
(called the transition probability matrix), the conditional
probabilities are given by corresponding elements of the matrix; i.e.
$$P(X_{n+1} = j \mid X_n = i) = P_{ij}, \qquad i, j = 1, \ldots, N,$$
and furthermore that the chain only uses the last state occupied in
determining its future; i.e. that
$$P(X_{n+1} = j \mid X_n = i,\ X_{n-1} = i_{n-1}, \ldots, X_1 = i_1) = P(X_{n+1} = j \mid X_n = i)$$
for all $j, i, i_{n-1}, \ldots, i_1$
and $n$.
Then the sequence of random variables $X_n$
is called a Markov Note_2
Chain. Markov Chain models are the most common simple models for
dependent variables, and are used to predict weather as well as movements of
security prices. They allow the future of the process to depend on the present
state of the process, but the past behaviour can influence the future only
through the present state.
Suppose that the probability that tomorrow is rainy given that today is not
raining is $\alpha$
(and it does not otherwise depend on whether it rained in the past) and the
probability that tomorrow is dry given that today is rainy is $\beta$.
If tomorrow's weather depends on the past only through whether today is wet or
dry, we can define random variables
$$X_n = \begin{cases} 1 & \text{if day } n \text{ is rainy}\\ 2 & \text{if day } n \text{ is dry}\end{cases}$$
(beginning at some arbitrary time origin, day $0$). Then the random variables $X_0, X_1, X_2, \ldots$
form a Markov chain with $N = 2$
possible states and having probability transition matrix
$$P = \begin{pmatrix} 1-\beta & \beta\\ \alpha & 1-\alpha \end{pmatrix}.$$
Note that $P_{ij} \geq 0$
for all $i, j$
and $\sum_{j} P_{ij} = 1$
for all $i$.
This last property holds because given that $X_n = i$, $X_{n+1}$
must occupy one of the states $1, 2, \ldots, N$.
Suppose that the chain is started by randomly choosing a state for $X_0$
with distribution $P(X_0 = i) = q_i$, $i = 1, 2, \ldots, N$.
Then the distribution of $X_1$
is given by
$$P(X_1 = j) = \sum_{i=1}^{N} P(X_1 = j \mid X_0 = i)\,P(X_0 = i) = \sum_{i=1}^{N} q_i\, P_{ij},$$
and this is the $j$'th
element of the vector $q^T P$,
where $q$
is the column vector of values $q_i$.
To obtain the distribution at time $n+1$,
premultiply the transition matrix $P$
by a vector representing the distribution at time $n$.
Similarly the distribution of $X_2$
is the vector $q^T P^2$,
where $P^2$
is the product of the matrix $P$
with itself, and the distribution of $X_n$
is $q^T P^n$.
Under very general conditions, it can be shown that these probabilities
converge because the matrix $P^n$
converges pointwise to a limiting matrix as $n \to \infty$.
In fact, in many such cases, the limit does not depend on the initial
distribution $q$
because the limiting matrix has all of its rows identical and equal to some
vector of probabilities $\pi$.
Identifying this vector $\pi$
when convergence holds is reasonably easy.
A limiting distribution of a Markov chain is a vector
($\pi$
say) of long run probabilities of the individual states, so
$$\pi_i = \lim_{n \to \infty} P(X_n = i).$$
Now let us suppose that convergence to this distribution holds for a
particular initial distribution $q$,
so we assume that
$$q^T P^n \to \pi^T \quad \text{as } n \to \infty.$$
Then notice that
$$q^T P^{n+1} \to \pi^T$$
but also
$$q^T P^{n+1} = (q^T P^n)\,P \to \pi^T P,$$
so $\pi$
must have the property that
$$\pi^T P = \pi^T.$$
Any limiting distribution must have this property and this makes it easy in
many examples to identify the limiting behaviour of the chain.
A stationary distribution of a Markov chain is the column vector
($\pi$
say) of probabilities of the individual states such that $\pi^T P = \pi^T$.
Let us return to the weather example in which the transition probabilities are
given by the matrix
$$P = \begin{pmatrix} 1-\beta & \beta\\ \alpha & 1-\alpha \end{pmatrix}.$$
What is the long-run proportion of rainy days? To determine this we need to
solve the equations
$$\pi^T P = \pi^T, \qquad \text{i.e.} \quad (\pi_1\ \ \pi_2)\begin{pmatrix} 1-\beta & \beta\\ \alpha & 1-\alpha \end{pmatrix} = (\pi_1\ \ \pi_2),$$
subject to the conditions that the values $\pi_1, \pi_2$
are both probabilities (non-negative) and add to one. It is easy to see that
the solution is
$$\pi_1 = \frac{\alpha}{\alpha+\beta}, \qquad \pi_2 = \frac{\beta}{\alpha+\beta},$$
which is intuitively reasonable in that it says that the long-run probability
of the two states is proportional to the probability of a switch to that state
from the other. So the long-run probability of a dry day is the limit
$$\lim_{n\to\infty} P(X_n = 2) = \frac{\beta}{\alpha+\beta}.$$
You might try verifying this by computing the powers of the matrix $P^n$
for $n = 1, 2, \ldots$
and showing that $P^n$
approaches the matrix
$$\begin{pmatrix} \frac{\alpha}{\alpha+\beta} & \frac{\beta}{\alpha+\beta}\\[2pt] \frac{\alpha}{\alpha+\beta} & \frac{\beta}{\alpha+\beta} \end{pmatrix}$$
as $n \to \infty$.
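A short numerical check of this convergence, with arbitrary illustrative values of alpha and beta (ours, not from the notes):

```python
import numpy as np

alpha, beta = 0.3, 0.5            # illustrative switch probabilities
P = np.array([[1 - beta, beta],   # state 1 = rainy, state 2 = dry
              [alpha, 1 - alpha]])

Pn = np.linalg.matrix_power(P, 50)
print(Pn)                                              # both rows approach the same vector
print(alpha / (alpha + beta), beta / (alpha + beta))   # the stationary probabilities
```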
There are various mathematical conditions under which the limiting
distribution of a Markov chain is unique and independent of the initial state of
the chain, but roughly they assert that the chain is such that it forgets the
more and more distant past.
A simple form of inheritance of traits occurs when a trait is governed by a
pair of genes $A$ and $a$.
An individual may have an $AA$, an $Aa$, or an $aa$
combination ($AA$ and $Aa$ individuals are indistinguishable in appearance, or
"$A$ dominates $a$").
Let us call an $AA$ individual dominant, $aa$
recessive and $Aa$
hybrid. When two individuals mate, the offspring
inherits one gene of the pair from each parent, and we assume that these genes
are selected at random. Now let us suppose that two individuals of opposite
sex selected at random mate, and then two of their offspring mate, etc. Here
the state is determined by a pair of individuals, so the states of our process
can be considered to be objects like $(AA, Aa)$,
indicating that one of the pair is $AA$
and the other is $Aa$
(we do not distinguish the order of the pair, or male and female, assuming
these genes do not depend on the sex of the individual).
| Number | State      |
|--------|------------|
| 1      | $(AA, AA)$ |
| 2      | $(AA, Aa)$ |
| 3      | $(AA, aa)$ |
| 4      | $(Aa, Aa)$ |
| 5      | $(Aa, aa)$ |
| 6      | $(aa, aa)$ |
For example, consider the calculation of the transition probabilities out of
state 4, $(Aa, Aa)$.
In this case each offspring has probability $\frac14$
of being a dominant ($AA$),
probability $\frac12$
of being a hybrid ($Aa$), and probability $\frac14$ of being a recessive ($aa$).
If two offspring are selected independently from this distribution the
possible pairs are
$$(AA,AA),\ (AA,Aa),\ (AA,aa),\ (Aa,Aa),\ (Aa,aa),\ (aa,aa)$$
with probabilities
$$\tfrac{1}{16},\ \tfrac14,\ \tfrac18,\ \tfrac14,\ \tfrac14,\ \tfrac{1}{16}$$
respectively. So the transitions out of state 4 have the probabilities
above, and the full transition probability matrix is
$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
\frac14 & \frac12 & 0 & \frac14 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
\frac{1}{16} & \frac14 & \frac18 & \frac14 & \frac14 & \frac{1}{16}\\
0 & 0 & 0 & \frac14 & \frac12 & \frac14\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
What is the long-run behaviour in such a system? For example, the
two-generation transition probabilities are given by the matrix $P^2$,
which seems to indicate a drift to one or other of the extreme states 1 or 6.
To confirm the long-run behaviour calculate higher powers of $P$ (e.g. $P^{4}$ and $P^{8}$),
which show that eventually the chain is absorbed in either of state 1 or
state 6, with the probability of absorption depending on the initial state.
This chain, unlike the ones studied before, has more than one possible
stationary distribution, for example,
$\pi^T = (1, 0, 0, 0, 0, 0)$
and $\pi^T = (0, 0, 0, 0, 0, 1)$,
and
in these circumstances the chain does not have the same limiting distribution
regardless of the initial state.
It is easy to extend the definition of expectation to multiple variables.
Generalizing $E[g(X)] = \sum_{x} g(x) f(x)$
leads to the definition of expected value in the multivariate case
$$E[g(X,Y)] = \sum_{\text{all }(x,y)} g(x,y)\, f(x,y)$$
and
$$E[g(X_1, \ldots, X_n)] = \sum_{\text{all }(x_1,\ldots,x_n)} g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n).$$
As before, these represent
the average value of $g(X,Y)$
and $g(X_1, \ldots, X_n)$.
Example: Let the joint probability function, $f(x,y)$,
be given by

| $f(x,y)$ | $x=0$ | $x=1$ | $x=2$ |
|----------|-------|-------|-------|
| $y=1$    | .1    | .2    | .3    |
| $y=2$    | .2    | .1    | .1    |

Find $E(XY)$
and $E(X)$.

Solution:
$$E(XY) = \sum_{\text{all }(x,y)} xy\, f(x,y) = (0)(1)(.1) + (1)(1)(.2) + (2)(1)(.3) + (0)(2)(.2) + (1)(2)(.1) + (2)(2)(.1) = 1.4$$
To find $E(X)$
we have a choice of methods. First, taking $g(x,y) = x$,
we get
$$E(X) = \sum_{\text{all }(x,y)} x\, f(x,y) = (0)(.1) + (1)(.2) + (2)(.3) + (0)(.2) + (1)(.1) + (2)(.1) = 1.1.$$
Alternatively, since $E(X)$
only involves $X$,
we could find $f_1(x)$
and use
$$E(X) = \sum_{x} x\, f_1(x) = (0)(.3) + (1)(.3) + (2)(.4) = 1.1.$$
Example: In the example of Section 8.1 with sprinters A, B,
and C we had (using only $X$
and $Y$
in our formulas)
$$f(x,y) = \frac{10!}{x!\,y!\,(10-x-y)!}\,(.5)^x(.4)^y(.1)^{10-x-y},$$
where A wins $x$
times and B wins $y$
times in 10 races. Find $E(XY)$.

Solution: This will be similar to the way
we derived the mean of the binomial distribution but, since this is a
multinomial distribution, we'll be using the multinomial theorem to sum.
$$E(XY) = \sum_{(x,y)} xy\,\frac{10!}{x!\,y!\,(10-x-y)!}(.5)^x(.4)^y(.1)^{10-x-y} = \sum_{x\ge 1,\,y\ge 1} \frac{10!}{(x-1)!\,(y-1)!\,(10-x-y)!}(.5)^x(.4)^y(.1)^{10-x-y}.$$
Let $u = x - 1$
and $v = y - 1$
in the sum and we obtain
$$E(XY) = (10)(9)(.5)(.4)\sum_{(u,v)} \frac{8!}{u!\,v!\,(8-u-v)!}(.5)^u(.4)^v(.1)^{8-u-v} = 90(.5)(.4)(.5+.4+.1)^8 = 18.$$
Property of Multivariate
Expectation: It is easily proved (make sure you can do this) that
$$E\left[a\,g_1(X,Y) + b\,g_2(X,Y)\right] = a\,E[g_1(X,Y)] + b\,E[g_2(X,Y)].$$
This can be extended beyond 2 functions $g_1$ and $g_2$,
and beyond 2 variables $X$ and $Y$.
Independence is a "yes/no" way of defining a relationship between variables.
We all know that there can be different types of relationships between
variables which are dependent. For example, if $X$
is your height in inches and $Y$
your height in centimetres the relationship is one-to-one and linear. More
generally, two random variables may be related (non-independent) in a
probabilistic sense. For example, a person's weight $Y$
is not an exact linear function of their height $X$,
but $Y$ and $X$
are nevertheless related. We'll look at two ways of measuring the strength of
the relationship between two random variables. The first is called covariance.
The covariance of $X$ and $Y$,
denoted Cov$(X,Y)$
or $\sigma_{XY}$,
is
$$\text{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right].$$
For calculation purposes
this definition is usually harder to use than the formula which follows, which
is proved noting that
$$\text{Cov}(X,Y) = E\left[XY - \mu_X Y - \mu_Y X + \mu_X\mu_Y\right] = E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X\mu_Y = E(XY) - E(X)E(Y).$$
Example:
In the example above with joint probability function $f(x,y)$ tabulated earlier,
find Cov$(X,Y)$.

Solution: We previously calculated $E(XY) = 1.4$
and $E(X) = 1.1$.
Similarly, $E(Y) = (1)(.6) + (2)(.4) = 1.4$, so
$$\text{Cov}(X,Y) = E(XY) - E(X)E(Y) = 1.4 - (1.1)(1.4) = -.14.$$
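A direct computation of Cov(X, Y) from the tabulated joint p.f. (a minimal sketch; names are ours):

```python
joint = {
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
    (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1,
}

E_XY = sum(x * y * p for (x, y), p in joint.items())
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
print(E_XY - E_X * E_Y)   # Cov(X, Y) = 1.4 - (1.1)(1.4) = -0.14
```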
Exercise: Calculate the covariance of $X$
and $Y$
for the sprinter example. We have already found that $E(XY)$
= 18. The marginal distributions of $X$
and of $Y$
are models for which we've already derived the mean. If your solution takes
more than a few lines you're missing an easier solution.
Interpretation of Covariance:

1. Suppose large values of $X$
tend to occur with large values of $Y$
and small values of $X$
with small values of $Y$.
Then $(X - \mu_X)$
and $(Y - \mu_Y)$
will tend to be of the same sign, whether positive or negative. Thus $(X-\mu_X)(Y-\mu_Y)$
will tend to be positive. Hence Cov$(X,Y) > 0$.
For example in Figure bivariatenormal we
see several hundred points plotted. Notice that the majority of the points
are in the two quadrants (lower left and upper right) labelled with "+" so
that for these $(x - \mu_X)(y - \mu_Y) > 0$.
A minority of points are in the other two quadrants labelled "-" and for
these $(x - \mu_X)(y - \mu_Y) < 0$.
Moreover the points in the latter two quadrants appear closer to the mean,
indicating that on average, over all points generated,
$(x - \mu_X)(y - \mu_Y) > 0$.
Presumably this implies that over the joint distribution of $(X,Y)$,
$E[(X-\mu_X)(Y-\mu_Y)] > 0$, or Cov$(X,Y) > 0$.

[Figure bivariatenormal: Random points $(X,Y)$ with covariance 0.5, variances 1.]

For example, if $X$ is a
person's height and $Y$ is the same
person's weight, then these two random variables will have positive covariance.
2. Suppose large values of $X$
tend to occur with small values of $Y$
and small values of $X$
with large values of $Y$.
Then $(X-\mu_X)$
and $(Y-\mu_Y)$
will tend to be of opposite signs. Thus $(X-\mu_X)(Y-\mu_Y)$
tends to be negative. Hence Cov$(X,Y) < 0$.
For example see Figure bivariatenormal2.

[Figure bivariatenormal2: Covariance = -0.5, variances = 1.]

For example if $X =$ thickness
of attic insulation in a house and $Y =$ heating
cost for the house, then Cov$(X,Y) < 0$.
3. If $X$ and $Y$
are independent then Cov$(X,Y) = 0$.

Proof: Recall Cov$(X,Y) = E(XY) - \mu_X\mu_Y$.
Let $X$ and $Y$
be independent.
Then $E(XY) = \mu_X\mu_Y$, so Cov$(X,Y) = 0$.
The following theorem gives a direct proof of the result above, and is useful in many other situations.
Suppose random variables $X$ and $Y$
are independent. Then, if $g_1(X)$
and $g_2(Y)$
are any two functions,
$$E\left[g_1(X)\,g_2(Y)\right] = E[g_1(X)]\;E[g_2(Y)].$$
Proof: Since $X$ and $Y$
are independent, $f(x,y) = f_1(x)\,f_2(y)$.
Thus
$$E\left[g_1(X)\,g_2(Y)\right] = \sum_{(x,y)} g_1(x)\,g_2(y)\,f_1(x)\,f_2(y) = \left[\sum_{x} g_1(x)f_1(x)\right]\left[\sum_{y} g_2(y)f_2(y)\right] = E[g_1(X)]\;E[g_2(Y)].$$
\framebox[0.10in]{}
To prove result (3) above, we just note that if $X$
and $Y$
are independent then
$$E(XY) = E(X)\,E(Y) = \mu_X\mu_Y.$$
Caution: This result is not reversible. If Cov$(X,Y) = 0$
we cannot conclude that $X$
and $Y$
are independent. For example suppose that the random variable $\Theta$
is uniformly distributed on the values $\{0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}\}$
and define $X = \cos\Theta$
and $Y = \sin\Theta$.
It is easy to see that
Cov$(X,Y) = 0$,
but the two random variables $(X,Y)$
are clearly related because the points $(X,Y)$
are always on a circle.
Example: Let $(X,Y)$
have a joint probability function under which $X$
only takes 3 values, with marginal probability functions:

| $x$      | 0  | 1  | 2  |
|----------|----|----|----|
| $f_1(x)$ | .2 | .6 | .2 |

and

| $y$      | 0  | 1  |
|----------|----|----|
| $f_2(y)$ | .4 | .6 |
The actual numerical value of Cov$(X,Y)$
has no direct interpretation, so covariance is of limited use in measuring
relationships.
Exercise:
Look back at the example in which $f(x,y)$
was tabulated and Cov$(X,Y) = -.14$.
Considering how covariance is interpreted, does it make sense that Cov$(X,Y)$
would be negative?
Without looking at the actual covariance for the sprinter exercise, would you
expect Cov$(X,Y)$
to be positive or negative? (If A wins more of the 10 races, will B win more
races or fewer races?)
We now consider a second, related way to measure the strength of relationship
between $X$ and $Y$.
The correlation coefficient of $X$ and $Y$
is
$$\rho = \frac{\text{Cov}(X,Y)}{\sigma_X\,\sigma_Y}.$$
The correlation coefficient measures the strength of the linear relationship
between $X$ and $Y$
and is simply a rescaled version of the covariance, scaled to lie in the
interval $[-1, 1]$.
You can attempt to guess the correlation between two variables based on a
scatter diagram of values of these variables at the web
page
http://statweb.calpoly.edu/chance/applets/guesscorrelation/GuessCorrelation.html
For
example in Figure guesscorrelation I
guessed a correlation of -0.9 whereas the true correlation coefficient
generating these data was
[Figure guesscorrelation: Guessing the correlation based on a scatter diagram of points.]
Properties of $\rho$:

1. Since $\sigma_X$
and $\sigma_Y$,
the standard deviations of $X$
and $Y$,
are both positive, $\rho$
will have the same sign as Cov$(X,Y)$.
Hence the interpretation of the sign of $\rho$
is the same as for Cov$(X,Y)$,
and $\rho = 0$
if $X$
and $Y$
are independent. When $\rho = 0$
we say that $X$
and $Y$
are uncorrelated.

2. $-1 \leq \rho \leq 1$, and as $\rho \to \pm 1$
the relation between $X$
and $Y$
becomes one-to-one and linear.
Proof: Define a new random variable $S = X + tY$,
where $t$
is some real number. We'll show that the fact that
Var$(S) \geq 0$
leads to 2) above. We
have
$$\text{Var}(S) = \text{Var}(X + tY) = \text{Var}(X) + t^2\,\text{Var}(Y) + 2t\,\text{Cov}(X,Y) \geq 0.$$
Since $\text{Var}(S) \geq 0$
for any real number $t$,
this quadratic (in $t$) must have at most one real root (value of $t$
for which it is zero). Therefore
$$\left(2\,\text{Cov}(X,Y)\right)^2 - 4\,\text{Var}(X)\,\text{Var}(Y) \leq 0,$$
leading to the inequality
$$\left|\frac{\text{Cov}(X,Y)}{\sigma_X\,\sigma_Y}\right| \leq 1.$$
To see that $\rho = \pm 1$
corresponds to a one-to-one linear relationship between $X$
and $Y$,
note that $\rho = \pm 1$
corresponds to a zero discriminant in the quadratic. This means that
there exists one real number $t^*$
for which
$$\text{Var}(S) = \text{Var}(X + t^* Y) = 0.$$
But for
Var$(X + t^* Y)$
to be zero, $X + t^* Y$
must equal a constant $c$.
Thus $X$
and $Y$
satisfy a linear relationship.
Exercise: Calculate $\rho$
for the sprinter example. Does your answer make sense? (You should already
have found Cov$(X,Y)$
in a previous exercise, so little additional work is
needed.)
Problems:
The joint probability function of $(X,Y)$ is:

|       | $x=0$ | $x=1$ | $x=2$ |
|-------|-------|-------|-------|
| $y=0$ | .06   | .15   | .09   |
| $y=1$ | .14   | .35   | .21   |

Calculate the correlation coefficient, $\rho$.
What does it indicate about the relationship between $X$
and $Y$?
Suppose that $X$
and $Y$
are random variables with joint probability function:

|        | $x=2$ | $x=4$ | $x=6$          |
|--------|-------|-------|----------------|
| $y=-1$ | 1/8   | 1/4   | $p$            |
| $y=1$  | 1/4   | 1/8   | $\frac14 - p$  |

For what value of $p$
are $X$
and $Y$
uncorrelated?
Show that there is no value of $p$
for which $X$
and $Y$
are independent.
Many problems require us to consider linear combinations of random variables;
examples will be given below and in Chapter 9. Although writing down the
formulas is somewhat tedious, we give here some important results about their
means and variances.

Results for Means:

1. $E(aX + bY) = aE(X) + bE(Y)$,
when $a$
and $b$
are constants. (This follows from the definition of expectation.) In
particular, $E(X + Y) = E(X) + E(Y)$
and $E(X - Y) = E(X) - E(Y)$.

2. Let $a_1, a_2, \ldots, a_n$
be constants (real numbers) and $E(X_i) = \mu_i$.
Then $E\left(\sum a_i X_i\right) = \sum a_i \mu_i$.
In particular, $E\left(\sum X_i\right) = \sum E(X_i)$.

3. Let $X_1, X_2, \ldots, X_n$
be random variables which have mean $\mu$.
(You can imagine these being some sample results from an experiment such as
recording the number of occupants in cars travelling over a toll bridge.) The
sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Then $E(\bar{X}) = \mu$.

Proof: From (2), $E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = n\mu$.
Thus
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\,n\mu = \mu.$$
Results for Covariance:

1. Cov$(X, X) = $ Var$(X)$.

2. Cov$(aX + bY,\ cU + dV) = ac\,$Cov$(X,U) + ad\,$Cov$(X,V) + bc\,$Cov$(Y,U) + bd\,$Cov$(Y,V)$,
where $a, b, c$
and $d$
are constants.

Proof:
$$\text{Cov}(aX+bY,\ cU+dV) = E\Big[\big(a(X-\mu_X)+b(Y-\mu_Y)\big)\big(c(U-\mu_U)+d(V-\mu_V)\big)\Big]$$
$$= ac\,\text{Cov}(X,U) + ad\,\text{Cov}(X,V) + bc\,\text{Cov}(Y,U) + bd\,\text{Cov}(Y,V).$$
This type of result can be generalized, but gets messy to write
out.
Results for Variance:

1. $\text{Var}(aX + bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)$.

Proof:
$$\text{Var}(aX+bY) = E\Big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\Big] = a^2 E\big[(X-\mu_X)^2\big] + b^2 E\big[(Y-\mu_Y)^2\big] + 2ab\,E\big[(X-\mu_X)(Y-\mu_Y)\big],$$
which equals $a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)$.

Exercise: Try to prove this result by writing Var$(aX+bY)$
as Cov$(aX+bY,\ aX+bY)$
and using properties of covariance.

2. Let $X$
and $Y$
be independent. Since Cov$(X,Y) = 0$,
result 1. gives
$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y);$$
i.e., for independent variables, the variance of a sum
is the sum of the variances. Also note
$$\text{Var}(X - Y) = \text{Var}(X) + (-1)^2\,\text{Var}(Y) = \text{Var}(X) + \text{Var}(Y);$$
i.e., for independent variables, the variance of a difference is the
sum of the variances.

3. Let $a_1, a_2, \ldots, a_n$
be constants and Var$(X_i) = \sigma_i^2$.
Then
$$\text{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\sigma_i^2 + 2\sum_{i<j} a_i a_j\,\text{Cov}(X_i, X_j).$$
This is a generalization of result 1. and can be proved using either of the
methods used for 1.

4. Special cases of result 3. are:
(a) If $X_1, X_2, \ldots, X_n$
are independent then Cov$(X_i, X_j) = 0$ for $i \neq j$,
so that
$$\text{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\sigma_i^2.$$
(b) If $X_1, X_2, \ldots, X_n$
are independent and all have the same variance $\sigma^2$,
then
$$\text{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$

Proof of 4 (b): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
From 4(a), Var$\left(\sum X_i\right) = \sum \text{Var}(X_i) = n\sigma^2$.
Using Var$(aX) = a^2\,\text{Var}(X)$,
we get:
$$\text{Var}(\bar{X}) = \frac{1}{n^2}\,\text{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Remark: This result is a very important one in probability
and statistics. To recap, it says that if $X_1, \ldots, X_n$
are independent r.v.'s with the same mean $\mu$
and some variance $\sigma^2$,
then the sample mean $\bar{X}$
has
$$E(\bar{X}) = \mu \qquad \text{and} \qquad \text{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
This shows that the average $\bar{X}$
of $n$
random variables with the same distribution is less variable than any single
observation $X_i$,
and that the larger $n$
is the less variability there is. This explains mathematically why, for
example, if we want to estimate the unknown mean height $\mu$
in a population of people, we are better to take the average height for a
random sample of $n$ persons than to just take the height of one randomly selected person. A larger sample
would be better still. There are interesting applets at the url
http://users.ece.gatech.edu/users/gtz/java/samplemean/notes.html
and
http://www.ds.unifi.it/VL/VL_EN/applets/BinomialCoinExperiment.html which
allow one to sample and explore the rate at which the sample mean approaches
the expected value. In Chapter 9 we will see how to decide how large a sample
we should take for a certain degree of precision. Also note that as $n \to \infty$,
Var$(\bar{X}) \to 0$,
which means that $\bar{X}$
becomes arbitrarily close to $\mu$.
This is sometimes called the "law of averages". There is a formal theorem
which supports the claim that for large sample sizes, sample means approach
the expected value, called the "law of large numbers".
The results for linear combinations of random variables provide a way of breaking up more complicated problems, involving mean and variance, into simpler pieces using indicator variables; an indicator variable is just a binary variable (0 or 1) that indicates whether or not some event occurs. We'll illustrate this important method with 3 examples.
Example: Mean and Variance of a Binomial R.V.
Let $X \sim \text{Binomial}(n, p)$
in a binomial process. Define new variables $X_i$
by:
$$X_i = \begin{cases} 0 & \text{if the } i^{\text{th}} \text{ trial was a failure}\\ 1 & \text{if the } i^{\text{th}} \text{ trial was a success;}\end{cases}$$
i.e. $X_i$
indicates whether the outcome "success" occurred on the $i^{\text{th}}$
trial. The trick we use is that the total number of successes, $X$,
is the sum of the $X_i$'s:
$$X = \sum_{i=1}^{n} X_i.$$
We can find the mean and variance of $X_i$
and then use our results for the mean and variance of a sum to get the mean
and variance of $X$.
First,
$$E(X_i) = \sum_{x_i=0}^{1} x_i\, f(x_i) = 0\cdot P(X_i = 0) + 1\cdot P(X_i = 1) = P(X_i = 1).$$
But $P(X_i = 1) = p$
since the probability of success is $p$
on each trial, so $E(X_i) = p$.
Since $X_i = 0$
or 1, $X_i^2 = X_i$,
and therefore $E(X_i^2) = E(X_i) = p$.
Thus
$$\text{Var}(X_i) = E(X_i^2) - \left[E(X_i)\right]^2 = p - p^2 = p(1-p).$$
In the binomial distribution the trials are independent so the $X_i$'s
are also independent. Thus
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = np, \qquad \text{Var}(X) = \sum_{i=1}^{n} \text{Var}(X_i) = np(1-p).$$
These, of course, are the same as we derived previously for the mean and
variance of the binomial distribution. Note how simple the derivation here
is!
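A quick simulation check of the indicator-variable argument (the parameter values are arbitrary choices of ours):

```python
import random

n, p, reps = 10, 0.3, 200_000
random.seed(1)

# X is the sum of n independent indicator variables X_i with P(X_i = 1) = p.
samples = [sum(1 for _ in range(n) if random.random() < p) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean, n * p)            # close to np = 3
print(var, n * p * (1 - p))   # close to np(1-p) = 2.1
```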
Remark: If $X_i$
is a binary random variable with $P(X_i = 1) = p = 1 - P(X_i = 0)$,
then $E(X_i) = p$
and
Var$(X_i) = p(1-p)$,
as shown above. (Note that $X_i$
is actually a binomial r.v. with $n = 1$.) In some problems the $X_i$'s
are not independent, and then we also need covariances.
Example: Let $X$
have a hypergeometric distribution. Find the mean and variance of $X$.

Solution: As above, let us think of the setting, which
involves drawing $n$
items at random from a total of $N$,
of which $r$
are
"$S$" (success) items and $N - r$
are
"$F$" (failure) items. Define
$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ draw is an } S \text{ item}\\ 0 & \text{if the } i^{\text{th}} \text{ draw is an } F \text{ item.}\end{cases}$$
Then $X = \sum_{i=1}^{n} X_i$
as for the binomial example, but now the $X_i$'s
are dependent. (For example, what we get on the first draw affects the
probabilities of $S$
and $F$
for the second draw, and so on.) Therefore we need to find
Cov$(X_i, X_j)$
for $i \neq j$
as well as $E(X_i)$
and
Var$(X_i)$
in order to use our formula for the variance of a sum.
We see first that $P(X_i = 1) = \frac{r}{N}$
for each of $i = 1, \ldots, n$.
(If the draws are random then the probability an $S$
occurs in draw $i$
is just equal to the probability position $i$
is an $S$
when we arrange $r$ $S$'s
and $N - r$ $F$'s
in a row.) This immediately gives
$$E(X_i) = \frac{r}{N}, \qquad \text{Var}(X_i) = \frac{r}{N}\left(1 - \frac{r}{N}\right),$$
since $X_i^2 = X_i$, so $E(X_i^2) = E(X_i) = \frac{r}{N}$.
The covariance of $X_i$
and $X_j$ ($i \neq j$)
is equal to $E(X_i X_j) - E(X_i)E(X_j)$,
so we need
$$E(X_i X_j) = P(X_i = 1 \text{ and } X_j = 1).$$
The probability of an $S$
on both draws $i$
and $j$
is just
$$P(X_i = 1,\ X_j = 1) = \frac{r(r-1)}{N(N-1)}.$$
Thus,
$$\text{Cov}(X_i, X_j) = \frac{r(r-1)}{N(N-1)} - \left(\frac{r}{N}\right)^2 = -\frac{r(N-r)}{N^2(N-1)}.$$
(Does it make sense that Cov$(X_i, X_j)$
is negative? If you draw a success in draw $i$,
are you more or less likely to have a success on draw $j$?)
Now we find $E(X)$
and
Var$(X)$.
First,
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \frac{nr}{N}.$$
Before finding Var$(X)$,
how many combinations $(i,j)$
are there for which $i < j$?
Each $i$
and $j$
takes values from $1, 2, \ldots, n$,
so there are $n^2$
different combinations of $(i,j)$
values, of which $n$ have $i = j$. Each of the remaining unordered pairs can only be written in 1 way to make $i < j$.
(E.g. if $n = 3$,
the combinations with $i < j$
are (1,2), (1,3) and (2,3).) So there are $\frac{n^2 - n}{2} = \binom{n}{2}$
different combinations with $i < j$.
Now we can find
$$\text{Var}(X) = \sum_{i=1}^{n}\text{Var}(X_i) + 2\sum_{i<j}\text{Cov}(X_i,X_j) = n\,\frac{r}{N}\left(1-\frac{r}{N}\right) - 2\binom{n}{2}\frac{r(N-r)}{N^2(N-1)} = n\,\frac{r}{N}\left(1-\frac{r}{N}\right)\left(\frac{N-n}{N-1}\right).$$
In the last two examples, we know $f(x)$,
and could have found $E(X)$
and
Var$(X)$
without using indicator variables. In the next example $f(x)$
is not known and is hard to find, but we can still use indicator variables for
obtaining $E(X)$
and Var$(X)$.
The following example is a famous problem in probability.
Example: We have $n$
letters to $n$
different people, and $n$
envelopes addressed to those $n$
people. One letter is put in each envelope at random. Find the mean and
variance of the number of letters placed in the right envelope.

Solution: Define
$$X_i = \begin{cases} 1 & \text{if letter } i \text{ is placed in the right envelope}\\ 0 & \text{otherwise.}\end{cases}$$
Then $X = \sum_{i=1}^{n} X_i$
is the number of correctly placed letters. Once again, the $X_i$'s
are dependent (Why?).
First, $E(X_i) = P(X_i = 1) = \frac{1}{n}$
(since there is 1 chance in $n$
that letter $i$
will be put in envelope $i$)
and then, as before,
$$\text{Var}(X_i) = \frac{1}{n}\left(1 - \frac{1}{n}\right).$$
Exercise: Before calculating cov$(X_i, X_j)$,
what sign do you expect it to have? (If letter $i$
is correctly placed does that make it more or less likely that letter $j$
will be placed correctly?)
Next, $E(X_i X_j) = P(X_i = 1 \text{ and } X_j = 1)$.
(As in the last example, this is the only non-zero term in the sum.) Now,
$$P(X_i = 1 \text{ and } X_j = 1) = \frac{1}{n}\cdot\frac{1}{n-1}$$
since once letter $i$
is correctly placed there is 1 chance in $n-1$
of letter $j$
going in envelope $j$.
For the covariance,
$$\text{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \frac{1}{n(n-1)} - \frac{1}{n^2} = \frac{1}{n^2(n-1)}.$$
Then
$$E(X) = \sum_{i=1}^{n} E(X_i) = n\cdot\frac{1}{n} = 1$$
and
$$\text{Var}(X) = \sum_{i=1}^{n}\text{Var}(X_i) + 2\sum_{i<j}\text{Cov}(X_i,X_j) = n\cdot\frac{1}{n}\left(1-\frac{1}{n}\right) + 2\binom{n}{2}\frac{1}{n^2(n-1)} = \left(1 - \frac{1}{n}\right) + \frac{1}{n} = 1.$$
(Common sense often helps in this course, but we have found no way of being
able to say this result is obvious. On average 1 letter will be correctly
placed and the variance will be 1, regardless of how many letters there are.)
The joint probability function of
is given by:
Calculate
,
Var
,
Cov
and Var
.
You may use the fact that
and Var
= .21 without verifying these figures.
In a row of 25 switches, each is considered to be "on" or "off". The probability of being on is .6 for each switch, independently of the other switches. Find the mean and variance of the number of unlike pairs among the 24 pairs of adjacent switches.
Suppose Var
,
Var
,
;
and let
.
Find the standard deviation of
.
Let
be uncorrelated random variables with mean 0 and variance
.
Let
.
Find Cov
for
and Var
.
A plastic fabricating company produces items in strips of 24, with the items connected by a thin piece of plastic:
Suppose we have two possibly dependent random variables $(X, Y)$
and we wish to characterize their joint distribution using a moment generating
function. Just as the probability function and the cumulative distribution
function are, in this case, functions of two arguments, so is the moment
generating function.
The joint moment generating function of $(X, Y)$
is
$$M(s, t) = E\left(e^{sX + tY}\right).$$
Recall that if $X, Y$
happen to be independent,
and $g_1$
and $g_2$
are any two functions,
$$E\left[g_1(X)\,g_2(Y)\right] = E[g_1(X)]\;E[g_2(Y)],$$
and so with $g_1(X) = e^{sX}$
and $g_2(Y) = e^{tY}$
we obtain, for independent random variables $X$ and $Y$,
$$M(s,t) = M_X(s)\,M_Y(t),$$
the product of the moment generating functions of $X$
and $Y$
respectively.
There is another labour-saving property of moment generating functions for
independent random variables. Suppose $X$ and $Y$
are independent random variables with moment generating functions $M_X(t)$
and $M_Y(t)$.
Suppose you wish the moment generating function of the sum $Z = X + Y$.
One could attack this problem by first determining the probability function of
$Z$
and then calculating $E(e^{tZ}) = \sum_{z} e^{tz} f_Z(z)$.
Evidently lots of work! On the other hand recycling
(Eg1g2) with $g_1(X) = e^{tX}$, $g_2(Y) = e^{tY}$
gives
$$M_Z(t) = E\left(e^{tX + tY}\right) = E\left(e^{tX}\right)E\left(e^{tY}\right) = M_X(t)\,M_Y(t).$$
The moment generating function of the sum of independent random variables is the product of the individual moment generating functions.
For example if both $X$
and $Y$
are independent with the same (Bernoulli) distribution
$$P(X = 1) = p = 1 - P(X = 0),$$
then both have moment generating function
$$M_X(t) = M_Y(t) = 1 - p + pe^t$$
and so the moment generating function of the sum $X + Y$
is $(1 - p + pe^t)^2$.
Similarly if we add another independent Bernoulli the moment generating
function is $(1 - p + pe^t)^3$,
and in general the moment generating function of the sum of $n$
independent Bernoulli random variables is $(1 - p + pe^t)^n$,
the moment generating function of a
Binomial$(n, p)$
distribution. This confirms that the sum of $n$ independent Bernoulli random
variables has a
Binomial$(n, p)$
distribution.
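A numerical spot check of this product rule, using the Bernoulli/Binomial case (the values of n, p and t are arbitrary choices of ours):

```python
from math import comb, exp

n, p, t = 4, 0.3, 0.7
bernoulli_mgf = 1 - p + p * exp(t)

# MGF of a Binomial(n, p) computed directly from its probability function
binomial_mgf = sum(comb(n, k) * p**k * (1 - p)**(n - k) * exp(t * k) for k in range(n + 1))

print(bernoulli_mgf ** n, binomial_mgf)   # the two values agree
```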
The joint probability function of $(X,Y)$ is given by:

|       | $x=0$ | $x=1$ | $x=2$ |
|-------|-------|-------|-------|
| $y=0$ | .15   | .1    | .05   |
| $y=1$ | .35   | .2    | .15   |

Are $X$
and $Y$
independent? Why?
Find
and
For a person whose car insurance and house insurance are with the same
company, let
and
represent the number of claims on the car and house policies, respectively, in
a given year. Suppose that for a certain group of individuals,
Poisson (mean
)
and
Poisson (mean
).
If
and
are independent, find
and find the mean and variance of
.
Suppose it was learned that
was very close to
.
Show why
and
cannot be independent in this case. What might explain the non-independence?
Consider Problem 2.7 for Chapter 2, which concerned machine recognition of
handwritten digits. Recall that
was the probability that the number actually written was
,
and the number identified by the machine was
.
Are the random variables
and
independent? Why?
What is
,
that is, the probability that a random number is correctly identified?
What is the probability that the number 5 is incorrectly identified?
Blood donors arrive at a clinic and are classified as type A, type O, or other
types. Donors' blood types are independent with
(type A) =
,
(type O) =
,
and
(other type) =
.
Consider the number,
,
of type A and the number,
,
of type O donors arriving before the
other type.
Find the joint probability function,
Find the conditional probability function,
.
Slot machine payouts. Suppose that in a slot machine there are
possible outcomes
for a single play. A single play costs $1. If outcome
occurs, you win
,
for
.
If outcome
occurs, you win nothing. In other words, if outcome
occurs your net profit is
;
if
occurs your net profit is - 1.
Give a formula for your expected profit from a single play, if the
probabilities of the
outcomes are
.
The owner of the slot machine wants the player's expected profit to be
negative. Suppose
,
with
.
If the slot machine is set to pay $3 when outcome
occurs, and $5 when either of outcomes
occur, determine the player's expected profit per play.
The slot machine owner wishes to pay
dollars when outcome
occurs, where
and
is a number between 0 and 1. The owner also wishes his or her expected profit
to be $.05 per play. (The player's expected profit is -.05 per play.) Find
as a function of
and
.
What is the value of
if
and
?
Bacteria are distributed through river water according to a Poisson process with an average of 5 per 100 c.c. of water. What is the probability five 50 c.c. samples of water have 1 with no bacteria, 2 with one bacterium, and 2 with two or more?
A box contains 5 yellow and 3 red balls, from which 4 balls are drawn at
random without replacement. Let
be the number of yellow balls on the first two draws and
the number of yellow balls on all 4 draws.
Find the joint probability function,
.
Are
and
independent? Justify your answer.
In a quality control inspection items are classified as having a minor defect,
a major defect, or as being acceptable. A carton of 10 items contains 2 with
minor defects, 1 with a major defect, and 7 acceptable. Three items are chosen
at random without replacement. Let
be the number selected with minor defects and
be the number with major defects.
Find the joint probability function of
and
.
Find the marginal probability functions of
and of
.
Evaluate numerically
and
.
Let
and
be discrete random variables with joint probability function
for
and
,
where
is a positive constant.
Derive the marginal probability function of
.
Evaluate
.
Are
and
independent? Explain.
Derive the probability function of
.
"Thinning" a Poisson process. Suppose that events are
produced according to a Poisson process with an average of
events per minute. Each event has a probability
of being a "Type A" event, independent of other events.
Let the random variable
represent the number of Type A events that occur in a one-minute period. Prove
that
has a Poisson distribution with mean
.
(Hint: let
be the total number of events in a 1 minute period and consider the formula
just before the last example in Section 8.1).
Lightning strikes in a large forest region occur over the summer according to a
Poisson process with
strikes per day. Each strike has probability .05 of starting a fire. Find the
probability that there are at least 5 fires over a 30 day period.
In a breeding experiment involving horses the offspring are of four genetic types with probabilities:
Type | 1 | 2 | 3 | 4 |
Probability | 3/16 | 5/16 | 5/16 | 3/16 |
A group of 40 independent offspring are observed. Give expressions for the following probabilities:
There are 10 of each type.
The total number of types 1 and 2 is 16.
There are exactly 10 of type 1, given that the total number of types 1 and 2 is 16.
In a particular city, let the random variable
represent the number of children in a randomly selected household, and let
represent the number of female children. Assume that the probability a child
is female is
,
regardless of what size household they live in, and that the marginal
distribution of
is as follows:
Determine
.
Find the probability function for the number of girls
in a randomly chosen family. What is
?
In a particular city, the probability a call to a fire department concerns various situations is as given below:
1. fire in a detached home
2. fire in a semi detached home
3. fire in an apartment or multiple unit residence
4. fire in a non-residential building
5. non-fire-related emergency
6. false alarm
In a set of 10 calls, let
represent the numbers of calls of each of types
.
Give the joint probability function for
.
What is the probability there is at least one apartment fire, given that there are 4 fire-related calls?
If the average costs of calls of types
are (in $100 units) 5, 5, 7, 20, 4, 2 respectively, what is the expected total
cost of the 10 calls?
Suppose
have joint p.f.
.
If
is a function such that for all
in the range of
,
then show that
Let
and
be random variables with Var
,
Var
and
.
Find
Var
.
Let
and
have a trinomial distribution with joint probability function
and
.
Let
.
What distribution does
have? Either explain why or derive this result.
For the distribution in (a), what is
and
Var
?
Using (b) find
Cov,
and explain why you expect it to have the sign it does.
Jane and Jack each toss a fair coin twice. Let
be the number of heads Jane obtains and
the number of heads Jack obtains. Define
and
.
Find the means and variances of
and
.
Find Cov
Are
and
independent? Why?
A multiple choice exam has 100 questions, each with 5 possible answers. One
mark is awarded for a correct answer and 1/4 mark is deducted for an incorrect
answer. A particular student has probability
of knowing the correct answer to the
question, independently of other questions.
Suppose that on a question where the student does not know the answer, he or
she guesses randomly. Show that his or her total mark has mean
and variance
.
Show that the total mark for a student who refrains from guessing also has
mean
,
but with variance
.
Compare the variances when all
's
equal (i) .9, (ii) .5.
Let
and
be independent random variables with
,
Var
and Var
.
Find
Cov
.
An automobile driveshaft is assembled by placing parts A, B and C end to end in a straight line. The standard deviation in the lengths of parts A, B and C are 0.6, 0.8, and 0.7 respectively.
Find the standard deviation of the length of the assembled driveshaft.
What percent reduction would there be in the standard deviation of the assembled driveshaft if the standard deviation of the length of part B were cut in half?
The inhabitants of the beautiful and ancient canal city of Pentapolis live on
5 islands separated from each other by water. Bridges cross from one island to
another as shown.
On any day, a bridge can be closed, with probability
,
for restoration work. Assuming that the 8 bridges are closed independently,
find the mean and variance of the number of islands which are completely cut
off because of restoration work.
A Markov chain has a doubly stochastic transition matrix if both the
row sums and the column sums of the transition matrix $P$
are all 1.
Show that for such a Markov chain, the uniform distribution on the states $\{1, 2, \ldots, N\}$
is a stationary distribution.
A salesman sells in three cities A,B, and C. He never sells in the same city on successive weeks. If he sells in city A, then the next week he always sells in B. However if he sells in either A or B, then the next week he is twice as likely to sell in city A as in the other city. What is the long-run proportion of time he spends in each of the three cities?
Find
where
Suppose $X$
and $Y$
are independent having Poisson distributions with parameters $\lambda_1$
and $\lambda_2$
respectively. Use moment generating functions to identify the distribution of
the sum $X + Y$.
Waterloo in January is blessed by many things, but not by good weather. There
are never two nice days in a row. If there is a nice day, we are just as
likely to have snow as rain the next day. If we have snow or rain, there is an
even chance of having the same the next day. If there is change from snow or
rain, only half of the time is this a change to a nice day. Taking as states
the kinds of weather R, N, and S. the transition probabilities
are as
follows
If today is raining, find the probability of Rain, Nice, Snow three days from
now. Find the probabilities of the three states in five days, given (1) today
is raining (ii) today is nice (iii) today is snowing.
(One-card Poker) A card game, which, for the purposes of this question we will call Metzler Poker, is played as follows. Each of 2 players bets an initial $1 and is dealt a card from a deck of 13 cards numbered 1-13. Upon looking at their card, each player then decides (unaware of the other's decision) whether or not to increase their bet by $5 (to a total stake of $6). If both increase the stake ("raise"), then the player with the higher card wins both stakes-i.e. they get their money back as well as the other player's $6. If one person increases and the other does not, then the player who increases automatically wins the pot (i.e. money back+$1). If neither person increases the stake, then it is considered a draw-each player receives their own $1 back. Suppose that Player A and B have similar strategies, based on threshold numbers {a,b} they have chosen between 1 and 13. A chooses to raise whenever their card is greater than or equal to a and B whenever B's card is greater than or equal to b.
Suppose B always raises (so that b=1). What is the expected value of A's win or loss for the different possible values of a=1,2,...,13.
Suppose a and b are arbitrary. Given that both players raise, what is the probability that A wins? What is the expected value of A's win or loss?
Suppose you know that b=11. Find your expected win or loss for various values of a and determine the optimal value. How much do you expect to make or lose per game under this optimal strategy?
(Searching a database) Suppose that we are given
3 records,
initially stored in that order. The cost of accessing the
j'th record in the list is j
so we would like the more frequently accessed records near the front of the
list. Whenever a request for record j is processed,
the "move-to-front" heuristic stores
at the front of the list and the others in the original order. For example if
the first request is for record
then the records will be re-stored in the order
Assume that on each request, record
is requested with probability
for
Show that if
the
permutation that obtains after
requests for records (e.g.
),
then
is a Markov chain.
Find the stationary distribution of this Markov chain. (Hint: what is the
probability that
takes the form
?).
Find the expected long-run cost per record accessed in the case
respectively.
How does this expected long-run cost compare with keeping the records in
random order, and with keeping them in order of decreasing values of
(only
possible if we know