A continuous random variable is one for which the range (set
of possible values) is an interval (or a collection of intervals) on the real
number line. Continuous variables have to be treated a little differently than
discrete ones, the reason being that $P(X = x)$ has to be zero for each $x$,
in order to avoid mathematical contradiction. The distribution of a continuous
random variable is called a continuous probability
distribution. To illustrate, consider the simple spinning pointer in
Figure
spinner.
Spinner: a device for
generating a continuous random variable (in a zero-gravity, virtually
frictionless environment)
where all numbers in the
interval $(0, 4]$ are equally likely. The probability of the pointer stopping
precisely at any given number $x$ must be zero, because otherwise the total probability for
$\{x : 0 < x \le 4\}$
would be infinite, since this set
is non-countable. Thus, for a continuous random variable the probability at
each point is 0. This means we can no longer use the probability function to
describe a distribution. Instead there are two other functions commonly used
to describe continuous distributions.
Cumulative Distribution Function:
For discrete random variables we defined the c.d.f., $F(x) = P(X \le x)$. This function still works for continuous random variables. For the spinner, the probability the pointer stops between 0 and 1 is 1/4 if all values are equally "likely"; between 0 and 2 the probability is 1/2; between 0 and 3 it is 3/4; and so on. In general, $F(x) = \frac{x}{4}$ for $0 < x \le 4$.
Also, $F(x) = 0$ for $x \le 0$ since there is no chance of the pointer stopping at a number $\le 0$, and $F(x) = 1$ for $x > 4$ since the pointer is certain to stop at a number below $x$ if $x > 4$.
Most properties of a c.d.f. are the same for continuous variables as for discrete variables. These are:
1. $F(-\infty) = 0$ and $F(\infty) = 1$
2. $F(x)$ is a non-decreasing function of $x$
3. $P(a < X \le b) = F(b) - F(a)$.
Note that, as indicated before, for a continuous distribution we have $P(X = x) = 0$ for each point $x$. Also, since the probability is 0 at each point,
$$P(a < X \le b) = P(a \le X \le b) = P(a < X < b) = P(a \le X < b).$$
(For a discrete random variable, each of these 4 probabilities could be different.) For the continuous distributions in this chapter, we do not worry about whether intervals are open, closed, or half-open, since the probability of these intervals is the same.
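These c.d.f. properties can be illustrated numerically for the spinner (a small sketch; the function names are ours, and the c.d.f. $F(x) = x/4$ on $(0, 4]$ is taken from the example above):

```python
# Spinner c.d.f.: F(x) = x/4 for 0 < x <= 4 (0 below the interval, 1 above it).
def F(x):
    if x <= 0:
        return 0.0
    if x >= 4:
        return 1.0
    return x / 4

def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a), property 3 of the c.d.f."""
    return F(b) - F(a)

print(prob_interval(0, 1))      # pointer stops between 0 and 1
print(prob_interval(1, 3))      # pointer stops between 1 and 3
print(prob_interval(2.5, 2.5))  # a single point has probability 0
```

Note that `prob_interval(2.5, 2.5)` is 0, matching the fact that each individual point carries no probability.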
Probability Density Function (p.d.f.): While the c.d.f. can be used to find probabilities, it does not give an intuitive picture of which values of $x$ are more likely, and which are less likely. To develop such a picture suppose that we take a short interval of $x$-values, $[x, x + \Delta x]$. The probability $X$ lies in the interval is
$$P(x \le X \le x + \Delta x) = F(x + \Delta x) - F(x).$$
To compare the probabilities for two intervals, each of length $\Delta x$, is easy. Now suppose we consider what happens as $\Delta x$ becomes small, and we divide the probability by $\Delta x$. This leads to the following definition.
The probability density function (p.d.f.) $f(x)$ for a continuous random variable $X$ is the derivative
$$f(x) = \frac{dF(x)}{dx},$$
where $F(x)$ is the c.d.f. for $X$.
We will see that $f(x)$ represents the relative likelihood of different $x$-values. To do this we first note some properties of a p.d.f. It is assumed that $f(x)$ is a continuous function of $x$ at all points (except possibly a finite number).
Properties of a probability density function
1. $P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$. This follows from the definition of $f(x)$.
2. $f(x) \ge 0$. (Since $F(x)$ is non-decreasing, its derivative is non-negative.)
3. $\int_{-\infty}^{\infty} f(x)\,dx = 1$. This is because $\int_{-\infty}^{\infty} f(x)\,dx = F(\infty) - F(-\infty) = 1$.
4. $F(x) = \int_{-\infty}^{x} f(u)\,du$. This is just property 1 with $a = -\infty$ and $b = x$.
To see that $f(x)$ represents the relative likelihood of different outcomes, we note that for
$\Delta x$ small,
$$P\!\left(x - \frac{\Delta x}{2} \le X \le x + \frac{\Delta x}{2}\right) = F\!\left(x + \frac{\Delta x}{2}\right) - F\!\left(x - \frac{\Delta x}{2}\right) \doteq f(x)\,\Delta x.$$
Thus, $f(x) \ne P(X = x)$, but $f(x)\,\Delta x$ is the approximate probability that $X$ is inside an interval of length $\Delta x$ centred about the value $x$.
A plot of the function $f(x)$ shows such values clearly, and for this reason it is very common to plot the p.d.f.'s of continuous random variables.
Example: Consider the spinner example, where $F(x) = \frac{x}{4}$ for $0 < x \le 4$. Thus, the p.d.f. is
$$f(x) = F'(x) = \frac14 \quad\text{for } 0 < x \le 4,$$
and outside this interval the p.d.f. is 0. Figure uniformpdf shows the probability density function $f(x)$; for obvious reasons this is called a "uniform" distribution.
Uniform p.d.f.
Remark: Continuous probability distributions are, like
discrete distributions, mathematical models. Thus, the
distribution assumed for the spinner above is a model, though it seems likely
it would be a good model for many real spinners.
Remark: It may seem paradoxical that $P(X = x) = 0$ for a continuous r.v. and yet we record the outcomes $X = x$ in real "experiments" with continuous variables. The catch is that all measurements have finite precision; they are in effect discrete. For example, a height given by an exact irrational number of inches is within the range of the heights of people in a population, but we could never observe that precise outcome if we selected a person at random and measured their height.
To summarize, in measurements we are actually observing something like
$$P\!\left(x - \frac{\Delta}{2} \le X \le x + \frac{\Delta}{2}\right),$$
where $\Delta$ may be very small, but not zero. The probability of this outcome is not zero: it is (approximately) $f(x)\,\Delta$.
We now consider a more complicated mathematical example of a continuous random
variable. Then we'll consider real problems that involve continuous variables.
Remember that it is always a good idea to sketch or plot the p.d.f.
for a r.v.
Example:
Let $X$ have the p.d.f. given.
Find the constant $k$ and the c.d.f. $F(x)$.
Solution:
Set $\int_{-\infty}^{\infty} f(x)\,dx = 1$ to solve for $k$. When finding the area of a region bounded by different functions we split the integral into pieces.
(We normally wouldn't even write down the parts where $f(x) = 0$.)
Doing the easy pieces, which are often left out, first:
(see shaded area
below)
i.e.
As a rough check, since for a continuous distribution there is no probability at any point, $F(x)$ should have the same value as we approach each boundary point from above and from below.
e.g.
This quick check won't prove your answer is right, but will detect many careless errors.
Defined Variables or Change of Variable:
When we know the
p.d.f. or c.d.f. for a continuous random variable $X$
we sometimes want to find the p.d.f. or c.d.f. for some other random variable $Y$
which is a function of $X$.
The procedure for doing this is summarized below. It is based on the fact that
the c.d.f. $G(y)$ for $Y$ equals $P(Y \le y)$,
and this can be rewritten in terms of $X$ since $Y$ is a function of $X$.
Thus:
Write the c.d.f. of $Y$ as a function of $X$: $G(y) = P(Y \le y)$, rewritten as a probability statement about $X$.
Use $F(x)$ to find $G(y)$. Then if you want the p.d.f. $g(y)$, you can differentiate the expression for $G(y)$.
Find the range of values of $y$.
Example: In the earlier spinner example, $F(x) = \frac{x}{4}$ for $0 < x \le 4$.
Let $Y$ be the given function of $X$.
Find the p.d.f. of $Y$.
Solution:
For step (2), we can do either: substitute for $F(x)$ and then differentiate, or differentiate first and then substitute. (As $x$ goes from 0 to 4, $y$ ranges over the corresponding set of values of the function.) Generally, if $F(x)$ is known in closed form it is easier to substitute first, then differentiate. If $F(x)$ is in the form of an integral that can't be solved, it is usually easier to differentiate first, then substitute for $f(x)$.
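The change-of-variable steps can also be checked by simulation. The sketch below uses a hypothetical transformation $Y = X^2$ of the spinner variable (not necessarily the one in the example above, whose definition is not shown), for which the steps give $G(y) = P(X \le \sqrt{y}) = \sqrt{y}/4$ on $0 < y \le 16$:

```python
import random

random.seed(1)

# Spinner: X ~ U(0, 4], so F_X(x) = x/4.  Hypothetical transformation
# Y = X**2; steps (1)-(3) give G(y) = sqrt(y)/4 for 0 < y <= 16.
def G(y):
    return min(max(y, 0.0), 16.0) ** 0.5 / 4

# Monte Carlo check: the empirical P(Y <= y) should be close to G(y).
n = 200_000
ys = [random.uniform(0, 4) ** 2 for _ in range(n)]
for y in (1.0, 4.0, 9.0):
    emp = sum(v <= y for v in ys) / n
    print(y, emp, G(y))
```

The empirical and analytic values should agree to about two decimal places at this sample size.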
Extension of Expectation, Mean, and Variance to Continuous Distributions
When $X$ is continuous, we still define
$$E(g(X)) = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx.$$
With this definition, all of the earlier properties of expectation and variance still hold; for example, with $g(x) = x$,
$$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx \quad\text{and}\quad \sigma^2 = \operatorname{Var}(X) = E\!\left[(X - \mu)^2\right] = E(X^2) - \mu^2.$$
(This definition can be justified by writing $E(g(X))$ as the limit of a Riemann sum and recognizing the Riemann sum as being in the form of an expectation for discrete random variables.)
Example: In the spinner example with $f(x) = \frac14$ for $0 < x \le 4$,
$$\mu = \int_0^4 \frac{x}{4}\,dx = 2, \qquad E(X^2) = \int_0^4 \frac{x^2}{4}\,dx = \frac{16}{3}, \qquad \sigma^2 = \frac{16}{3} - 2^2 = \frac{4}{3}.$$
Example: Let
have p.d.f.
Then
Problems:
Let $X$ have the p.d.f. given. Find
the c.d.f., $F(x)$;
the mean and variance of $X$.
For the given function $Y$ of $X$, derive the p.d.f. of $Y$.
A continuous distribution has c.d.f. $F(x)$ of the given form for $x$ in the stated range, where $k$ is a positive constant.
Evaluate $k$.
Find the p.d.f., $f(x)$.
What is the median of this distribution? (The median is the value $m$ such that half the time we get a value below it and half the time above it; i.e. $F(m) = \frac12$.)
Just as we did for discrete r.v.'s, we now consider some special types of continuous probability distributions. These distributions arise in certain settings, described below. This section considers what we call uniform distributions.
Physical Setup:
Suppose $X$ takes values in some interval $[a, b]$ (it doesn't actually matter whether the
interval is open or closed) with all subintervals of a fixed length being
equally likely. Then $X$ has a continuous uniform distribution. We write $X \sim U[a, b]$.
Illustrations:
In the spinner example, $X \sim U(0, 4]$.
Computers can generate a random number $U$ which appears as though it is drawn from the distribution $U[0, 1]$. This is the starting point for many computer simulations of random processes; an example is given below.
The probability density function and the cumulative distribution function:
Since all points are equally likely (more precisely, intervals contained in $[a, b]$
of a given length, say 0.01, all have the same probability), the probability
density function must be a constant, $f(x) = k$ for $a \le x \le b$, for some constant $k$.
To make $\int_a^b f(x)\,dx = 1$,
we require $f(x) = \frac{1}{b - a}$ for $a \le x \le b$. The c.d.f. is then $F(x) = \frac{x - a}{b - a}$ for $a \le x \le b$ (with $F(x) = 0$ for $x < a$ and $F(x) = 1$ for $x > b$).
Mean and Variance:
$$\mu = E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2}, \qquad E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \frac{a^2 + ab + b^2}{3},$$
so
$$\sigma^2 = \frac{a^2 + ab + b^2}{3} - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12}.$$
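The mean and variance formulas can be checked by numerical integration (a sketch; `uniform_moments` is our own name):

```python
# Numerical check of the U[a, b] mean and variance formulas by
# integrating x*f(x) and x^2*f(x) with a simple midpoint rule.
def uniform_moments(a, b, steps=100_000):
    h = (b - a) / steps
    f = 1.0 / (b - a)                    # constant p.d.f. on [a, b]
    mean = sum((a + (i + 0.5) * h) * f * h for i in range(steps))
    ex2 = sum((a + (i + 0.5) * h) ** 2 * f * h for i in range(steps))
    return mean, ex2 - mean ** 2

m, v = uniform_moments(0, 4)             # the spinner: U(0, 4]
print(m, v)   # compare with (a+b)/2 = 2 and (b-a)^2/12 = 4/3
```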
Example: Suppose $X$ has the continuous p.d.f. $f(x) = \lambda e^{-\lambda x}$ for $x > 0$.
(This is called an exponential distribution and is discussed in the next
section. It is used in areas such as queueing theory and reliability.) We'll
show that the new random variable $Y = e^{-\lambda X}$
has a uniform distribution, $U(0, 1)$.
To see this, we follow the steps in Section
9.1:
since $F_X(x) = \int_0^x \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda x}$ for $x > 0$,
we get, for $0 < y < 1$,
$$G(y) = P(Y \le y) = P\!\left(e^{-\lambda X} \le y\right) = P\!\left(X \ge -\tfrac{1}{\lambda}\ln y\right) = 1 - F_X\!\left(-\tfrac{1}{\lambda}\ln y\right) = e^{\ln y} = y.$$
(The range of $Y$ is $(0, 1)$ since $X > 0$.)
Thus $g(y) = G'(y) = 1$ for $0 < y < 1$, and so $Y \sim U(0, 1)$.
Many computer software systems have "random number generator" functions that will simulate observations $U$ from a $U(0, 1)$ distribution. (These are more properly called pseudo-random number generators because they are based on deterministic algorithms. In addition they give observations that have finite precision so they cannot be exactly like continuous random variables. However, good generators give $U$'s that appear indistinguishable in most ways from $U(0, 1)$ r.v.'s.) Given such a generator, we can also simulate r.v.'s $X$ with the exponential distribution above by the following algorithm:
Generate $U$ using the computer random number generator.
Compute $X = -\frac{1}{\lambda}\ln U$.
Then $X$
has the desired distribution. This is a particular case of a method described
in Section 9.4 for generating random variables from a general distribution. In
R software the command rexp$(n, \lambda)$ produces a vector consisting of $n$
independent exponential values.
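A sketch of this two-step algorithm in Python (the function name is ours; the mean is $\theta = 1/\lambda$):

```python
import math
import random

random.seed(123)

def exp_rng(theta):
    """One exponential observation with mean theta via the algorithm above:
    generate U ~ U(0,1), then return X = -theta * ln(U)."""
    u = 1.0 - random.random()      # in (0, 1], avoids log(0)
    return -theta * math.log(u)

theta = 12.0                       # e.g. a mean waiting time of 12 minutes
xs = [exp_rng(theta) for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)                   # near theta and theta**2
```

The sample mean and variance should be close to $\theta$ and $\theta^2$, and the fraction of values above any $x$ close to $e^{-x/\theta}$.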
Problem:
If $X$ has c.d.f. $F(x)$, then $Y = F(X)$ has a uniform distribution on $[0, 1]$. (Show this.) Suppose you want to simulate observations from the distribution whose c.d.f. is given, by using the random number generator on a computer to generate $U[0, 1]$ numbers. What value would $X$ take when you generated the random number .27125?
The continuous random variable $X$
is said to have an exponential distribution if its p.d.f. is
of the form
$$f(x) = \lambda e^{-\lambda x}, \quad x > 0,$$
where $\lambda > 0$
is a real parameter value. This distribution arises in various problems
involving the time until some event occurs. The following gives one such
setting.
Physical Setup: In a Poisson process for events in time, let $X$ be the length of time we wait for the first event occurrence. We'll show that $X$ has an exponential distribution. (Recall that the number of occurrences in a fixed time has a Poisson distribution. The difference between the Poisson and exponential distributions lies in what is being measured.)
Illustrations:
The length of time we wait with a Geiger counter until the emission of a radioactive particle is recorded follows an exponential distribution.
The length of time between phone calls to a fire station (assuming calls
follow a Poisson process) follows an exponential distribution.
Derivation of the probability density function and the c.d.f.
$$F(x) = P(X \le x) = P(\text{time to 1st occurrence} \le x) = 1 - P(\text{time to 1st occurrence} > x) = 1 - P(\text{no occurrences in the interval } (0, x)).$$
Check that you understand this last step. If the time to the first occurrence is greater than $x$, there must be no occurrences in $(0, x)$, and vice versa.
We have now expressed $F(x)$ in terms of the number of occurrences in a Poisson process by time $x$. But the number of occurrences has a Poisson distribution with mean $\mu = \lambda x$, where $\lambda$ is the average rate of occurrence. Since $P(\text{no occurrences}) = e^{-\lambda x}$, we get $F(x) = 1 - e^{-\lambda x}$ for $x > 0$. Thus
$$f(x) = F'(x) = \lambda e^{-\lambda x}, \quad x > 0,$$
which is the formula we gave above.
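The derivation can be checked by simulating the Poisson-process conditions directly: chop $(0, x]$ into tiny subintervals, each independently containing an occurrence with probability $\lambda\,dx$, and see how often the first occurrence happens by time $x$. (A sketch; the rate $\lambda = 2$ is an arbitrary choice.)

```python
import math
import random

random.seed(7)

lam = 2.0          # hypothetical rate: 2 occurrences per unit time

def first_event_before(x, dx=1e-3):
    """Walk through (0, x] in steps of dx; each tiny subinterval holds an
    occurrence with probability lam*dx, independently of the others."""
    t = 0.0
    while t < x:
        if random.random() < lam * dx:
            return True            # an occurrence happened in (0, x]
        t += dx
    return False

x = 0.5
n = 10_000
emp = sum(first_event_before(x) for _ in range(n)) / n
print(emp, 1 - math.exp(-lam * x))   # F(x) = 1 - e^{-lam*x}
```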
Alternate Form: It is common to use the parameter $\theta = \frac{1}{\lambda}$ in the exponential distribution. (We'll see below that $\theta = E(X)$.) This makes
$$f(x) = \frac{1}{\theta}e^{-x/\theta} \quad\text{and}\quad F(x) = 1 - e^{-x/\theta} \quad\text{for } x > 0.$$
Exercise:
Suppose trees in a forest are distributed according to a Poisson process. Let $X$ be the distance from an arbitrary starting point to the nearest tree. The average number of trees per square metre is $\lambda$. Derive $f(x)$ the same way we derived the exponential p.d.f. You're now using the Poisson distribution in 2 dimensions (area) rather than 1 dimension (time).
Mean and Variance:
Finding $\mu$ and $\sigma^2$ directly involves integration by parts. An easier solution uses properties of the gamma function, which extends the notion of factorials beyond the integers to the positive real numbers.
The Gamma Function: $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx$ is called the gamma function of $\alpha$, where $\alpha > 0$.
Note that $\alpha$ is 1 more than the power of $x$ in the integrand; e.g. $\int_0^{\infty} x^4 e^{-x}\,dx = \Gamma(5)$. There are 3 properties of gamma functions which we'll use.
1. $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$ for $\alpha > 1$.
Proof:
Using integration by parts,
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx = \left[-x^{\alpha - 1}e^{-x}\right]_0^{\infty} + (\alpha - 1)\int_0^{\infty} x^{\alpha - 2} e^{-x}\,dx,$$
and provided that $\alpha > 1$ the first term is 0.
Therefore $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$.
2. $\Gamma(n) = (n - 1)!$ if $n$
is a positive integer.
Proof: It is easy to show that $\Gamma(1) = 1$.
Using property 1 repeatedly, we obtain $\Gamma(2) = 1\cdot\Gamma(1) = 1!$, $\Gamma(3) = 2\,\Gamma(2) = 2!$, $\Gamma(4) = 3\,\Gamma(3) = 3!$,
etc.
Generally, $\Gamma(n) = (n - 1)!$ for integer $n \ge 1$.
3. $\Gamma\!\left(\tfrac12\right) = \sqrt{\pi}$.
(This can be proved using double integration.)
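All three properties can be verified numerically with `math.gamma`, which implements $\Gamma(\alpha)$:

```python
import math

# Property 1: Gamma(a) = (a-1) * Gamma(a-1), for a non-integer a too.
a = 4.7
print(math.gamma(a), (a - 1) * math.gamma(a - 1))

# Property 2: Gamma(n) = (n-1)! for positive integers.
print(math.gamma(5), math.factorial(4))

# Property 3: Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))
```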
Returning to the exponential distribution: let $y = \frac{x}{\theta}$. Then
$$\mu = E(X) = \int_0^{\infty} x\,\frac{1}{\theta}e^{-x/\theta}\,dx = \theta\int_0^{\infty} y\,e^{-y}\,dy = \theta\,\Gamma(2) = \theta.$$
Note: Read questions carefully. If you're given the average rate of occurrence in a Poisson process, that is $\lambda$. If you're given the average time you wait for an occurrence, that is $\theta$.
To get $\sigma^2$, we first find
$$E(X^2) = \int_0^{\infty} x^2\,\frac{1}{\theta}e^{-x/\theta}\,dx = \theta^2\int_0^{\infty} y^2 e^{-y}\,dy = \theta^2\,\Gamma(3) = 2\theta^2,$$
so $\sigma^2 = E(X^2) - \mu^2 = 2\theta^2 - \theta^2 = \theta^2$.
Example:
Suppose #7 buses arrive at a bus stop according to a Poisson process with an
average of 5 buses per hour (i.e. $\lambda = 5$/hr,
so $\theta = \frac15$ hr, or 12 min). Find the probability (a) you have to wait longer than 15
minutes for a bus, (b) you have to wait more than 15 minutes longer, having
already been waiting for 6 minutes.
Solution:
$$P(X > 15 \text{ min}) = P\!\left(X > \tfrac14\text{ hr}\right) = 1 - F\!\left(\tfrac14\right) = e^{-5/4} = .2865.$$
If $T$
is the total waiting time, the question asks for the probability
$$P(T > 21 \mid T > 6) = \frac{P(T > 21)}{P(T > 6)} = \frac{e^{-21/12}}{e^{-6/12}} = e^{-15/12} = e^{-5/4} = .2865.$$
Does this surprise you? The fact that you've already waited 6 minutes doesn't
seem to matter. This illustrates the "memoryless property" of the
exponential distribution:
$$P(X > a + b \mid X > b) = P(X > a).$$
Fortunately, buses don't follow a Poisson process so this example needn't
cause you to stop using the bus.
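The memoryless property is easy to see in a simulation of the bus example (a sketch, using $\theta = 12$ minutes as above):

```python
import math
import random

random.seed(42)

theta = 12.0      # mean waiting time, in minutes

# Simulate exponential waiting times and compare
#   P(T > 21 | T > 6)   with   P(T > 15).
n = 200_000
waits = [-theta * math.log(1.0 - random.random()) for _ in range(n)]

over6 = [w for w in waits if w > 6]
cond = sum(w > 21 for w in over6) / len(over6)
uncond = sum(w > 15 for w in waits) / n
print(cond, uncond, math.exp(-15 / theta))   # all near .2865
```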
Problems:
In a bank with on-line terminals, the time the system runs between disruptions has an exponential distribution with mean $\theta$ hours. One quarter of the time the system shuts down within 8 hours of the previous disruption. Find $\theta$.
Flaws in painted sheets of metal occur over the surface according to the conditions for a Poisson process, at an intensity of $\lambda$ flaws per m$^2$. Let $X$ be the distance from an arbitrary starting point to the second closest flaw. (Assume sheets are of infinite size!)
Find the p.d.f., $f(x)$.
What is the average distance to the second closest flaw?
Most computer software has a built-in "pseudo-random number generator" that
will simulate observations $U$
from a $U(0, 1)$
distribution, or at least a reasonable approximation to this uniform
distribution. If we wish a random variable with a non-uniform distribution,
the standard approach is to take a suitable function of $U$.
By far the simplest and most common method for generating non-uniform variates
is based on the inverse cumulative distribution function. For an arbitrary c.d.f. $F(x)$,
define
$$F^{-1}(u) = \min\{x : F(x) \ge u\}.$$
This is a real inverse (i.e. $F(F^{-1}(u)) = u$)
in the case that the c.d.f. is continuous and strictly increasing, as for
example for a continuous distribution with positive density. However, in the more general case of a
possibly discontinuous non-decreasing c.d.f. (such as the c.d.f. of a discrete
distribution) the function continues to enjoy at least some of the properties
of an inverse.
$F^{-1}$ is useful for generating a random variable having c.d.f. $F(x)$ from $U$,
a uniform random variable on the interval $[0, 1]$:
if $F$ is an arbitrary c.d.f. and $U$ is uniform on $[0, 1]$, then the random variable defined by $X = F^{-1}(U)$ has c.d.f. $F(x)$.
Proof:
The proof is a consequence of the fact that $F^{-1}(u) \le x$ if and only if $u \le F(x)$. You can check this graphically by checking, for example, that if $u \le F(x)$ then $F^{-1}(u) \le x$ (this confirms the left-hand implication). Taking probabilities, and using the fact that $P(U \le F(x)) = F(x)$, we discover that
$$P(X \le x) = P\!\left(F^{-1}(U) \le x\right) = P(U \le F(x)) = F(x).$$
Inverting a c.d.f.
The relation $X = F^{-1}(U)$
implies that $F(X) \ge U$,
and for any point $z < X$, $F(z) < U$.
For example, for the rather unusual looking piecewise linear cumulative
distribution function in Figure inversetransform, we
find the solution $X = F^{-1}(U)$
by drawing a horizontal line at height $U$
until it strikes the graph of the c.d.f. (or where the graph would have been
if we had joined the ends at the jumps), and then $X$
is the $x$-coordinate
of this point. This is true in general: $X$
is the $x$-coordinate of the point where a horizontal line at height $U$ first strikes the graph
of the c.d.f. We provide one simple example of generating random variables by
this method, for the geometric distribution.
For the geometric distribution, the cumulative distribution function is given by
$$F(x) = 1 - (1 - p)^{x + 1}, \quad x = 0, 1, 2, \ldots.$$
Then if $U$ is a uniform random number in the interval $[0, 1]$, we seek an integer $X$ such that $F(X - 1) < U \le F(X)$ (you should confirm that this is the value of $X$ at which the above horizontal line strikes the graph of the c.d.f.), and solving these inequalities gives
$$X < \frac{\ln(1 - U)}{\ln(1 - p)} \le X + 1,$$
so we compute the value of $\frac{\ln(1 - U)}{\ln(1 - p)}$ and round down to the next lower integer.
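A sketch of this geometric generator (the function name is ours; here $X$ counts failures before the first success, so $E(X) = (1 - p)/p$ and $P(X = 0) = p$):

```python
import math
import random

random.seed(0)

def geom_rng(p):
    """Geometric observation by inverting F(x) = 1 - (1-p)**(x+1):
    compute ln(1-U)/ln(1-p) and round down to the next lower integer."""
    u = random.random()                 # U in [0, 1)
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p))

p = 0.3
xs = [geom_rng(p) for _ in range(100_000)]
print(sum(xs) / len(xs), (1 - p) / p)        # mean near (1-p)/p
print(sum(x == 0 for x in xs) / len(xs), p)  # P(X = 0) near p
```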
Show that the inverse transform method above results in the generator $X = -\theta \ln(1 - U)$ for the exponential distribution.
Physical Setup:
A random variable $X$ defined on $(-\infty, \infty)$ has a normal distribution if it has probability density function of the form
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$
where $-\infty < \mu < \infty$ and $\sigma > 0$ are parameters. It turns out (and is shown below) that $E(X) = \mu$ and $\operatorname{Var}(X) = \sigma^2$ for this distribution; that is why its p.d.f. is written using the symbols $\mu$ and $\sigma$. We write $X \sim N(\mu, \sigma^2)$ to denote that $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$).
The normal distribution is the most widely used distribution in probability and statistics. Physical processes leading to the normal distribution exist but are a little complicated to describe. (For example, it arises in physics via statistical mechanics and maximum entropy arguments.) It is used for many processes where $X$ represents a physical dimension of some kind, but also in many other settings. We'll see other applications of it below. The shape of the p.d.f. above is what is often termed a "bell shape" or "bell curve", symmetric about $\mu$, as shown in Figure normpdf. (You should be able to verify the shape without graphing the function.)
The standard normal
probability density function
Illustrations:
Heights or weights of males (or of females) in large populations tend to follow normal distributions.
The logarithms of stock prices are often assumed to be normally distributed.
The cumulative distribution function: The c.d.f. of the normal distribution is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,du,$$
as shown in Figure normcdf. This integral cannot be given a simple mathematical expression, so numerical methods are used to compute its value for given values of $x$, $\mu$ and $\sigma$. This function is included in many software packages and some calculators.
The standard normal c.d.f.
In the statistical packages R and S-Plus we get $F(x)$ above using the function pnorm$(x, \mu, \sigma)$.
Before computers, people produced tables of probabilities using mechanical calculators. Fortunately it is necessary to do this only for a single normal distribution: the one with $\mu = 0$ and $\sigma = 1$. This is called the "standard" normal distribution and denoted $N(0, 1)$.
It's easy to see that if $X \sim N(\mu, \sigma^2)$
then the "new" r.v. $Z = \frac{X - \mu}{\sigma}$
is distributed as $Z \sim N(0, 1)$.
(Just use the change of variables methods in Section 9.1.) We'll use this to
compute $F(x)$
and probabilities for $X$
below, but first we show that $f(x)$
integrates to 1 and that $E(X) = \mu$
and $\operatorname{Var}(X) = \sigma^2$.
For the first result, note
that with $z = \frac{x - \mu}{\sigma}$,
$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,dz = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} e^{-z^2/2}\,dz,$$
and the substitution $y = \frac{z^2}{2}$ (so $dz = (2y)^{-1/2}\,dy$) turns this into
$$\frac{2}{\sqrt{2\pi}}\int_0^{\infty} \frac{e^{-y}}{\sqrt{2y}}\,dy = \frac{1}{\sqrt{\pi}}\int_0^{\infty} y^{-1/2}e^{-y}\,dy = \frac{\Gamma\!\left(\tfrac12\right)}{\sqrt{\pi}} = 1.$$
Mean, Variance, Moment generating function: Recall that an odd function, $f(x)$, has the property that $f(-x) = -f(x)$. If $f(x)$ is an odd function then $\int_{-\infty}^{\infty} f(x)\,dx = 0$, provided the integral exists.
Consider
$$E(X - \mu) = \int_{-\infty}^{\infty} (x - \mu)\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx.$$
Let $y = x - \mu$.
Then
$$E(X - \mu) = \int_{-\infty}^{\infty} \frac{y}{\sigma\sqrt{2\pi}}\,e^{-\frac{y^2}{2\sigma^2}}\,dy,$$
where the integrand
is an odd function so that $E(X - \mu) = 0$.
But since $E(X - \mu) = E(X) - \mu$,
this implies $E(X) = \mu$,
and so $\mu$
is the mean. To obtain the variance, note that by symmetry about $\mu$,
$$\operatorname{Var}(X) = E\!\left[(X - \mu)^2\right] = \int_{-\infty}^{\infty} (x - \mu)^2\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx = \frac{2}{\sigma\sqrt{2\pi}}\int_{\mu}^{\infty} (x - \mu)^2\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx.$$
We can obtain a gamma function by letting $y = \frac{(x - \mu)^2}{2\sigma^2}$
(so $x - \mu = \sigma\sqrt{2y}$ and $dx = \sigma(2y)^{-1/2}\,dy$).
Then
$$\operatorname{Var}(X) = \frac{2}{\sigma\sqrt{2\pi}}\int_0^{\infty} 2\sigma^2 y\,e^{-y}\,\sigma(2y)^{-1/2}\,dy = \frac{2\sigma^2}{\sqrt{\pi}}\int_0^{\infty} y^{1/2}e^{-y}\,dy = \frac{2\sigma^2}{\sqrt{\pi}}\,\Gamma\!\left(\tfrac32\right) = \frac{2\sigma^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2} = \sigma^2,$$
and so $\sigma^2$
is the variance. We now find the moment generating function of the $N(\mu, \sigma^2)$
distribution. If $X$
has the $N(\mu, \sigma^2)$
distribution, then
$$M(t) = E\!\left(e^{tX}\right) = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{tx - \frac{(x - \mu)^2}{2\sigma^2}}\,dx = e^{\mu t + \frac{\sigma^2 t^2}{2}}\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu - \sigma^2 t)^2}{2\sigma^2}}\,dx = e^{\mu t + \frac{\sigma^2 t^2}{2}},$$
where the last step follows since the remaining integrand
is just the $N(\mu + \sigma^2 t,\ \sigma^2)$
probability density function and its integral is therefore equal to one. This confirms the
values we already obtained for the mean and the variance of the normal
distribution:
$$M'(t) = (\mu + \sigma^2 t)\,M(t), \qquad M''(t) = \left[(\mu + \sigma^2 t)^2 + \sigma^2\right] M(t),$$
from which we obtain $E(X) = M'(0) = \mu$ and $E(X^2) = M''(0) = \mu^2 + \sigma^2$, so that $\operatorname{Var}(X) = \sigma^2$.
Finding Normal Probabilities Via $N(0, 1)$
Tables: As noted above, $F(x)$
does not have an explicit closed form so numerical computation is needed. The
following result shows that if we can compute the c.d.f. for the standard
normal distribution $N(0, 1)$,
then we can compute it for any other normal distribution $N(\mu, \sigma^2)$
as well.
Let $X \sim N(\mu, \sigma^2)$ and define $Z = \frac{X - \mu}{\sigma}$. Then $Z \sim N(0, 1)$ and
$$P(X \le x) = F_Z\!\left(\frac{x - \mu}{\sigma}\right).$$
Proof: The fact that $Z$
has p.d.f. $\frac{1}{\sqrt{2\pi}}e^{-z^2/2}$
follows immediately by change of variables. Alternatively, we can just note
that
$$P(Z \le z) = P\!\left(\frac{X - \mu}{\sigma} \le z\right) = P(X \le \mu + \sigma z) = F_X(\mu + \sigma z),$$
and differentiate with respect to $z$.
A table of probabilities $F_Z(z) = P(Z \le z)$ is given on the last page of these notes. A space-saving feature is that only the values for $z \ge 0$ are shown; for negative values we use the fact that the p.d.f. is symmetric about 0.
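When software rather than tables is available, the standard normal c.d.f. can be computed from the error function; this is a sketch of the kind of computation functions such as pnorm perform:

```python
import math

def Phi(z):
    """Standard normal c.d.f. F(z) = P(Z <= z), via the identity
    Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(Phi(2.11))                 # the table value is .9826
print(Phi(-2.11))                # symmetry: equals 1 - Phi(2.11)
print(Phi(1.96) - Phi(-1.96))    # about .95
```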
The following examples illustrate how to get probabilities for
using the tables.
Examples: Find the following probabilities, where $Z \sim N(0, 1)$: for instance, $P(Z \le 2.11)$, and a probability such as $P(Z \le -2.11)$ involving a negative value.
Solution:
Look up 2.11 in the table by going down the left column to 2.1, then across to
the heading .01. We find the number .9826. Then
$P(Z \le 2.11) = F(2.11) = .9826$.
See Figure exercisenormal1.
For the negative value we have to use symmetry:
$$P(Z \le -2.11) = P(Z \ge 2.11) = 1 - P(Z \le 2.11) = 1 - .9826 = .0174.$$
See Figure exercisenormal2.
In addition to using the tables to find the probabilities for given numbers,
we sometimes are given the probabilities and asked to find the number. With
R or S-Plus
software, the function qnorm$(p, \mu, \sigma)$
gives the 100$p$-th
percentile (where $0 < p < 1$).
We can also use tables to find desired values.
We can also use tables to find desired values.
Examples:
Find a number $c$ such that $P(Z \le c) = .85$.
Find a number $d$ such that $P(Z > d)$ equals a given probability greater than one half.
Find a number $b$ such that $P(-b < Z < b) = .95$.
Solutions:
We can look in the body of the table to get an entry close to .8500. This occurs for $z$ between 1.03 and 1.04; $z = 1.04$ gives the closest value to .8500. For greater accuracy, the table at the bottom of the last page is designed for finding numbers, given the probability; looking beside the entry .85 we find $c = 1.0364$.
Since $P(Z > d) > .5$, we have $F(d) = P(Z \le d) < .5$. There is no entry in the table for which $F(z) < .5$, so we again have to use symmetry, since $d$ will be negative: find the $z$ for which $F(z)$ equals the given probability, and then $d = -z$.
The key to this solution lies in recognizing that $d$
will be negative. If you can picture the situation it will probably be easier
to handle the question than if you rely on algebraic
manipulations.
Exercise: Will the number $c$ with $P(Z \le c) = p$ be positive or negative if $p > .5$? What if $p < .5$?
For $P(-b < Z < b) = .95$
we again use
symmetry.
The probability outside the interval $(-b, b)$ must be .05, and this is evenly split between the area above $b$ and the area below $-b$, so $P(Z \le b) = .975$.
Looking in the table, $b = 1.96$.
To find $N(\mu, \sigma^2)$
probabilities in general, we use the theorem given earlier, which implies that
if $X \sim N(\mu, \sigma^2)$
then
$$P(a \le X \le b) = P\!\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right),$$
where $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
Example: Let .
Find
Find a number
such that
.
Solution:
Gaussian Distribution: The normal distribution is also known
as the Gaussian
distribution. The notation $X \sim G(\mu, \sigma)$
means that $X$
has Gaussian (normal) distribution with mean $\mu$
and standard deviation $\sigma$.
So, for example, if $X \sim N(\mu, \sigma^2)$
then we could also write $X \sim G(\mu, \sigma)$.
Example: The heights of adult males in Canada are close to
normally distributed, with a mean of 69.0 inches and a standard deviation of
2.4 inches. Find the 10th and 90th percentiles of the height distribution.
(Recall that the $a$-th percentile is such that $a$% of the population has height
less than this value.)
Solution: We are being told that if $X$
is the height of a randomly selected Canadian adult male, then $X \sim N(69.0, 2.4^2)$,
or equivalently $X \sim G(69.0, 2.4)$.
To find the 90th percentile $c$,
we
use
$$P(X \le c) = P\!\left(Z \le \frac{c - 69.0}{2.4}\right) = .90.$$
From the table we see $P(Z \le 1.2816) = .90$,
so we need $\frac{c - 69.0}{2.4} = 1.2816$,
which gives $c = 72.08$
inches. Similarly, to find $c$
such that $P(X \le c) = .10$,
we find that $P(Z \le -1.2816) = .10$,
so we need $\frac{c - 69.0}{2.4} = -1.2816$,
or $c = 65.92$
inches, as the 10th percentile.
Linear Combinations of Independent Normal Random Variables
Linear combinations of normal r.v.'s are important in many applications. Since we have not covered continuous multivariate distributions, we can only quote the second and third of the following results without proof. The first result follows easily from the change of variables method.
Let $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, where $a$ and $b$ are constant real numbers. Then $Y \sim N(a\mu + b,\ a^2\sigma^2)$.
Let $X_1 \sim N(\mu_1, \sigma_1^2)$
and $X_2 \sim N(\mu_2, \sigma_2^2)$
be independent, and let $a$
and $b$
be constants.
Then $aX_1 + bX_2 \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$.
In
general, if $X_i \sim N(\mu_i, \sigma_i^2)$
are independent and $a_i$
are constants,
then $\sum a_i X_i \sim N\!\left(\sum a_i\mu_i,\ \sum a_i^2\sigma_i^2\right)$.
Let $X_1, \ldots, X_n$
be independent $N(\mu, \sigma^2)$
random variables.
Then $\sum_{i=1}^{n} X_i \sim N(n\mu,\ n\sigma^2)$
and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\!\left(\mu,\ \frac{\sigma^2}{n}\right)$.
Actually, the only new result here is that the distributions are normal. The
means and variances of linear combinations of r.v.'s were previously obtained
in section 8.3.
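Result 2 can be checked by simulation; the parameter values below are arbitrary illustrations, not from the text:

```python
import random

random.seed(9)

# X1 ~ N(1, 2^2) and X2 ~ N(3, 1^2) independent; Y = 2*X1 - X2.
# Result 2 gives Y ~ N(2*1 - 3, 4*4 + 1*1) = N(-1, 17).
n = 200_000
ys = [2 * random.gauss(1, 2) - random.gauss(3, 1) for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
print(mean, var)   # near -1 and 17
```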
Example: Let and be independent. Find .
Solution: Whenever we have variables on both sides of the
inequality we should collect them on one side, leaving us with a linear
combination; e.g. $P(X > Y) = P(X - Y > 0)$, where $X - Y$ has a normal distribution by the results above.
Example: Three cylindrical parts are joined end to end to
make up a shaft in a machine; 2 type A parts and 1 type B. The lengths of the
parts vary a little, and have the distributions:
and
.
The overall length of the assembled shaft must lie between 46.8 and 47.5 or
else the shaft has to be scrapped. Assume the lengths of different parts are
independent. What percent of assembled shafts have to be scrapped?
Exercise: Why would it be wrong to represent the length of
the shaft as 2A + B? How would this length differ from the solution given
below?
Solution: Let the length of the shaft be $L = A_1 + A_2 + B$, where $A_1$ and $A_2$ are the lengths of the two type A parts and $B$ that of the type B part.
Then $L$ has a normal distribution with $E(L) = 2\mu_A + \mu_B$ and $\operatorname{Var}(L) = 2\sigma_A^2 + \sigma_B^2$, and so
i.e. 23.18% are acceptable and 76.82% must be scrapped. Obviously we have to
find a way to reduce the variability in the lengths of the parts. This is a
common problem in manufacturing.
Exercise: How could we reduce the percent of shafts being
scrapped? (What if we reduced the variance of the $A$
and $B$
parts each by 50%?)
Example:
The heights of adult females in a large population are well represented by a
normal distribution with mean 64 in. and variance 6.2
in.$^2$
Find the proportion of females whose height is between 63 and 65 inches.
Suppose 10 women are randomly selected, and let $\bar{X}$ be their average height (i.e. $\bar{X} = \frac{1}{10}\sum_{i=1}^{10} X_i$, where $X_1, \ldots, X_{10}$ are the heights of the 10 women). Find $P(63 \le \bar{X} \le 65)$.
How large must $n$
be so that a random sample of $n$
women gives an average height $\bar{X}$
so that
$P(|\bar{X} - \mu| \le 1) \ge .95$?
Solution:
$X \sim N(64, 6.2)$, so for the height $X$ of a randomly selected woman,
$$P(63 \le X \le 65) = P\!\left(\frac{63 - 64}{\sqrt{6.2}} \le Z \le \frac{65 - 64}{\sqrt{6.2}}\right) = P(-.40 \le Z \le .40) = 2P(Z \le .40) - 1 = .3108.$$
Since $\bar{X} \sim N\!\left(64, \frac{6.2}{10}\right)$,
$$P(63 \le \bar{X} \le 65) = P\!\left(\frac{63 - 64}{\sqrt{.62}} \le Z \le \frac{65 - 64}{\sqrt{.62}}\right) = P(-1.27 \le Z \le 1.27) = 2P(Z \le 1.27) - 1 = .7960.$$
If $\bar{X} \sim N\!\left(64, \frac{6.2}{n}\right)$ then
$P(|\bar{X} - 64| \le 1) \ge .95$
iff
$P\!\left(|Z| \le \sqrt{\frac{n}{6.2}}\right) \ge .95$.
(This is because
$\frac{\bar{X} - 64}{\sqrt{6.2/n}} \sim N(0, 1)$.)
So the condition holds
iff
$\sqrt{\frac{n}{6.2}} \ge 1.96$,
which is true if $n \ge (1.96)^2(6.2) = 23.8$,
or $n \ge 23.8$.
Thus we require $n \ge 24$
since $n$
is an integer.
Remark: This shows that if we were to select a random sample of persons of the size just found, then their average height $\bar{X}$ would be within 1 inch of the average height $\mu$ of the whole population of women, with probability .95. So if we did not know $\mu$ then we could estimate it to within $\pm 1$ inch (with probability .95) by taking this small a sample.
Exercise: Find how large $n$ would have to be to achieve probability at least .95 with a smaller allowed distance between $\bar{X}$ and $\mu$.
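The sample-size calculation above generalizes directly; a sketch (the function name is ours):

```python
import math

# Smallest n with P(|Xbar - mu| <= d) >= .95 when Xbar ~ N(mu, variance/n):
# need sqrt(n)*d/sigma >= 1.96, i.e. n >= (1.96)^2 * variance / d^2.
def sample_size(variance, d, z=1.96):
    return math.ceil(z ** 2 * variance / d ** 2)

print(sample_size(6.2, 1.0))   # the height example
print(sample_size(6.2, 0.5))   # halving the allowed distance
```

Halving the allowed distance quadruples (roughly) the required sample size, since $n$ grows like $1/d^2$.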
These ideas form the
basis of statistical sampling and estimation of unknown parameter values in
populations and processes. If $X \sim N(\mu, \sigma^2)$
and we know roughly what $\sigma$
is, but don't know $\mu$,
then we can use the fact that $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
to find the probability that the mean $\bar{X}$
from a sample of size $n$
will be within a given distance of $\mu$.
Problems:
Let and be independent. Find the probability
where $\bar{X}$ is the sample mean of 25 independent observations on $X$.
Let $X$ have a normal distribution. What percent of the time does $X$ lie within one standard deviation of the mean? Two standard deviations? Three standard deviations?
Let $X$ be normally distributed as given. An independent variable $Y$ is also normally distributed with mean 7 and standard deviation 3. Find:
The probability $X$ differs from $Y$ by more than 4.
The minimum number, $n$, of independent observations needed on $X$ so that the stated probability requirement on $\bar{X}$ holds ($\bar{X}$ is the sample mean of the $n$ observations).
The normal distribution can, under certain conditions, be used to approximate probabilities for linear combinations of variables having a non-normal distribution. This remarkable property follows from an amazing result called the central limit theorem. There are actually several versions of the central limit theorem. The version given below is one of the simplest.
Central Limit Theorem (CLT):
The major reason that the normal distribution is so commonly used is that it tends to approximate the distribution of sums of random variables. For example, if we throw $n$ fair dice and $S_n$ is the sum of the outcomes, what is the distribution of $S_n$? The tables below provide the number of ways in which a given value can be obtained. The corresponding probability is obtained by dividing by $6^n$. For example, on the throw of a single die the possible outcomes are 1, 2, \ldots, 6, each with probability $\frac16$, as indicated in the first panel of the histogram in Figure clt.
The probability histogram of the sum of discrete uniform $\{1, 2, 3, 4, 5, 6\}$ random variables
If we sum the values on two fair dice, the possible outcomes are the values 2, 3, \ldots, 12 as shown in the following table, with the number of ways each value can occur below it (divide by $6^2 = 36$ for the probabilities):
Values | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Ways | 1 | 2 | 3 | 4 | 5 | 6 | 5 | 4 | 3 | 2 | 1 |
The probability histogram of these values is shown in the second panel. Finally, for the sum of the values on three independent dice, the values range from 3 to 18 and have probabilities which, when multiplied by $6^3 = 216$, result in the values
1 | 3 | 6 | 10 | 15 | 21 | 25 | 27 | 27 | 25 | 21 | 15 | 10 | 6 | 3 | 1 |
Let $X_1, X_2, \ldots, X_n$ be independent random variables all having the same distribution, with mean $\mu$ and variance $\sigma^2$. Then as $n \to \infty$, $\sum_{i=1}^{n} X_i$ "tends to" $N(n\mu, n\sigma^2)$, and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ "tends to" $N\!\left(\mu, \frac{\sigma^2}{n}\right)$. This is actually a rough statement of the result since, as $n \to \infty$, both the $N(n\mu, n\sigma^2)$ and $N\!\left(\mu, \frac{\sigma^2}{n}\right)$ distributions fail to exist. (The former because both $n\mu \to \infty$ and $n\sigma^2 \to \infty$, the latter because $\frac{\sigma^2}{n} \to 0$.) A precise version of the result is:
If $X_1, X_2, \ldots$ are independent random variables all having the same distribution, with mean $\mu$ and variance $\sigma^2$, then as $n \to \infty$, the cumulative distribution function of the random variable
$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$$
approaches the $N(0, 1)$ c.d.f. Similarly, the c.d.f. of
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
approaches the standard normal c.d.f.
Although this is a theorem about limits, we will use it when is large, but finite, to approximate the distribution of or by a normal distribution, so the rough version of the theorem in (cltsum) and (cltmean) is adequate for our purposes.
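The dice example can be pushed further by simulation: standardize the sum of $n$ dice and compare its empirical c.d.f. with $N(0, 1)$ (a sketch):

```python
import math
import random

random.seed(5)

# One die has mu = 3.5 and sigma^2 = 35/12.  Standardize the sum of n dice
# and compare the empirical c.d.f. of the result with the N(0,1) c.d.f.
n, reps = 10, 50_000
mu, var = 3.5, 35 / 12
zs = []
for _ in range(reps):
    s = sum(random.randint(1, 6) for _ in range(n))
    zs.append((s - n * mu) / math.sqrt(n * var))

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-1.0, 0.0, 1.0):
    print(z, sum(v <= z for v in zs) / reps, Phi(z))
```

Agreement is already good at $n = 10$; the remaining discrepancy comes from the sum being discrete (see the continuity correction below in this chapter).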
Notes:
This theorem works for essentially all distributions which $X_i$ could have. The only exception occurs when $X_i$ has a distribution whose mean or variance does not exist. There are such distributions, but they are rare.
We will use the Central Limit Theorem to approximate the distribution of sums $\sum X_i$ or averages $\bar{X}$. The accuracy of the approximation depends on $n$ (bigger is better) and also on the actual distribution the $X_i$'s come from. The approximation works better for small $n$ when the $X_i$'s p.d.f. is close to symmetric.
If you look at the section on linear combinations of independent normal random variables you will find two results which are very similar to the central limit theorem. These are:
For $X_1, \ldots, X_n$ independent and $N(\mu, \sigma^2)$: $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$, and $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
Thus, if the $X_i$'s themselves have a normal distribution, then $\sum X_i$ and $\bar{X}$ have exactly normal distributions for all values of $n$. If the $X_i$'s do not have a normal distribution themselves, then $\sum X_i$ and $\bar{X}$ have approximately normal distributions when $n$ is large. From this distinction you should be able to guess that if the $X_i$'s distribution is somewhat normal shaped the approximation will be good for smaller values of $n$ than if the $X_i$'s distribution is very non-normal in shape. (This is related to the second remark in (2).)
Example: Hamburger patties are packed 8 to a box, and each box is supposed to have 1 kg of meat in it. The weights of the patties vary a little because they are mass produced, and the weight $X$ of a single patty is actually a random variable with mean $\mu$ kg and standard deviation $\sigma$ kg. Find the probability a box has at least 1 kg of meat, assuming that the weights of the 8 patties in any given box are independent.
Solution: Let $X_1, \ldots, X_8$ be the weights of the 8 patties in a box, and $Y = \sum_{i=1}^{8} X_i$ be their total weight. By the Central Limit Theorem, $Y$ is approximately $N(8\mu, 8\sigma^2)$; we'll assume this approximation is reasonable even though $n = 8$ is small. (This is likely OK because $X$'s distribution is likely fairly close to normal itself.) Thus
$$P(Y \ge 1) = P\!\left(Z \ge \frac{1 - 8\mu}{\sqrt{8}\,\sigma}\right),$$
and with the given values of $\mu$ and $\sigma$ this works out to about .95. (We see that only about 95% of the boxes actually have 1 kg or more of hamburger. What would you recommend be done to increase this probability?)
Example: Suppose fires reported to a fire station satisfy the conditions for a Poisson process, with a mean of 1 fire every 4 hours. Find the probability the 500th fire of the year is reported on the 84th day of the year.
Solution: Let $X_i$ be the time between the $(i-1)$st and $i$th fires ($X_1$ is the time to the 1st fire). Then $X_i$ has an exponential distribution with $\theta = 4$ hrs, or $\theta = \frac16$ day. Since $S = \sum_{i=1}^{500} X_i$ is the time until the 500th fire, we want to find $P(83 < S \le 84)$. While the exponential distribution is not close to normal shaped, we are summing a large number of independent exponential variables. Hence, by the central limit theorem, $S$ has approximately a $N(500\theta, 500\theta^2)$ distribution, where $\theta = \frac16$.
For exponential distributions, $\mu = \theta$ and $\sigma^2 = \theta^2$, so
$$P(83 < S \le 84) \doteq P\!\left(\frac{83 - \frac{500}{6}}{\sqrt{500/36}} < Z \le \frac{84 - \frac{500}{6}}{\sqrt{500/36}}\right) = P(-.09 < Z \le .18) \doteq .107.$$
Example: This example is frivolous but shows how the normal distribution can approximate even sums of discrete r.v.'s. In an orchard, suppose the number of worms in an apple has probability function:
$x$ | 0 | 1 | 2 | 3 |
$f(x)$ | .4 | .3 | .2 | .1 |
Find the probability a basket with 250 apples in it has between 225 and 260
(inclusive) worms in it.
Solution:
By the central limit theorem,
has approximately a
distribution, where
is the number of worms in the
apple.
i.e.
While this approximation is adequate, we can improve its accuracy, as follows. When has a discrete distribution, as it does here, will always remain discrete no matter how large gets. So the distribution of , while normal shaped, will never be precisely normal. Consider a probability histogram of the distribution of , as shown in Figure p167. (Only part of the histogram is shown.)
The area of each bar of this histogram is the probability at the value in the centre of the interval. The smooth curve is the p.d.f. for the approximating normal distribution. Then is the total area of all bars of the histogram for from 225 to 260. These bars actually span continuous values from 224.5 to 260.5. We could then get a more accurate approximation by finding the area under the normal curve from 224.5 to 260.5.
i.e. $P(225 \le S \le 260) \approx P(224.5 \le S \le 260.5) = P\left(\frac{224.5 - 250}{\sqrt{250}} \le Z \le \frac{260.5 - 250}{\sqrt{250}}\right) \approx P(-1.61 \le Z \le .66) \approx .69$, where $S$ is the total number of worms. Unless making this adjustment greatly complicates the solution, it is preferable to make this "continuity correction".
Notes:
A continuity correction should not be applied when approximating a continuous distribution by the normal distribution. Since it involves going halfway to the next possible value of the variable, there would be no adjustment to make if the variable takes all real values in an interval.
Rather than trying to guess or remember when to add .5 and when to subtract .5, it is often helpful to sketch a histogram and shade the bars we wish to include. It should then be obvious which value to use.
Example: Normal approximation to the Poisson Distribution
Let $X$ be a random variable with a Poisson($\mu$) distribution and suppose $\mu$ is large. For the moment suppose that $\mu$ is an integer, and recall that if we add $\mu$ independent Poisson random variables, each with parameter 1, then the sum has the Poisson distribution with parameter $\mu$. In general, a Poisson random variable with large expected value can be written as the sum of a large number of independent random variables, and so the central limit theorem implies that it must be close to normally distributed. We can prove this using moment generating functions. In Section 7.5 we found the moment generating function of a Poisson random variable $X$, $M_X(t) = e^{\mu(e^t - 1)}$. Then the standardized random variable is $Z = (X - \mu)/\sqrt{\mu}$ and this has moment generating function $M_Z(t) = E(e^{tZ}) = e^{-t\sqrt{\mu}} M_X(t/\sqrt{\mu}) = \exp\{-t\sqrt{\mu} + \mu(e^{t/\sqrt{\mu}} - 1)\}$. This is easier to work with if we take logarithms: $\ln M_Z(t) = -t\sqrt{\mu} + \mu(e^{t/\sqrt{\mu}} - 1)$. Now $e^{t/\sqrt{\mu}} = 1 + \frac{t}{\sqrt{\mu}} + \frac{t^2}{2\mu} + O(\mu^{-3/2})$ as $\mu \to \infty$, and so $\ln M_Z(t) \to \frac{t^2}{2}$. Therefore the moment generating function of the standardized Poisson random variable approaches $e^{t^2/2}$, the moment generating function of the standard normal, and this implies that the Poisson distribution approaches the normal as $\mu \to \infty$.
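A quick numerical illustration of this limit (a sketch only; the mean of 100 is an assumed "large" value, not one from the text): compare an exact Poisson probability with the normal approximation, using a continuity correction since the Poisson is discrete.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu = 100  # illustrative "large" Poisson mean (assumed for this demo)

# Exact P(X <= 110) by summing the Poisson probability function,
# using the recursion P(X = x+1) = P(X = x) * mu / (x + 1).
p_exact = 0.0
term = math.exp(-mu)          # P(X = 0)
for x in range(111):
    p_exact += term
    term *= mu / (x + 1)

# Normal approximation N(mu, mu) with a continuity correction.
p_normal = phi((110.5 - mu) / math.sqrt(mu))

print(round(p_exact, 4), round(p_normal, 4))
```

For a mean this large the two values agree to about two decimal places, consistent with the moment generating function argument above.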
Normal approximation to the Binomial Distribution
It is well-known that the binomial distribution, at least for large values of $n$, resembles a bell-shaped or normal curve. The most common demonstration of this is with a mechanical device common in science museums called a "Galton board" or "Quincunx", which drops balls through a mesh of equally spaced pins (see Figure balldrop and the applet at http://javaboutique.internet.com/BallDrop/). Notice that if balls either go to the right or left at each of the 8 levels of pins, independently of the movement of the other balls, then the number of moves to the right has a binomial distribution with 8 trials (with success probability 1/2 for a symmetric board). If the balls are dropped from a fixed location (on the horizontal axis), then each ball eventually rests at a location which is approximately normally distributed, since the binomial count of rightward moves is approximately normal.
A "Galton Board" or "Quincunx"
The following result is easily proved using the Central Limit Theorem.
Let $X$ have a binomial distribution, $Bin(n, p)$. Then for $n$ large, the r.v. $W = \frac{X - np}{\sqrt{np(1-p)}}$ has approximately a $N(0, 1)$ distribution.
Proof: We use indicator variables $X_1, \dots, X_n$, where $X_i = 1$ if the $i$th trial in the binomial process is an "$S$" outcome and 0 if it is an "$F$" outcome. Then $X = X_1 + \cdots + X_n$ and we can use the CLT. Since $E(X_i) = p$ and $Var(X_i) = p(1-p)$, we have that as $n \to \infty$, $\frac{X - np}{\sqrt{np(1-p)}}$ is approximately $N(0, 1)$, as stated. $\square$
An alternative proof uses moment generating functions and is essentially a proof of this particular case of the Central Limit Theorem. Recall that the moment generating function of the binomial random variable $X$ is $M_X(t) = (p e^t + 1 - p)^n$. As we did with the standardized Poisson random variable, we can show with some algebraic effort that the moment generating function of $\frac{X - np}{\sqrt{np(1-p)}}$ converges to $e^{t^2/2}$, proving that the standardized binomial random variable approaches the standard normal distribution.
Remark: We can write the normal approximation either as $X \approx N(np, np(1-p))$ or as $\frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1)$.
Remark: The continuity correction method can be used here. The following numerical example illustrates the procedure.
Example: If (i) , use the theorem to find the approximate probability and (ii) if find the approximate probability . Compare the answer with the exact value in each case.
Solution (i) By the theorem above, approximately. Without the continuity correction, where . Using the continuity correction method, we get The exact probability is , which (using the function ) is .963. As expected the continuity correction method gives a more accurate approximation.
(ii) approximately so without the continuity correction With the continuity correction The exact value, , equals .866 (to 3 decimals). The error of the normal approximation decreases as increases, but it is a good idea to use the continuity correction when it is convenient.
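The gain from the continuity correction is easy to reproduce numerically. The sketch below uses illustrative values n = 25, p = 0.4 (assumed for this demo, not the values of the example above) and compares the exact binomial probability of an interval with the plain and corrected normal approximations:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 25, 0.4   # illustrative values (assumed)
a, b = 8, 12     # find P(a <= X <= b)

# Exact binomial probability.
p_exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

mu, sd = n * p, math.sqrt(n * p * (1 - p))

p_plain = phi((b - mu) / sd) - phi((a - mu) / sd)           # no correction
p_cc = phi((b + 0.5 - mu) / sd) - phi((a - 0.5 - mu) / sd)  # continuity correction

print(round(p_exact, 4), round(p_plain, 4), round(p_cc, 4))
```

Here the corrected approximation lands within about .001 of the exact value while the uncorrected one misses by roughly .1, since the plain version ignores the half-bars at each end of the interval.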
Example: Let be the proportion of Canadians who think Canada should adopt the US dollar.
Suppose 400 Canadians are randomly chosen and asked their opinion. Let be the number who say yes. Find the probability that the proportion, , of people who say yes is within .02 of , if is .20.
Find the number, , who must be surveyed so there is a 95% chance that lies within .02 of . Again suppose is .20.
Repeat (b) when the value of $p$ is unknown.
Solution:
(a) $X \sim Bin(400, .2)$, so $X$ is approximately $N(np, np(1-p)) = N(80, 64)$. If $\hat{p} = X/400$ lies within .02 of .20, then $.18 \le \hat{p} \le .22$, so $72 \le X \le 88$. Thus, with a continuity correction, we find $P(72 \le X \le 88) \approx P\left(\frac{71.5 - 80}{8} \le Z \le \frac{88.5 - 80}{8}\right) = P(-1.06 \le Z \le 1.06) \approx .71$.
(b) Since $n$ is unknown, it is difficult to apply a continuity correction, so we omit it in this part. By the normal approximation, $X \approx N(.2n, .16n)$. Therefore, $P\left(\left|\frac{X}{n} - .2\right| \le .02\right) \ge .95$ is the condition we need to satisfy. This gives $P\left(|Z| \le \frac{.02n}{\sqrt{.16n}}\right) \ge .95$. Therefore, $\frac{.02n}{\sqrt{.16n}} \ge 1.96$, and so $\sqrt{n} \ge \frac{1.96\sqrt{.16}}{.02} = 39.2$, giving $n \ge 1536.64$. In other words, we need to survey 1537 people to be at least 95% sure that $X/n$ lies within .02 either side of .20.
(c) Now using the normal approximation to the binomial, $X \approx N(np, np(1-p))$ approximately, and so $\hat{p} = X/n \approx N\left(p, \frac{p(1-p)}{n}\right)$. We wish to find $n$ such that $P(|\hat{p} - p| \le .02) \ge .95$. As in part (b), this requires $\frac{.02\sqrt{n}}{\sqrt{p(1-p)}} \ge 1.96$. Solving for $n$, $n \ge \left(\frac{1.96}{.02}\right)^2 p(1-p)$. Unfortunately this does not give us an explicit expression for $n$ because we don't know $p$. The way out of this dilemma is to find the maximum value $p(1-p)$ could take. If we choose $n$ this large, then we can be sure of having the required precision in our estimate, $\hat{p}$, for any $p$. It's easy to see that $p(1-p)$ is a maximum when $p = \frac{1}{2}$. Therefore we take $n \ge \left(\frac{1.96}{.02}\right)^2 \left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = 2401$; i.e., if we survey 2401 people we can be 95% sure that $\hat{p}$ lies within .02 of $p$, regardless of the value of $p$.
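The sample-size arithmetic in parts (b) and (c) can be checked directly with the formula n >= (1.96/.02)^2 p(1-p), using p = .20 when the proportion is known and the worst case p = 1/2 when it is not:

```python
import math

z = 1.96   # value with P(|Z| <= 1.96) = .95 for the standard normal
d = 0.02   # required precision for the estimate of p

def sample_size(p):
    """Smallest n with P(|phat - p| <= d) >= .95 under the normal approximation."""
    n = (z / d) ** 2 * p * (1 - p)
    return math.ceil(n - 1e-9)   # tiny guard against floating-point round-up

print(sample_size(0.20), sample_size(0.50))
```

The two calls reproduce the 1537 and 2401 found above; note how assuming nothing about p (worst case p = 1/2) raises the required sample size by more than half.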
Remark: This method is used when poll results are reported in the media: you often see or hear that "this poll is accurate to within 3 percent, 19 times out of 20". This is saying that $n$ was big enough so that $P(|\hat{p} - p| \le .03)$ was 95%. (This requires $n$ of about 1067.)
Problems:
Tomato seeds germinate (sprout to produce a plant) independently of each other, with probability 0.8 of each seed germinating. Give an expression for the probability that at least 75 seeds out of 100 which are planted in soil germinate. Evaluate this using a suitable approximation.
A metal parts manufacturer inspects each part produced. 60% are acceptable as produced, 30% have to be repaired, and 10% are beyond repair and must be scrapped. It costs the manufacturer $10 to repair a part, and $100 (in lost labour and materials) to scrap a part. Find the approximate probability that the total cost associated with inspecting 80 parts will exceed $1200.
The diameters of spherical particles produced by a machine are randomly distributed according to a uniform distribution on [.6,1.0] (cm). Find the distribution of , the volume of a particle.
A continuous random variable has p.d.f.
Find and the c.d.f. of . Graph and the c.d.f.
Find the value of such that .
When people are asked to make up a random number between 0 and 1, it has been found that the distribution of the numbers, , has p.d.f. close to (rather than the distribution which would be expected). Find the mean and variance of .
For 100 ``random'' numbers from the above distribution find the probability their sum lies between 49.0 and 50.5.
What would the answer to (b) be if the 100 numbers were truly ?
Let have p.d.f. , and let . Find the p.d.f. of .
A continuous random variable which takes values between 0 and 1 has probability density function
For what values of is this a p.d.f.? Explain.
Find and
Find the probability density function of .
The magnitudes of earthquakes in a region of North America can be modelled by an exponential distribution with mean 2.5 (measured on the Richter scale).
If 3 earthquakes occur in a given month, what is the probability that none exceed 5 on the Richter scale?
If an earthquake exceeds 4, what is the probability it also exceeds 5?
A certain type of light bulb has lifetimes that follow an exponential distribution with mean 1000 hours. Find the median lifetime (that is, the lifetime such that 50% of the light bulbs fail before ).
The examination scores obtained by a large group of students can be modelled by a normal distribution with a mean of 65% and a standard deviation of 10%.
Find the percentage of students who obtain each of the following letter grades:
Find the probability that the average score in a random group of 25 students exceeds 70%.
Find the probability that the average scores of two distinct random groups of 25 students differ by more than 5%.
The number of litres that a filling machine in a water bottling plant deposits in a nominal two litre bottle follows a normal distribution , where (litres) and is the setting on the machine.
If , what is the probability a bottle has less than 2 litres of water in it?
What should be set at to make the probability a bottle has less than 2 litres be less than ?
A turbine shaft is made up of 4 different sections. The lengths of those sections are independent and have normal distributions with $\mu$ and $\sigma^2$: (8.10, .22), (7.25, .20), (9.75, .24), and (3.10, .20). What is the probability an assembled shaft meets the specifications?
Let and be independent.
Find:
a number such that .
The amount, , of wine in a bottle (Note: means liters.)
The bottle is labelled as containing . What is the probability a bottle contains less than ?
Casks are available which have a volume, , which is . What is the probability the contents of 20 randomly chosen bottles will fit inside a randomly chosen cask?
In problem 8.18, calculate the probability of passing the exam, both with and without guessing if (a) each = .45; (b) each . What is the best strategy for passing the course if (a) (b) ?
Suppose that the diameters in millimeters of the eggs laid by a large flock of hens can be modelled by a normal distribution with a mean of 40 mm and a variance of 4 mm². The wholesale selling price is 5 cents for an egg less than 37 mm in diameter, 6 cents for eggs between 37 and 42 mm, and 7 cents for eggs over 42 mm. What is the average wholesale price per egg?
In a survey of voters from a given riding in Canada, the proportion who say they would vote Conservative is used to estimate , the probability a voter would vote P.C. ( is the number of Conservative supporters in the survey.) If Conservative support is actually 16%, how large should be so that with probability .95, the estimate will be in error at most .03?
When blood samples are tested for the presence of a disease, samples from 20 people are pooled and analysed together. If the analysis is negative, none of the 20 people is infected. If the pooled sample is positive, at least one of the 20 people is infected so they must each be tested separately; i.e., a total of 21 tests is required. The probability a person has the disease is .02.
Find the mean and variance of the number of tests required for each group of 20.
For 2000 people, tested in groups of 20, find the mean and variance of the total number of tests. What assumption(s) has been made about the pooled samples?
Find the approximate probability that more than 800 tests are required for the 2000 people.
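A sketch of the calculation for this pooled-testing problem (ignoring any continuity correction in the last part): the number of tests per group is 1 with probability .98^20 (all 20 people negative) and 21 otherwise, and the 100 groups for 2000 people are assumed independent.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, group = 0.02, 20
q = (1 - p) ** group             # P(pooled sample negative) = .98^20

# Tests per group: 1 if the pool is negative, 21 if it is positive.
mean_group = 1 * q + 21 * (1 - q)
var_group = (21 - 1) ** 2 * q * (1 - q)   # variance of a two-point {1, 21} variable

groups = 2000 // group           # 100 independent groups
mean_total = groups * mean_group
var_total = groups * var_group

# CLT over the 100 groups: normal approximation for P(total > 800).
p_over_800 = 1 - phi((800 - mean_total) / math.sqrt(var_total))

print(round(mean_total, 1), round(var_total, 1), round(p_over_800, 3))
```

Pooling pays off here: the expected total of roughly 765 tests is far below the 2000 needed to test everyone individually.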
Suppose 80% of people who buy a new car say they are satisfied with the car when surveyed one year after purchase. Let be the number of people in a group of 60 randomly chosen new car buyers who report satisfaction with their car. Let be the number of satisfied owners in a second (independent) survey of 62 randomly chosen new car buyers. Using a suitable approximation, find . A continuity correction is expected.
Suppose that the unemployment rate in Canada is 7%.
Find the approximate probability that in a random sample of 10,000 persons in the labour force, the number of unemployed will be between 675 and 725 inclusive.
How large a random sample would it be necessary to choose so that, with probability , the proportion of unemployed persons in the sample is between 6.9% and 7.1%?
Gambling. Your chances of winning or losing money can be calculated in many games of chance as described here.
Suppose each time you play a game (or place a bet) of $1 that the probability you win (thus ending up with a profit of $1) is .49 and the probability you lose (meaning your ``profit" is -$1) is .51.
Let represent your profit after independent plays or bets. Give a normal approximation for the distribution of .
If , determine . (This is the probability you are ``ahead" after 20 plays.) Also find if and . What do you conclude?
Note: For many casino games (roulette, blackjack) there are bets for which your probability of winning is only a little less than .5. However, as you play more and more times, the probability you lose (end up ``behind") approaches 1.
Suppose now you are the casino. If all players combined place $1 bets in an evening, let be your profit. Find the value with the property that . Explain in words what this means.
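The claim in the Note, that the probability of being ahead shrinks as play continues, follows directly from the normal approximation: the mean profit is -.02n while the standard deviation grows only like the square root of n. A sketch (no continuity correction, for simplicity):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_win = 0.49   # win $1 with probability .49, lose $1 with probability .51

def prob_ahead(n):
    """Normal approximation to P(profit > 0) after n independent $1 bets."""
    mean = n * (p_win - (1 - p_win))              # n * (-0.02)
    var = n * (1 - (p_win - (1 - p_win)) ** 2)    # n * (1 - 0.02^2)
    return 1 - phi((0 - mean) / math.sqrt(var))

for n in (20, 50, 100, 1000):
    print(n, round(prob_ahead(n), 3))
```

After 20 plays the probability of being ahead is still close to 1/2, but it decreases steadily with n, which is exactly why the casino, taking the other side of every bet, is almost certain to profit over many bets.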
Gambling: Crown and Anchor. Crown and Anchor is a game that is sometimes played at charity casinos or just for fun. It can be played with a ``wheel of fortune" or with 3 dice, in which each die has its 6 sides labelled with a crown, an anchor, and the four card suits club, diamond, heart and spade, respectively. You bet an amount (let's say $1) on one of the 6 symbols: let's suppose you bet on ``heart". The 3 dice are then rolled simultaneously and you win if hearts turn up ().
Let represent your profits from playing the game times. Give a normal approximation for the distribution of .
Find (approximately) the probability that if (i) , (ii) .
Binary classification. Many situations require that we ``classify" a unit of some type as being one of two types, which for convenience we will term Positive and Negative. For example, a diagnostic test for a disease might be positive or negative; an email message may be spam or not spam; a credit card transaction may be fraudulent or not. The problem is that in many cases we cannot tell for certain whether a unit is Positive or Negative, so when we have to decide which a unit is, we may make errors. The following framework helps us to deal with these problems.
For a randomly selected unit from the population being considered, define the indicator random variable Suppose that we cannot know for certain whether or for a given unit, but that we can get a measurement with the property that where . We now decide to classify units as follows, based on their measurement : select some value between and , and then
if , classify the unit as Positive
if , classify the unit as Negative
Suppose . Find the probability that
If a unit is really Positive, it is wrongly classified as Negative. (This is called the ``false negative" probability.)
If a unit is really Negative, it is wrongly classified as Positive. (This is called the ``false positive" probability.)
Repeat the calculations if as in (a), but . Explain in plain English why the false negative and false positive misclassification probabilities are smaller than in (a).
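The two misclassification probabilities reduce to normal-tail calculations. The sketch below uses entirely hypothetical parameters (the values in the problem statement are not shown): the measurement Y is taken to be N(mu0, sigma^2) for true Negatives and N(mu1, sigma^2) for true Positives, with threshold d between them.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical parameters, assumed only for illustration.
mu0, mu1, sigma = 0.0, 10.0, 4.0
d = 5.0   # classification threshold: Y >= d -> Positive, Y < d -> Negative

false_negative = phi((d - mu1) / sigma)       # P(Y < d | unit is Positive)
false_positive = 1 - phi((d - mu0) / sigma)   # P(Y >= d | unit is Negative)

print(round(false_negative, 4), round(false_positive, 4))
```

With d midway between the two means the two error probabilities are equal; increasing the separation (mu1 - mu0)/sigma shrinks both, which is the point of part (b) of the question.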
Binary classification and spam detection. The approach in the preceding question can be used for problems such as spam detection, which was discussed earlier in Problems 4.17 and 4.18. Instead of using binary features as in those problems, suppose that for a given email message we compute a measure , designed so that tends to be high for spam messages and low for regular (non-spam) messages. (For example can be a composite measure based on the presence or absence of certain words in a message, as well as other features.) We will treat as a continuous random variable.
Suppose that for spam messages, the distribution of is approximately , and that for regular messages, it is approximately , where . This is the same setup as for Problem 9.21. We will filter spam by picking a value , and then filtering any message for which . The trick here is to decide what value of to use.
Suppose that . Calculate the probability of a false positive (filtering a message that is regular) and a false negative (not filtering a message that is spam) under each of the three choices (i) (ii) (iii) .
What factors would determine which of the three choices of would be best to use?
Random chords of a circle. Given a circle, find the probability that a chord chosen at random be longer than the side of an inscribed equilateral triangle. For example in Figure bertrand, the line joining and satisfies the condition, the other lines do not.
Bertrand's Paradox
This is called Bertrand's paradox (see the Java applet at http://www.cut-the-knot.org/bertrand.shtml) and there are various possible solutions, depending on exactly how you interpret the phrase "a chord chosen at random". For example, since the only important thing is the position of the second point relative to the first one, we can fix the first point and consider only the chords that emanate from it. Then it becomes clear that 1/3 of the outcomes (those with angle with the tangent at that point between 60 and 120 degrees) will result in a chord longer than the side of an equilateral triangle. But a chord is fully determined by its midpoint. Chords whose length exceeds the side of an equilateral triangle have their midpoints inside a smaller circle with radius equal to 1/2 that of the given one. If we choose the midpoint of the chord at random and uniformly from the points within the circle, what is the probability that the corresponding chord has length greater than the side of the triangle? Can you think of any other interpretations which lead to different answers?
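Both interpretations described above can be checked by Monte Carlo simulation on the unit circle, where the inscribed equilateral triangle has side sqrt(3). This is a sketch of the two sampling schemes, which should give probabilities near 1/3 and 1/4 respectively:

```python
import math
import random

random.seed(3)

SIDE = math.sqrt(3)   # side of an equilateral triangle inscribed in the unit circle
reps = 200000

# Interpretation 1: fix one endpoint, choose the other uniformly on the circle.
long1 = 0
for _ in range(reps):
    theta = random.uniform(0, 2 * math.pi)   # angle between the two endpoints
    chord = 2 * math.sin(theta / 2)
    long1 += chord > SIDE

# Interpretation 2: choose the chord's midpoint uniformly inside the disk.
long2 = 0
for _ in range(reps):
    r = math.sqrt(random.random())           # radius with density 2r -> uniform point
    chord = 2 * math.sqrt(1 - r * r)         # chord with midpoint at distance r
    long2 += chord > SIDE

print(round(long1 / reps, 3), round(long2 / reps, 3))
```

The chord exceeds the triangle's side exactly when the endpoint angle lies in the middle third of (0, 2*pi) (interpretation 1), or when the midpoint falls inside the concentric circle of radius 1/2, whose area is 1/4 of the disk (interpretation 2), so the two "random chord" recipes genuinely disagree.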
A model for stock returns. A common model for stock returns is as follows: the number of trades of stock XXX in a given day has a Poisson distribution with parameter . At each trade, say the $i$th trade, the change in the price of the stock has a normal distribution with mean and variance, say, and these changes are independent of one another and independent of the number of trades. Find the moment generating function of the total change in stock price over the day. Is this a distribution that you recognise? What is its mean and variance?
Let be independent random variables with a Normal distribution having mean and variance . Find the moment generating function for .