6. Discrete Random Variables and Probability Models

Random Variables and Probability Functions

Probability models are used to describe outcomes associated with random processes. So far we have used sets $A,B,C, \dots$ in sample spaces to describe such outcomes. In this chapter we introduce numerical-valued variables $X, Y, \dots$ to describe outcomes. This allows probability models to be manipulated easily using ideas from algebra, calculus, or geometry.

A random variable (r.v.) is a numerical-valued variable that represents outcomes in an experiment or random process. For example, suppose a coin is tossed 3 times; then $X =$ number of heads obtained would be a random variable. Associated with any random variable is a range $A$, which is the set of possible values for the variable. For example, the random variable $X$ defined above has range $A=\{0,1,2,3\}$.

Random variables are denoted by capital letters like $X,Y,\dots$ and their possible values are denoted by $x,y,\dots$. This gives a convenient shorthand notation for outcomes: for example, ``$X=2$'' in the experiment above stands for ``2 heads occurred''.

Random variables are always defined so that if the associated random process or experiment is carried out, then one and only one of the outcomes ``$X = x$'' $(x \in A)$ occurs. In other words, the possible values for $X$ form a partition of the points in the sample space for the experiment. In more advanced mathematical treatments of probability, a random variable is defined as a function on a sample space, as follows:

Definition

A random variable is a function that assigns a real number to each point in a sample space $S$.

To understand this definition, consider the experiment in which a coin is tossed 3 times, and suppose that we used the sample space $$S=\{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}.$$ Then each of the outcomes ``$X=x$'' (where $X$ = number of heads) represents an event (either simple or compound) and so a real number $x$ can be associated with each point in $S$. In particular, the point $HHH$ corresponds to $x=3$; the points $THH,HTH,HHT$ correspond to $x=2$; the points $HTT,THT,TTH$ correspond to $x=1$; and the point $TTT$ corresponds to $x=0$.

As you may recall, a function is a mapping that assigns to each point in a domain a unique point in a range. For example, the function $y = x^{3}$ maps the point $x = 2$ in the domain to the point $y = 8$ in the range. We are familiar with this rule for mapping being defined by a mathematical formula. However, the rule for mapping a point in the sample space (domain) to a real number in the range of a random variable is most often given in words rather than by a formula. As mentioned above, we generally denote random variables, in the abstract, by capital letters ($X, Y$, etc.) and denote the actual numbers taken by random variables by small letters ($x, y$, etc.). You may, in your earlier studies, have seen this distinction made between a function ($X$) and the value of a function ($x$).

Since ``$X=x$'' represents an outcome of some kind, we will be interested in its probability, which we write as $P(X=x)$. To discuss probabilities for random variables, it is easiest if they are classified into two types, according to their ranges:

Discrete r.v.'s take integer values or, more generally, values in a countable set (recall that a set is countable if its elements can be placed in a one-to-one correspondence with a subset of the positive integers).

Continuous r.v.'s take values in some interval of real numbers.


Examples might be:

Discrete                              Continuous
number of people in a car             total weight of people in a car
number of cars in a parking lot       distance between cars in a parking lot
number of phone calls to 911          time between calls to 911

In theory there could also be mixed r.v.'s which are discrete-valued over part of their range and continuous-valued over some other portion of their range. We will ignore this possibility here and concentrate first on discrete r.v.'s. Continuous r.v.'s are considered in Chapter 9.

Our aim is to set up general models which describe how the probability is distributed among the possible values a random variable can take. To do this we define for any discrete random variable $X$ the probability function.

Definition

The probability function (p.f.) of a random variable $X$ is the function $$f(x)=P(X=x),\quad\text{defined for all }x\in A.$$

The set of pairs $\{(x,f(x)):x\in A\}$ is called the probability distribution of $X$. All probability functions must have two properties:

  1. $f(x)\geq0$ for all values of $x$ (i.e. for $x\in A$)

  2. $\sum\limits_{x\in A}f(x)=1$

By implication, these properties ensure that $f(x)\leq1$ for all $x $. We consider a few ``toy'' examples before dealing with more complicated problems.


Example 1: Let $X$ be the number obtained when a die is thrown. We would normally use the probability function $f(x)=1/6$ for $x=1,2,3,\cdots,6$. In fact there probably is no absolutely perfect die in existence. For most dice, however, the 6 sides will be close enough to being equally likely that $f(x)=1/6$ is a satisfactory model for the distribution of probability among the possible outcomes.

Example 2: Suppose a "fair" coin is tossed 3 times, and let $X$ be the number of heads occurring. Then a simple calculation using the ideas from earlier chapters gives $$f(x)=P(X=x)=\binom{3}{x}\left(\tfrac{1}{2}\right)^{3},\quad x=0,1,2,3.$$ Note that instead of writing $f(0)=\frac{1}{8},\ f(1)=\frac{3}{8},\ f(2)=\frac{3}{8},\ f(3)=\frac{1}{8}$, we have given a simple algebraic expression.
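The notes use $R$ for computation, but any language will do for a quick check. The following Python sketch (the names are ours, purely illustrative) confirms the algebraic expression by enumerating the 8 equally likely outcomes:

```python
from itertools import product
from math import comb

# Enumerate all 2^3 = 8 equally likely outcomes of 3 tosses of a fair coin
# and tally the number of heads in each.
counts = {x: 0 for x in range(4)}
for outcome in product("HT", repeat=3):
    counts[outcome.count("H")] += 1

f = {x: counts[x] / 8 for x in counts}       # p.f. by direct enumeration
g = {x: comb(3, x) / 8 for x in range(4)}    # the algebraic expression C(3,x)/2^3
assert f == g  # the closed form matches the enumeration exactly
```

The enumeration gives $f(0)=f(3)=\frac18$ and $f(1)=f(2)=\frac38$, matching the formula.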

Example 3: Find the value of $k$ which makes $f(x)$ below a probability function.

$x$ 0 1 2 3
$f(x)$ $k$ $2k$ $0.3$ $4k$

Using $\sum\limits_{x=0}^{3}f(x)=1$ gives $7k+0.3=1$. Hence $k=0.1$.

While the probability function is the most common way of describing a probability model, there are other possibilities. One of them is by using the cumulative distribution function (c.d.f.).

Definition

The cumulative distribution function of $X$ is the function usually denoted by $F(x)$, $$F(x)=P(X\leq x),$$ defined for all real numbers $x$.

In the last example, with $k=0.1$, we have for $x\in A$

$x$ 0 1 2 3
$F(x)$ 0.1 0.3 0.6 1

since, for instance, $F(2)=P(X\leq2)=f(0)+f(1)+f(2)=0.6$. Similarly, $F(x)$ can be defined for real numbers $x$ not in the range of the random variable; for example, $F(2.5)=P(X\leq2.5)=0.6$ and $F(-1)=P(X\leq-1)=0$. The c.d.f. for this example is plotted in Figure cdf.
[Figure cdf: A simple cumulative distribution function]

In general, $F(x)$ can be obtained from $f(x)$ by the fact that $$F(x)=P(X\leq x)=\sum_{u\leq x,\ u\in A}f(u).$$

A c.d.f. $F(x)$ has certain properties, just as a probability function $f(x)$ does. Obviously, since it represents a probability, $F(x)$ must be between 0 and 1. In addition it must be a non-decreasing function (e.g. $P(X\leq8)$ cannot be less than $P(X\leq7)$). Thus we note the following properties of a c.d.f. $F(x)$:

  1. $F(x)$ is a non-decreasing function of $x.$

  2. $0\leq F(x)\leq1$ for all $x.$

  3. $\lim\limits_{x\rightarrow-\infty}F(x)=0$ and $\lim\limits_{x\rightarrow\infty}F(x)=1$

We have noted above that $F(x)$ can be obtained from $f(x)$. The opposite is also true; for example the following result holds:

If $X$ takes on integer values then for values $x$ such that $x\in A$ and $x-1\in A$, $$f(x)=F(x)-F(x-1).$$ This says that $f(x)$ is the size of the jump in $F(x)$ at the point $x.$

To prove this, just note that $$F(x)-F(x-1)=P(X\leq x)-P(X\leq x-1)=P(X=x)=f(x).$$
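The jump relationship can be illustrated with the distribution of Example 3 (with $k=0.1$). This short Python sketch (names ours, illustrative) builds $F$ from $f$ and then recovers $f$ as the jumps of $F$:

```python
# Example 3's probability function with k = 0.1.
f = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

# Build the c.d.f. F(x) = sum of f(u) for u <= x (rounding guards
# against floating-point accumulation noise).
F = {}
total = 0.0
for x in sorted(f):
    total += f[x]
    F[x] = round(total, 10)

# Recover f(x) as the jump F(x) - F(x-1), with F(-1) = 0.
recovered = {x: round(F[x] - F.get(x - 1, 0.0), 10) for x in F}
assert recovered == f
```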

When a random variable has been defined it is sometimes simpler to find its probability function (p.f.) $f(x)$ first, and sometimes it is simpler to find $F(x)$ first. The following example gives two approaches for the same problem.


Example: Suppose that $N$ balls labelled $1,2,\dots,N$ are placed in a box, and $n$ balls $(n\leq N)$ are randomly selected without replacement. Define the r.v. $$X=\text{largest number selected}.$$ Find the probability function for $X$.


Solution 1: If $X=x$ then we must select the number $x$ plus $n-1$ numbers from the set $\{1,2,\dots,x-1\}$. (Note that this means we need $x\geq n$.) This gives $$f(x)=P(X=x)=\frac{\binom{x-1}{n-1}}{\binom{N}{n}},\quad x=n,n+1,\dots,N.$$


Solution 2: First find $F(x)=P(X\leq x)$. Noting that $X\leq x$ if and only if all $n$ balls selected are from the set $\{1,2,\dots,x\}$, we get $$F(x)=P(X\leq x)=\frac{\binom{x}{n}}{\binom{N}{n}},\quad x=n,n+1,\dots,N.$$ We can now find $f(x)=F(x)-F(x-1)$ as before.
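Both solutions can be checked against brute-force enumeration for a small case. The sketch below (our choice of $N=8$, $n=3$, purely illustrative) lists all $\binom{N}{n}$ equally likely samples and compares:

```python
from itertools import combinations
from math import comb

N, n = 8, 3  # small illustrative case: draw 3 balls from {1,...,8}

# Brute force: distribution of the largest label over all C(N, n) samples.
brute = {x: 0 for x in range(n, N + 1)}
for sample in combinations(range(1, N + 1), n):
    brute[max(sample)] += 1

for x in range(n, N + 1):
    f_x = comb(x - 1, n - 1) / comb(N, n)   # Solution 1
    F_x = comb(x, n) / comb(N, n)           # Solution 2: F(x)
    F_prev = comb(x - 1, n) / comb(N, n)    # F(x-1); comb gives 0 when x-1 < n
    assert abs(f_x - brute[x] / comb(N, n)) < 1e-12
    assert abs(f_x - (F_x - F_prev)) < 1e-12
```

Note that Python's `math.comb(a, b)` returns 0 when $b>a$, which conveniently handles $F(n-1)=0$.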


Remark: When you write down a probability function or a cumulative distribution function, don't forget to give the range of the function (i.e. the possible values of the random variable). This is part of the function's definition.

Sometimes we want to graph a probability function $f(x)$. The type of graph we will use most is called a (probability) histogram. For now, we'll define this only for r.v.'s whose range is some set of consecutive integers $\{0,1,2,\dots\}$. A histogram of $f(x)$ is then a graph consisting of adjacent bars or rectangles. At each $x$ we place a rectangle with base on $(x-.5,x+.5)$ and with height $f(x)$. In the above Example 3, a histogram of $f(x)$ looks like that in Figure probhist.
[Figure probhist: Probability histogram for $f(x)$ in Example 3]

Notice that the areas of these rectangles correspond to the probabilities, so for example $P(1\leq X\leq3)$ is the sum of the area of the three rectangles above the points $1,2,$ and $3.$ In general, probabilities are depicted by areas.


Model Distributions:

Many processes or problems have the same structure. In the remainder of this course we will identify common types of problems and develop probability distributions that represent them. In doing this it is important to be able to strip away the particular wording of a problem and look for its essential features. For example, the following three problems are all essentially the same.

  1. A fair coin is tossed 10 times and the ``number of heads obtained'' $(X)$ is recorded.

  2. Twenty seeds are planted in separate pots and the ``number of seeds germinating'' $(X)$ is recorded.

  3. Twelve items are picked at random from a factory's production line and examined for defects. The number of items having no defects $(X)$ is recorded.

What are the common features? In each case the process consists of ``trials" which are repeated a stated number of times - 10, 20, and 12. In each repetition there are two types of outcomes - heads/tails, germinate/don't germinate, and no defects/defects. These repetitions are independent (as far as we can determine), with the probability of each type of outcome remaining constant for each repetition. The random variable we record is the number of times one of these two types of outcome occurred.

Six model distributions for discrete r.v.'s will be developed in the rest of this chapter. Students often have trouble deciding which one (if any) to use in a given setting, so be sure you understand the physical setup which leads to each one. Also, as illustrated above you will need to learn to focus on the essential features of the situation as well as the particular content of the problem.


Statistical Computing

A number of major software systems have been developed for probability and statistics. We will use a system called $R$, which has a wide variety of features and which has Unix and Windows versions. Appendix 6.1 at the end of this chapter gives a brief introduction to $R$, and how to access it. For this course, $R$ can compute probabilities for all the distributions we consider, can graph functions or data, and can simulate random processes. In the sections below we will indicate how $R$ can be used for some of these tasks.


Problems:

  1. Let $X$ have probability function MATH. Find $c$.

  2. Suppose that 5 people, including you and a friend, line up at random. Let $X$ be the number of people standing between you and your friend. Tabulate the probability function and the cumulative distribution function for $X$.

Discrete Uniform Distribution

We define each model in terms of an abstract ``physical setup", or setting, and then consider specific examples of the setup.

Physical Setup: Suppose $X$ takes values $a,a+1,a+2,\cdots,b$ with all values being equally likely. Then $X$ has a discrete uniform distribution on $[a,b]$.


Illustrations:

  1. If $X$ is the number obtained when a die is rolled, then $X$ has a discrete uniform distribution with $a = 1$ and $b = 6$.

  2. Computer random number generators give uniform $[1,N]$ variables, for a specified positive integer $N$. These are used for many purposes, e.g. generating lottery numbers or providing automated random sampling from a set of $N$ items.


Probability Function: There are $b-a+1$ values $X$ can take, so the probability at each of these values must be $\frac{1}{b-a+1}$ in order that $\sum\limits_{x=a}^{b}f(x)=1$. Therefore $$f(x)=\frac{1}{b-a+1},\quad x=a,a+1,\dots,b.$$

Problem 6.2.1

Let $X$ be the largest number when a die is rolled 3 times. First find the c.d.f., $F(x)$, and then find the probability function, $f(x)$.

Hypergeometric Distribution

Physical Setup: We have a collection of $N$ objects which can be classified into two distinct types. Call one type "success" $(S)$ and the other type "failure" $(F)$. There are $r$ successes and $N-r$ failures. Pick $n$ objects at random without replacement. Let $X$ be the number of successes obtained. Then $X$ has a hypergeometric distribution.


Illustrations:

  1. The number of aces $X$ in a bridge hand has a hypergeometric distribution with $N = 52, ~r = 4$, and $n = 13$.

  2. In a fleet of 200 trucks there are 12 which have defective brakes. In a safety check 10 trucks are picked at random for inspection. The number of trucks $X$ with defective brakes chosen for inspection has a hypergeometric distribution with $N=200,r=12,n=10$.


Probability Function: Using counting techniques we note there are $\binom{N}{n}$ points in the sample space $S$ if we don't consider order of selection. There are $\binom{r}{x}$ ways to choose the $x$ success objects from the $r$ available and $\binom{N-r}{n-x}$ ways to choose the remaining $(n-x)$ objects from the $(N-r)$ failures. Hence $$f(x)=P(X=x)=\frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}.$$ The range of values for $x$ is somewhat complicated. Of course, $x\geq0$. However, if the number, $n$, picked exceeds the number, $N-r$, of failures, the difference, $n-(N-r)$, must be successes. So $x\geq\max\left(0,\,n-(N-r)\right)$. Also, $x\leq r$ since we can't get more successes than the number available, and $x\leq n$ since we can't get more successes than the number of objects chosen. Therefore $x\leq\min(r,n)$.
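The probability function and its range can be sketched in pure Python using `math.comb` (function name ours, illustrative; the course software is $R$). Summing over the range above should give 1:

```python
from math import comb

def hyper_pf(x, N, r, n):
    """P(X = x) for the hypergeometric: x successes when n of N objects
    are drawn without replacement and r of the N are successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# The defective-brakes illustration: N = 200 trucks, r = 12 defective, n = 10.
N, r, n = 200, 12, 10
lo, hi = max(0, n - (N - r)), min(r, n)   # the range derived above
assert abs(sum(hyper_pf(x, N, r, n) for x in range(lo, hi + 1)) - 1) < 1e-12
```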


Example: In Lotto 6/49 a player selects a set of six numbers (with no repeats) from the set $\{1,2,\dots,49\}$. In the lottery draw six numbers are selected at random. Find the probability function for $X$, the number from your set which are drawn.


Solution: Think of your numbers as the $S$ objects and the remainder as the $F$ objects. Then $X$ has a hypergeometric distribution with $N=49$, $r=6$ and $n=6$, so $$f(x)=\frac{\binom{6}{x}\binom{43}{6-x}}{\binom{49}{6}},\quad x=0,1,\dots,6.$$ For example, you win the jackpot prize if $X=6$; the probability of this is $f(6)=1/\binom{49}{6}$, or about 1 in 13.9 million.


Remark: Hypergeometric probabilities are tedious to compute using a calculator. The $R$ functions $dhyper$ and $phyper$ can be used to evaluate $f(x)$ and the c.d.f. $F(x)$. In particular, $dhyper(x,r,N-r,n)$ gives $f(x)$ and $phyper(x,r,N-r,n)$ gives $F(x)$. Using this we find for the Lotto 6/49 problem here, for example, that $f(6)$ is calculated by typing $dhyper(6,6,43,6)$ in $R$, which returns the answer $7.151124\times10^{-8}$ or $1/13{,}983{,}816$.
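The same jackpot probability can be reproduced without $R$; a Python sketch (names ours, illustrative):

```python
from math import comb

# Jackpot: all 6 of your numbers among the 6 drawn from 49.
f6 = comb(6, 6) * comb(43, 0) / comb(49, 6)

assert comb(49, 6) == 13_983_816          # about 1 chance in 13.9 million
assert abs(f6 * 13_983_816 - 1) < 1e-12   # f(6) = 1 / C(49, 6)
```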

For all of our model distributions we can also confirm that $\sum\limits_{x}f(x)=1$. To do this here we use a summation result from Chapter 5 called the hypergeometric identity. Letting $a=r$, $b=N-r$ in that identity we get $$\sum_{x}\frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}=\frac{1}{\binom{N}{n}}\sum_{x}\binom{r}{x}\binom{N-r}{n-x}=\frac{\binom{r+(N-r)}{n}}{\binom{N}{n}}=1.$$


Problems:

  1. A box of 12 tins of tuna contains $d$ which are tainted. Suppose 7 tins are opened for inspection and none of these 7 is tainted.

    1. Calculate the probability that none of the 7 is tainted for $d = 0,1,2,3$.

    2. Do you think it is likely that the box contains as many as 3 tainted tins?

  2. Derive a formula for the hypergeometric probability function using a sample space in which order of selection is considered.

Binomial Distribution

Physical Setup:

Suppose an "experiment" has two types of distinct outcomes. Call these types "success" $(S)$ and "failure" $(F)$, and let their probabilities be $p$ (for $S$) and $1-p$ (for $F$). Repeat the experiment $n$ independent times. Let $X$ be the number of successes obtained. Then $X$ has what is called a binomial distribution. (We write $X\sim Bi(n,p)$ as a shorthand for "$X$ is distributed according to a binomial distribution with $n$ repetitions and probability $p$ of success".) The individual experiments in the process just described are often called "trials", and the process is called a Bernoulli process or a binomial process.


Illustrations:

  1. Toss a fair die 10 times and let $X$ be the number of sixes that occur. Then $X\sim Bi(10,\frac{1}{6})$.

  2. In a microcircuit manufacturing process, 60% of the chips produced work (40% are defective). Suppose we select 25 independent chips and let $X$ be the number that work. Then $X\sim Bi(25,.6)$.


Comment: We must think carefully whether the physical process we are considering is closely approximated by a binomial process, for which the key assumptions are that (i) the probability $p$ of success is constant over the $n$ trials, and (ii) the outcome ($S$ or $F$) on any trial is independent of the outcome on the other trials. For Illustration 1 these assumptions seem appropriate. For Illustration 2 we would need to think about the manufacturing process. Microcircuit chips are produced on "wafers" containing a large number of chips and it is common for defective chips to cluster on wafers. This could mean that if we selected 25 chips from the same wafer, or from only 2 or 3 wafers, that the "trials" (chips) might not be independent.


Probability Function: There are $\binom{n}{x}$ different arrangements of $x$ $S$'s and $(n-x)$ $F$'s over the $n$ trials. The probability for each of these arrangements has $p$ multiplied together $x$ times and $(1-p)$ multiplied $(n-x)$ times, in some order, since the trials are independent. So each arrangement has probability $p^{x}(1-p)^{n-x}$. Therefore $$f(x)=P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\quad x=0,1,2,\dots,n.$$ Checking that $\sum f(x)=1$: by the binomial theorem, $$\sum_{x=0}^{n}\binom{n}{x}p^{x}(1-p)^{n-x}=\left(p+(1-p)\right)^{n}=1^{n}=1.$$ We graph in Figure binomhist the probability function for the binomial distribution with parameters $n=20$ and $p=0.3$. Although the formula for $f(x)$ may seem complicated, this shape is typical: increasing to a maximum value near $np$ and then decreasing thereafter.
[Figure binomhist: The Binomial$(20,0.3)$ probability histogram]
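The probability function and the shape of the histogram can be checked with a short Python sketch (names ours, illustrative; the course software is $R$):

```python
from math import comb

def binom_pf(x, n, p):
    # f(x) = C(n, x) p^x (1-p)^(n-x): the count of arrangements times
    # the probability of any one arrangement.
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.3
probs = [binom_pf(x, n, p) for x in range(n + 1)]

assert abs(sum(probs) - 1) < 1e-12                    # probabilities sum to 1
assert max(range(n + 1), key=lambda x: probs[x]) == 6  # mode sits near np = 6
```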




Computation: Many software packages and some calculators give binomial probabilities. In $R$ we use the function $dbinom(x,n,p)$ to compute $f(x)$ and $pbinom(x,n,p)$ to compute the corresponding c.d.f. $F(x)=P(X\leq x)$.


Example: Suppose that in a weekly lottery you have probability .02 of winning a prize with a single ticket. If you buy 1 ticket per week for 52 weeks, what is the probability that (a) you win no prizes, and (b) you win 3 or more prizes?


Solution: Let $X$ be the number of weeks that you win; then $X\sim Bi(52,.02)$. We find

  1. $P(X=0)=\binom{52}{0}(.02)^{0}(.98)^{52}=(.98)^{52}\doteq0.350$

  2. $P(X\geq3)=1-P(X\leq2)=1-\sum\limits_{x=0}^{2}\binom{52}{x}(.02)^{x}(.98)^{52-x}\doteq0.0859$

(Note that $P(X\leq2)$ is given by the $R$ command $pbinom(2,52,.02)$.)
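The same two probabilities can be reproduced with a Python sketch (names ours, illustrative):

```python
from math import comb

def binom_pf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 52, 0.02
p0 = binom_pf(0, n, p)                                 # (a) no prizes
p3plus = 1 - sum(binom_pf(x, n, p) for x in range(3))  # (b) 3 or more prizes

assert abs(p0 - 0.350) < 0.001
assert abs(p3plus - 0.0859) < 0.0005
```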



Comparison of Binomial and Hypergeometric Distributions:

These distributions are similar in that an experiment with 2 types of outcome ($S$ and $F$) is repeated $n$ times and $X$ is the number of successes. The key difference is that the binomial requires independent repetitions with the same probability of $S$, whereas the draws in the hypergeometric are made from a fixed collection of objects without replacement. The trials (draws) are therefore not independent. For example, if there are $r = 10$ $S$ objects and $N - r = 10$ $F$ objects, then the probability of getting an $S$ on draw 2 depends on what was obtained in draw 1. If these draws had been made with replacement, however, they would be independent and we'd use the binomial rather than the hypergeometric model.

If $N$ is large and the number, $n$, being drawn is relatively small in the hypergeometric setup then we are unlikely to get the same object more than once even if we do replace it. So it makes little practical difference whether we draw with or without replacement. This suggests that when we are drawing a fairly small proportion of a large collection of objects the binomial and the hypergeometric models should produce similar probabilities. As the binomial is easier to calculate, it is often used as an approximation to the hypergeometric in such cases.


Example: Suppose we have 15 cans of soup with no labels, but 6 are tomato and 9 are pea soup. We randomly pick 8 cans and open them. Find the probability 3 are tomato.


Solution: The correct solution uses the hypergeometric distribution, and is (with $X$ = number of tomato soups picked) $$f(3)=\frac{\binom{6}{3}\binom{9}{5}}{\binom{15}{8}}=0.3916.$$ If we incorrectly used the binomial, we'd get $$f(3)=\binom{8}{3}\left(\tfrac{6}{15}\right)^{3}\left(\tfrac{9}{15}\right)^{5}=0.2787.$$ As expected, this is a poor approximation since we're picking over half of a fairly small collection of cans.

However, if we had 1500 cans - 600 tomato and 900 pea - we're not likely to get the same can again even if we did replace each of the 8 cans after opening it. (Put another way, the probability we get a tomato soup on each pick is very close to .4, regardless of what the other picks give.) The exact, hypergeometric, probability is now $$f(3)=\frac{\binom{600}{3}\binom{900}{5}}{\binom{1500}{8}}=0.2794.$$ Here the binomial probability, $$f(3)=\binom{8}{3}(.4)^{3}(.6)^{5}=0.2787,$$ is a very good approximation.
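The small-population and large-population cases can be compared directly in a Python sketch (names ours, illustrative):

```python
from math import comb

def hyper_pf(x, N, r, n):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binom_pf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 15 cans (6 tomato): binomial is a poor approximation here ...
small = hyper_pf(3, 15, 6, 8)
approx = binom_pf(3, 8, 0.4)
# ... but with 1500 cans (600 tomato) the two nearly agree.
big = hyper_pf(3, 1500, 600, 8)

assert round(small, 4) == 0.3916
assert round(approx, 4) == 0.2787
assert abs(big - approx) < 0.001
```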

Problems:

  1. Megan audits 130 clients during a year and finds irregularities for 26 of them.

    1. Give an expression for the probability that 2 clients will have irregularities when 6 of her clients are picked at random,

    2. Evaluate your answer to (a) using a suitable approximation.

  2. The flash mechanism on camera $A$ fails on 10% of shots, while that of camera $B$ fails on 5% of shots. The two cameras being identical in appearance, a photographer selects one at random and takes 10 indoor shots using the flash.

    1. Give the probability that the flash mechanism fails exactly twice. What assumption(s) are you making?

    2. Given that the flash mechanism failed exactly twice, what is the probability camera $A$ was selected?

Negative Binomial Distribution

Physical Setup:

The setup for this distribution is almost the same as for the binomial; i.e., an experiment (trial) has two distinct types of outcome ($S$ and $F$) and is repeated independently with the same probability, $p$, of success each time. Continue doing the experiment until a specified number, $k$, of successes has been obtained. Let $X$ be the number of failures obtained before the $k^{\text{th}}$ success. Then $X$ has a negative binomial distribution. We often write $X\sim NB(k,p)$ to denote this.


Illustrations:

  1. If a fair coin is tossed until we get our $5^{\text{th}}$ head, the number of tails we obtain has a negative binomial distribution with $k = 5$ and $p = \frac{1}{2}$.

  2. As a rough approximation, the number of half credit failures a student collects before successfully completing 40 half credits for an honours degree has a negative binomial distribution. (Assume all course attempts are independent, with the same probability of being successful, and ignore the fact that getting more than 6 half credit failures prevents a student from continuing toward an honours degree.)


Probability Function: In all there will be $x+k$ trials ($x$ $F$'s and $k$ $S$'s) and the last trial must be a success. In the first $x+k-1$ trials we therefore need $x$ failures and $(k-1)$ successes, in any order. There are $\binom{x+k-1}{x}$ different orders. Each order will have probability $p^{k}(1-p)^{x}$ since there must be $x$ trials which are failures and $k$ which are successes. Hence $$f(x)=P(X=x)=\binom{x+k-1}{x}p^{k}(1-p)^{x},\quad x=0,1,2,\dots$$

Note: An alternate version of the negative binomial distribution defines $X$ to be the total number of trials needed to get the $k^{\text{th}}$ success. This is equivalent to our version. For example, asking for the probability of getting 3 tails before the $5^{\text{th}}$ head is exactly the same as asking for a total of 8 tosses in order to get the $5^{\text{th}}$ head. You need to be careful to read how $X$ is defined in a problem rather than mechanically "plugging in" numbers in the above formula for $f(x)$.

Checking that $\sum f(x)=1$ requires somewhat more work for the negative binomial distribution. We first re-arrange the $\binom{x+k-1}{x}$ term: $$\binom{x+k-1}{x}=\frac{(x+k-1)(x+k-2)\cdots(k+1)(k)}{x!}.$$ Factor a $(-1)$ out of each of the $x$ terms in the numerator, and re-write these terms in reverse order: $$\binom{x+k-1}{x}=(-1)^{x}\frac{(-k)(-k-1)\cdots(-k-x+1)}{x!}=(-1)^{x}\binom{-k}{x}.$$ Then (using the binomial theorem with a negative exponent) $$\sum_{x=0}^{\infty}f(x)=p^{k}\sum_{x=0}^{\infty}\binom{-k}{x}\left(-(1-p)\right)^{x}=p^{k}\left(1-(1-p)\right)^{-k}=p^{k}p^{-k}=1.$$
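The convergence of this series can be seen numerically; a Python sketch (names and parameter choices ours, illustrative) sums far into the tail:

```python
from math import comb

def nb_pf(x, k, p):
    # f(x) = C(x+k-1, x) p^k (1-p)^x : x failures before the k-th success
    return comb(x + k - 1, x) * p**k * (1 - p)**x

k, p = 5, 0.5  # illustrative parameters
# The infinite series converges to 1; truncating deep in the tail suffices.
assert abs(sum(nb_pf(x, k, p) for x in range(200)) - 1) < 1e-9
```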


Comparison of Binomial and Negative Binomial Distributions

These should be easily distinguished because they reverse what is specified or known in advance and what is variable.

Binomial: Know the number, $n$, of repetitions in advance. Don't know the number of successes we'll obtain until after the experiment.
Negative Binomial: Know the number, $k$, of successes in advance. Don't know the number of repetitions needed until after the experiment.

Example:

The fraction of a large population that has a specific blood type $T$ is .08 (8%). For blood donation purposes it is necessary to find 5 people with type $T$ blood. If randomly selected individuals from the population are tested one after another, then (a) what is the probability $y$ persons have to be tested to get 5 type $T$ persons, and (b) what is the probability that over 80 people have to be tested?


Solution:

Think of a type $T$ person as a success $(S)$ and a non-type $T$ person as an $F$. Let $Y$ = number of persons who have to be tested and let $X$ = number of non-type $T$ persons tested in order to get 5 $S$'s. Then $X\sim NB(k=5,\,p=.08)$ and $$f(x)=P(X=x)=\binom{x+4}{x}(.08)^{5}(.92)^{x},\quad x=0,1,2,\dots$$ We are actually asked here about $Y=X+5$. Thus $$P(Y=y)=P(X=y-5)=\binom{y-1}{y-5}(.08)^{5}(.92)^{y-5},\quad y=5,6,7,\dots$$ Thus we have the answer to (a) as given above, and
(b) $P(Y>80)=P(X>75)=1-P(X\leq75)=1-\sum\limits_{x=0}^{75}\binom{x+4}{x}(.08)^{5}(.92)^{x}$.

Note: Calculating such probabilities is easy with $R$. To get $f(x)$ we use $dnbinom(x,k,p)$ and to get $F(x)=P(X\leq x)$ we use $pnbinom(x,k,p)$.
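The tail probability in (b) can also be cross-checked against a binomial event: "more than 80 tested" is exactly "at most 4 type $T$ people among the first 80 tests". A Python sketch (names ours, illustrative):

```python
from math import comb

k, p = 5, 0.08

def nb_pf(x):
    return comb(x + k - 1, x) * p**k * (1 - p)**x

# P(more than 80 tested) = P(X > 75), X = failures before the 5th success.
p_over_80 = 1 - sum(nb_pf(x) for x in range(76))

# Cross-check: the first 80 tests yield at most 4 type-T people,
# a binomial probability with n = 80.
check = sum(comb(80, j) * p**j * (1 - p)**(80 - j) for j in range(5))

assert abs(p_over_80 - check) < 1e-12
```

This identity between a negative binomial tail and a binomial c.d.f. holds in general, since both events describe the same set of trial sequences.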


Problems:

  1. You can get a group rate on tickets to a play if you can find 25 people to go. Assume each person you ask responds independently and has a 20% chance of agreeing to buy a ticket. Let $X$ be the total number of people you have to ask in order to find 25 who agree to buy a ticket. Find the probability function of $X$.

  2. A shipment of 2500 car headlights contains 200 which are defective. You choose from this shipment without replacement until you have 18 which are not defective. Let $X$ be the number of defective headlights you obtain.

    1. Give the probability function, $f(x)$.

    2. Using a suitable approximation, find $f(2)$.

Geometric Distribution

Physical Setup:

This is a special case of the negative binomial distribution with $k=1$, i.e., an experiment is repeated independently with two types of outcome ($S$ and $F$) each time, and the same probability, $p$, of success each time. Let $X$ be the number of failures obtained before the first success.


Illustrations:

  1. The probability you win a lottery prize in any given week is a constant $p$. The number of weeks before you win a prize for the first time has a geometric distribution.

  2. If you take STAT 230 until you pass it and attempts are independent with the same probability of a pass each time, then the number of failures would have a geometric distribution. (These assumptions are unlikely to be true for most persons! Why is this?)


Probability Function:

There is only the one arrangement: $x$ failures followed by 1 success. This arrangement has probability $$f(x)=P(X=x)=p(1-p)^{x},\quad x=0,1,2,\dots,$$ which is the same as $f(x)$ for $NB(k=1,\,p)$.

Checking that $\sum f(x)=1$, we will be evaluating a geometric series: $$\sum_{x=0}^{\infty}p(1-p)^{x}=\frac{p}{1-(1-p)}=1.$$
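A useful by-product of the geometric series (worth verifying numerically) is the closed-form tail $P(X\geq n)=(1-p)^{n}$: the event "$X\geq n$" just says the first $n$ trials were all failures. A Python sketch with an illustrative $p$ of our choosing:

```python
p = 0.2  # illustrative success probability

def geom_pf(x):
    # f(x) = p(1-p)^x : x failures before the first success
    return p * (1 - p)**x

# Tail identity: P(X >= n) = (1-p)^n.  The truncation at 400 is far
# enough out that the neglected tail is negligible.
for n in (0, 1, 5, 10):
    tail = sum(geom_pf(x) for x in range(n, 400))
    assert abs(tail - (1 - p)**n) < 1e-12
```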

Note: The names of the models so far derive from the summation results which show $f(x)$ sums to 1. The geometric distribution involved a geometric series; the hypergeometric distribution used the hypergeometric identity; both the binomial and negative binomial distributions used the binomial theorem.


Bernoulli Trials
The binomial, negative binomial and geometric models involve trials (experiments) which:

(1) are independent
(2) have 2 distinct types of outcome ($S$ and $F$)
(3) have the same probability of ``success'' $(S)$ each time.

Such trials are known as Bernoulli trials; they are named after the Swiss mathematician Jacob Bernoulli.


Problem 6.6.1

Suppose there is a 30% chance of a car from a certain production line having a leaky windshield. The probability an inspector will have to check at least $n$ cars to find the first one with a leaky windshield is .05. Find $n$.

Poisson Distribution from Binomial

The Poisson distribution has probability function (p.f.) of the form $$f(x)=\frac{e^{-\mu}\mu^{x}}{x!},\quad x=0,1,2,\dots$$ where $\mu>0$ is a parameter whose value depends on the setting for the model. Mathematically, we can see that $f(x)$ has the properties of a p.f., since $f(x)\geq0$ for $x=0,1,2,\dots$ and since $$\sum_{x=0}^{\infty}f(x)=e^{-\mu}\sum_{x=0}^{\infty}\frac{\mu^{x}}{x!}=e^{-\mu}e^{\mu}=1.$$ The Poisson distribution arises in physical settings where the random variable $X$ represents the number of events of some type. In this section we show how it arises from a binomial process, and in the following section we consider another derivation of the model.

We will sometimes write $X\sim\text{Poisson}(\mu)$ to denote that $X$ has the p.f. above.


Physical Setup: One way the Poisson distribution arises is as a limiting case of the binomial distribution as $n\rightarrow\infty$ and $p\rightarrow0$. In particular, we keep the product $np$ fixed at some constant value, $\mu$, while letting $n\rightarrow\infty$. This automatically makes $p\rightarrow0$. Let us see what the limit of the binomial p.f. $f(x)$ is in this case.


Probability Function: Since $p=\mu/n$, we have $$f(x)=\binom{n}{x}\left(\frac{\mu}{n}\right)^{x}\left(1-\frac{\mu}{n}\right)^{n-x}=\frac{\mu^{x}}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^{x}}\left(1-\frac{\mu}{n}\right)^{n}\left(1-\frac{\mu}{n}\right)^{-x},$$ and letting $n\rightarrow\infty$ with $x$ fixed gives $$f(x)\rightarrow\frac{\mu^{x}}{x!}\cdot1\cdot e^{-\mu}\cdot1=\frac{e^{-\mu}\mu^{x}}{x!},\quad x=0,1,2,\dots$$ (For the binomial the upper limit on $x$ is $n$, but we are letting $n\rightarrow\infty$.) This result allows us to use the Poisson distribution with $\mu=np$ as a close approximation to the binomial distribution $Bi(n,p)$ in processes for which $n$ is large and $p$ is small.


Example: 200 people are at a party. What is the probability that 2 of them were born on Jan. 1?


Solution: Assuming all days of the year are equally likely for a birthday (and ignoring February 29) and that the birthdays are independent (e.g. no twins!) we can use the binomial distribution with $n=200$ and $p=1/365$ for $X$ = number born on January 1, giving $$f(2)=\binom{200}{2}\left(\tfrac{1}{365}\right)^{2}\left(1-\tfrac{1}{365}\right)^{198}\doteq0.0868.$$ Since $n$ is large and $p$ is close to 0, we can use the Poisson distribution to approximate this binomial probability, with $\mu=np=\frac{200}{365}$, giving $$f(2)\doteq\frac{e^{-200/365}\left(\frac{200}{365}\right)^{2}}{2!}\doteq0.0868.$$ As might be expected, this is a very good approximation.
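The closeness of the two answers can be confirmed with a Python sketch (names ours, illustrative):

```python
from math import comb, exp

n, p, x = 200, 1 / 365, 2
binom_f2 = comb(n, x) * p**x * (1 - p)**(n - x)  # exact binomial probability
mu = n * p                                       # Poisson parameter mu = np
poisson_f2 = exp(-mu) * mu**x / 2                # mu^x e^(-mu) / x!, x! = 2

assert abs(binom_f2 - poisson_f2) < 5e-4  # approximation error is tiny
```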


Notes:

  1. If $p$ is close to 1 we can also use the Poisson distribution to approximate the binomial. By interchanging the labels ``success'' and ``failure'', we can get the probability of ``success'' (formerly labelled ``failure'') close to 0.

  2. The Poisson distribution used to be very useful for approximating binomial probabilities with $n$ large and $p$ near 0 since the calculations are easier. (This assumes values of $e^{x}$ to be available.) With the advent of computers, it is just as easy to calculate the exact binomial probabilities as the Poisson probabilities. However, the Poisson approximation is useful when employing a calculator without a built in binomial function.

  3. The $R$ functions $dpois(x,\mu)$ and $ppois(x,\mu)$ give $f(x)$ and $F(x)$.


Problem 6.7.1

An airline knows that 97% of the passengers who buy tickets for a certain flight will show up on time. The plane has 120 seats.

  1. They sell 122 tickets. Find the probability that more people will show up than can be carried on the flight. Compare this answer with the answer given by the Poisson approximation.

  2. What assumptions does your answer depend on? How well would you expect these assumptions to be met?

Poisson Distribution from Poisson Process

We now derive the Poisson distribution as a model for the number of events of some type (e.g. births, insurance claims, web site hits) that occur in time or in space. To this end, we use the ``order'' notation $g(\Delta t)=o(\Delta t)$ as $\Delta t\rightarrow0$ to mean that the function $g$ approaches $0$ faster than $\Delta t$ as $\Delta t$ approaches zero, or that $$\lim_{\Delta t\rightarrow0}\frac{g(\Delta t)}{\Delta t}=0.$$ For example, $(\Delta t)^{2}=o(\Delta t)$, but $(\Delta t)^{1/2}$ is not $o(\Delta t).$


Physical Setup:

Consider a situation in which events are occurring randomly over time (or space) according to the following conditions:

  1. Independence: the number of occurrences in non-overlapping intervals are independent.

  2. Individuality: for sufficiently short time periods of length $\Delta t$, the probability of 2 or more events occurring in the interval is close to zero; i.e., events occur singly, not in clusters. More precisely, as $\Delta t\rightarrow0$ the probability of two or more events in the interval of length $\Delta t$ must go to zero faster than $\Delta t$; that is, $$P(\text{2 or more events in }(t,t+\Delta t))=o(\Delta t).$$

  3. Homogeneity or Uniformity: events occur at a uniform or homogeneous rate $\lambda$ over time, so that the probability of one occurrence in an interval $(t,t+\Delta t)$ is approximately $\lambda\Delta t$ for small $\Delta t$, for any value of $t$. More precisely, $$P(\text{one event in }(t,t+\Delta t))=\lambda\Delta t+o(\Delta t).$$




These three conditions together define a Poisson Process.

Let $X$ be the number of event occurrences in a time period of length $t$. Then it can be shown (see below) that $X$ has a Poisson distribution with $\mu=\lambda t$.
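Before deriving this result, it can be made plausible by simulation. The sketch below (Python here rather than the R used elsewhere in these notes; all parameter values are arbitrary choices) approximates a Poisson process by dividing $(0,t)$ into many short intervals of length $\Delta t$, each containing one event independently with probability $\lambda\Delta t$, exactly as the three conditions suggest:

```python
import random
from math import exp, factorial

# Approximate a Poisson process with rate lam over (0, t) by n_slots short
# Bernoulli intervals of length dt = t/n_slots, each holding one event
# with probability lam*dt (conditions 1-3). Illustrative values only.
random.seed(1)
lam, t, n_slots = 2.0, 3.0, 1500
dt = t / n_slots

def one_run():
    # Total number of events in one realization of the process over (0, t)
    return sum(random.random() < lam * dt for _ in range(n_slots))

counts = [one_run() for _ in range(1000)]
mean_count = sum(counts) / len(counts)   # should be close to mu = lam*t = 6

# Compare the empirical P(X = 6) with the Poisson probability f(6)
mu = lam * t
emp = counts.count(6) / len(counts)
theo = mu**6 * exp(-mu) / factorial(6)
print(round(mean_count, 2), round(emp, 3), round(theo, 3))
```

The empirical mean and the empirical probability of six events both land close to their Poisson values $\mu=\lambda t=6$ and $f(6)$.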

Illustrations:

  1. The emission of radioactive particles from a substance follows a Poisson process. (This is used in medical imaging and other areas.)

  2. Hits on a web site during a given time period often follow a Poisson process.

  3. Occurrences of certain non-communicable diseases sometimes follow a Poisson process.


Probability Function: We can derive the probability function $f(x)=P(X=x)$ from the conditions above. We are interested in time intervals of arbitrary length $t$, so as a temporary notation, let $f_{t}(x)$ be the probability of $x$ occurrences in a time interval of length $t$. We now relate $f_{t}(x)$ and $f_{t+\Delta t}(x)$. From that we can determine what $f_{t}(x)$ is.

To find $f_{t+\Delta t}(x)$ we note that for $\Delta t$ small there are only 2 ways to get $x$ event occurrences by time $t+\Delta t$: either there are $x$ events by time $t$ and no more from $t$ to $t+\Delta t$, or there are $x-1$ by time $t$ and 1 more from $t$ to $t+\Delta t$. (Since $P(\text{2 or more events in } (t,t+\Delta t))=o(\Delta t)$, other possibilities are negligible if $\Delta t$ is small.) This and condition 1 above (independence) imply that $$f_{t+\Delta t}(x)=f_{t}(x)(1-\lambda\Delta t)+f_{t}(x-1)\lambda\Delta t+o(\Delta t).$$ Re-arranging gives $$\frac{f_{t+\Delta t}(x)-f_{t}(x)}{\Delta t}=\lambda\left[f_{t}(x-1)-f_{t}(x)\right]+\frac{o(\Delta t)}{\Delta t}.$$

Taking the limit as $\Delta t\rightarrow 0$ we get $$\frac{d}{dt}f_{t}(x)=\lambda\left[f_{t}(x-1)-f_{t}(x)\right].$$ This ``differential-difference" equation can be ``solved" by using the ``boundary" conditions $f_{0}(0)=1$ and $f_{0}(x)=0$ for $x=1,2,3,\cdots$. You can confirm that $$f_{t}(x)=\frac{(\lambda t)^{x}e^{-\lambda t}}{x!};\quad x=0,1,2,\cdots$$ satisfies these conditions and the equation above, even though you don't know how to solve the equation. If we let $\mu=\lambda t$, we can re-write $f_{t}(x)$ as $f(x)=\frac{\mu^{x}e^{-\mu}}{x!}$, which is the Poisson distribution from Section 6.7. That is:

In a Poisson process with rate of occurrence $\lambda$, the number of event occurrences $X$
in a time interval of length $t$ has a Poisson distribution with $\mu=\lambda t$.
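The claim that $f_{t}(x)=(\lambda t)^{x}e^{-\lambda t}/x!$ satisfies the differential-difference equation can be confirmed numerically rather than analytically. The sketch below (Python; the rate $\lambda=1.7$ and the point $t=2$ are arbitrary choices) compares a finite-difference estimate of $\frac{d}{dt}f_{t}(x)$ with $\lambda\left[f_{t}(x-1)-f_{t}(x)\right]$:

```python
from math import exp, factorial

# Numerical check (not a proof) that f_t(x) = (lam*t)^x e^(-lam*t)/x!
# satisfies d/dt f_t(x) = lam * [f_t(x-1) - f_t(x)],
# using a central finite difference for the time derivative.
lam = 1.7   # arbitrary rate, chosen for illustration

def f(t, x):
    return (lam * t) ** x * exp(-lam * t) / factorial(x)

t, h = 2.0, 1e-6
for x in range(1, 6):
    lhs = (f(t + h, x) - f(t - h, x)) / (2 * h)   # d/dt f_t(x)
    rhs = lam * (f(t, x - 1) - f(t, x))
    print(x, round(lhs, 6), round(rhs, 6))
```

The two columns agree to the accuracy of the finite difference, for every $x$ tried.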




Interpretation of $\mu$ and $\lambda$:

$\lambda$ is referred to as the intensity or rate of occurrence parameter for the events. It represents the average rate of occurrence of events per unit of time (or area or volume, as discussed below). Then $\lambda t=\mu$ represents the average number of occurrences in $t$ units of time. It is important to note that the value of $\lambda$ depends on the units used to measure time. For example, if phone calls arrive at a store at an average rate of 20 per hour, then $\lambda=20$ when time is in hours and the average in 3 hours will be $3\times20$ or 60. However, if time is measured in minutes then $\lambda=20/60=1/3$; the average in 180 minutes (3 hours) is still $(1/3)(180)=60$.
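The unit-invariance of $\mu=\lambda t$ in the phone-call illustration takes only a couple of lines to confirm (a trivial Python check, nothing deep):

```python
# mu = lam * t does not depend on the time units used,
# as long as lam and t are measured in the same units.
lam_per_hour = 20.0                  # 20 calls per hour
t_hours = 3.0

lam_per_min = lam_per_hour / 60.0    # 1/3 call per minute
t_mins = t_hours * 60.0              # 180 minutes

mu_hours = lam_per_hour * t_hours    # average calls in 3 hours
mu_mins = lam_per_min * t_mins       # same quantity, in minute units
print(mu_hours, mu_mins)
```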


Examples:

  1. Suppose earthquakes recorded in Ontario each year follow a Poisson process with an average of 6 per year. The probability that 7 earthquakes will be recorded in a 2 year period is $f(7)=\frac{12^{7}e^{-12}}{7!}\approx 0.0437$. We have used $\lambda=6$ and $t=2$ to get $\mu=12$.

  2. At a nuclear power station an average of 8 leaks of heavy water are reported per year. Find the probability of 2 or more leaks in 1 month, if leaks follow a Poisson process.

Solution: Assuming leaks satisfy the conditions for a Poisson process and that a month is $1/12$ of a year, we'll use the Poisson distribution with $\lambda=8$ and $t=1/12$, so $\mu=8/12$. Thus $$P(X\geq 2)=1-P(X=0)-P(X=1)=1-e^{-8/12}-\frac{8}{12}e^{-8/12}\approx 0.1443.$$
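Both example calculations are easy to verify with a few lines of code. The Python sketch below mirrors what the R calls `dpois` and `ppois` would give:

```python
from math import exp, factorial

def pois_pf(x, mu):
    # Poisson probability function f(x) = mu^x e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

# Example 1: earthquakes, lambda = 6 per year, t = 2 years, so mu = 12
p_quakes = pois_pf(7, 12)                  # P(7 earthquakes in 2 years)
print(round(p_quakes, 4))

# Example 2: leaks, lambda = 8 per year, t = 1/12 year, so mu = 8/12
mu = 8 / 12
p_leaks = 1 - pois_pf(0, mu) - pois_pf(1, mu)   # P(2 or more leaks in a month)
print(round(p_leaks, 4))
```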


Random Occurrence of Events in Space

The Poisson process also applies when ``events'' occur randomly in space (either 2 or 3 dimensions). For example, the ``events'' might be bacteria in a volume of water or blemishes in the finish of a paint job on a metal surface. If $X$ is the number of events in a volume or area in space of size $v$ and if $\lambda$ is the average number of events per unit volume (or area), then $X$ has a Poisson distribution with $\mu=\lambda v$.

For this model to be valid, it is assumed that the Poisson process conditions given previously apply here, with ``time'' replaced by ``volume'' or ``area''. Once again, note that the value of $\lambda$ depends on the units used to measure volume or area.


Example: Coliform bacteria occur in river water with an average intensity of 1 bacteria per 10 cubic centimeters (cc) of water. Find (a) the probability there are no bacteria in a 20cc sample of water which is tested, and (b) the probability there are 5 or more bacteria in a 50cc sample. (To do this assume that a Poisson process describes the location of bacteria in the water at any given time.)


Solution: Let $X$ = number of bacteria in a sample of volume $v$ cc. Since $\lambda=0.1$ bacteria per cc (1 per 10cc) the p.f. of $X$ is Poisson with $\mu=0.1v$: $$f(x)=\frac{(0.1v)^{x}e^{-0.1v}}{x!};\quad x=0,1,2,\cdots$$ Thus we find

  1. With $v=20$, $\mu=2$ and $P(X=0)=e^{-2}\approx 0.1353$.

  2. With $v=50$, $\mu=5$ and $P(X\geq 5)=1-P(X\leq 4)=1-\sum\limits_{x=0}^{4}\frac{5^{x}e^{-5}}{x!}\approx 0.5595$.

(Note: we can use the $R$ command $ppois(4,5)$ to get $P(X\leq4)$.)
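A corresponding check for this example, in Python (the equivalent of the R commands `dpois(0,2)` and `1 - ppois(4,5)`):

```python
from math import exp, factorial

def pois_pf(x, mu):
    # Poisson probability function f(x) = mu^x e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

# (a) v = 20 cc, so mu = 0.1 * 20 = 2
p_a = pois_pf(0, 2)
# (b) v = 50 cc, so mu = 5; P(X >= 5) = 1 - P(X <= 4)
p_b = 1 - sum(pois_pf(x, 5) for x in range(5))
print(round(p_a, 4), round(p_b, 4))
```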


Exercise: In each of the above examples, how well are each of the conditions for a Poisson process likely to be satisfied?

Distinguishing Poisson from Binomial and Other Distributions

Students often have trouble knowing when to use the Poisson distribution and when not to use it. To be certain, the 3 conditions for a Poisson process need to be checked. However, a quick decision can often be made by asking yourself the following questions:

  1. Can we specify in advance the maximum value which $X$ can take?
    If we can, then the distribution is not Poisson. If there is no fixed upper limit, the distribution might be Poisson, but is certainly not binomial or hypergeometric, e.g. the number of seeds which germinate out of a package of 25 does not have a Poisson distribution since we know in advance that $X \leq25$. The number of cardinals sighted at a bird feeding station in a week might be Poisson since we can't specify a fixed upper limit on $X$. At any rate, this number would not have a binomial or hypergeometric distribution.

  2. Does it make sense to ask how often the event did not occur?
    If it does make sense, the distribution is not Poisson. If it does not make sense, the distribution might be Poisson. For example, it does not make sense to ask how often a person did not hiccup during an hour, so the number of hiccups in an hour might have a Poisson distribution; it would certainly not be binomial, negative binomial, or hypergeometric. If a coin were tossed until the $3^{\text{rd}}$ head occurs, it does make sense to ask how often heads did not come up, so the distribution would not be Poisson. (In fact, we'd use the negative binomial distribution for the number of non-heads, i.e. tails.)


Problems:

  1. Suppose that emergency calls to 911 follow a Poisson process with an average of 3 calls per minute. Find the probability there will be

    1. 6 calls in a period of 2$\frac{1}{2}$ minutes.

    2. 2 calls in the first minute of a 2$\frac{1}{2}$ minute period, given that 6 calls occur in the entire period.

  2. Misprints are distributed randomly and uniformly in a book, at a rate of 2 per 100 lines.

    1. What is the probability a line is free of misprints?

    2. Two pages are selected at random. One page has 80 lines and the other 90 lines. What is the probability that there are exactly 2 misprints on each of the two pages?

Combining Models

While we've considered the model distributions in this chapter one at a time, we will sometimes need to use two or more distributions to answer a question. To handle this type of problem you'll need to be very clear about the characteristics of each model. Here is a somewhat artificial illustration. Lots of other examples are given in the problems at the end of the chapter.


Example: A very large (essentially infinite) number of ladybugs is released in a large orchard. They scatter randomly so that on average a tree has 6 ladybugs on it. Trees are all the same size.

  1. Find the probability a tree has $> 3$ ladybugs on it.

  2. When 10 trees are picked at random, what is the probability 8 of these trees have $> 3$ ladybugs on them?

  3. Trees are checked until 5 with $> 3$ ladybugs are found. Let $X$ be the total number of trees checked. Find the probability function, $f(x)$.

  4. Find the probability a tree with $> 3$ ladybugs on it has exactly 6.

  5. On 2 trees there are a total of $t$ ladybugs. Find the probability that $x$ of these are on the first of these 2 trees.


Solution:

  1. If the ladybugs are randomly scattered the most suitable model is the Poisson distribution with $\lambda=6$ and $v=1$ (i.e. any tree has a ``volume" of 1 unit), so $\mu=6$ and $$P(X>3)=1-P(X\leq 3)=1-\sum\limits_{x=0}^{3}\frac{6^{x}e^{-6}}{x!}\approx 0.8488.$$

  2. Using the binomial distribution where ``success'' means $>3$ ladybugs on a tree, we have $n=10$, $p=0.8488$ and $$f(8)=\binom{10}{8}(0.8488)^{8}(1-0.8488)^{2}\approx 0.2772.$$

  3. Using the negative binomial distribution, we need the number of successes, $k$, to be 5, and the number of failures to be $(x-5)$. Then $$f(x)=\binom{x-1}{x-5}(0.8488)^{5}(0.1512)^{x-5};\quad x=5,6,7,\cdots$$

  4. This is conditional probability. Let $A=\{\text{6 ladybugs}\}$ and $B=\{>3 \text{ ladybugs}\}$. Then $$P(A|B)=\frac{P(AB)}{P(B)}=\frac{P(X=6)}{P(X>3)}=\frac{6^{6}e^{-6}/6!}{0.8488}\approx 0.1892.$$

  5. Again we need to use conditional probability. Let $T$ be the total number of ladybugs on the 2 trees and let $X_{1}$ be the number on the first of these 2 trees. Then

    $$P(X_{1}=x\,|\,T=t)=\frac{P(X_{1}=x \text{ and } T=t)}{P(T=t)}=\frac{P(x \text{ on } 1^{\text{st}}\text{ tree})\,P(t-x \text{ on } 2^{\text{nd}}\text{ tree})}{P(T=t)}$$

    Use the Poisson distribution to calculate each, with $\mu=6\times 2=12$ in the denominator since there are 2 trees:

$$P(X_{1}=x\,|\,T=t)=\frac{\frac{6^{x}e^{-6}}{x!}\cdot\frac{6^{t-x}e^{-6}}{(t-x)!}}{\frac{12^{t}e^{-12}}{t!}}=\binom{t}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{t-x};\quad x=0,1,\cdots,t$$

Caution: Don't forget to give the range of $x$. If the total is $t$, there couldn't be more than $t$ ladybugs on the $1^{\text{st}}$ tree.
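The numerical values in parts (a), (b) and (d), and the claim in part (e) that the conditional distribution is binomial with $p=1/2$, can all be checked with a short script (Python here, though the R functions `ppois`, `dbinom`, etc. would serve equally well; the values $t=9$, $x=4$ in the part (e) check are arbitrary):

```python
from math import comb, exp, factorial

def pois_pf(x, mu):
    # Poisson probability function f(x) = mu^x e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

mu = 6.0
# (a) P(a tree has > 3 ladybugs)
p = 1 - sum(pois_pf(x, mu) for x in range(4))
# (b) Bi(10, p): probability exactly 8 of 10 trees have > 3 ladybugs
b = comb(10, 8) * p**8 * (1 - p)**2
# (d) P(exactly 6 ladybugs | > 3 ladybugs)
d = pois_pf(6, mu) / p
print(round(p, 4), round(b, 4), round(d, 4))

# (e) check at one point: P(x on tree 1 | total t) equals Bi(t, 1/2)
t, x = 9, 4
lhs = pois_pf(x, 6) * pois_pf(t - x, 6) / pois_pf(t, 12)
rhs = comb(t, x) * 0.5**t
print(round(lhs, 6), round(rhs, 6))
```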


Exercise: The answer to (e) is a binomial probability function. Can you reach this answer by general reasoning rather than using conditional probability to derive it?


Problems:

  1. In a Poisson process the average number of occurrences is $\lambda$ per minute. Independent 1 minute intervals are observed until the first minute with no occurrences is found. Let $X$ be the number of 1 minute intervals required, including the last one. Find the probability function, $f(x)$.

  2. Calls arrive at a telephone distress centre during the evening according to the conditions for a Poisson process. On average there are 1.25 calls per hour.

    1. Find the probability there are no calls during a 3 hour shift.

    2. Give an expression for the probability a person who starts working at this centre will have the first shift with no calls on the $15^{\text{th}}$ shift.

    3. A person works one hundred 3 hour evening shifts during the year. Give an expression for the probability there are no calls on at least 4 of these 100 shifts. Calculate a numerical answer using a Poisson approximation.

Summary of Single Variable Discrete Models

Name                Probability Function
Discrete Uniform    $f(x)=\frac{1}{b-a+1};\quad x=a,a+1,\cdots,b$
Hypergeometric      $f(x)=\frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$
Binomial            $f(x)=\binom{n}{x}p^{x}(1-p)^{n-x};\quad x=0,1,\cdots,n$
Negative Binomial   $f(x)=\binom{x+k-1}{x}p^{k}(1-p)^{x};\quad x=0,1,2,\cdots$
Geometric           $f(x)=p(1-p)^{x};\quad x=0,1,2,\cdots$
Poisson             $f(x)=\frac{\mu^{x}e^{-\mu}}{x!};\quad x=0,1,2,\cdots$
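As a check on this summary, each probability function can be coded and summed over its range; every sum should be 1. A Python sketch (the parameter values are arbitrary; the negative binomial and geometric are written in the failures-before-the-$k^{\text{th}}$-success convention used in these notes, and the infinite ranges are truncated at a large $x$, so those sums are only approximately 1 up to rounding):

```python
from math import comb, exp, factorial

# One function per row of the summary table, with illustrative parameters.
def discrete_uniform(x, a, b):
    return 1 / (b - a + 1)

def hypergeometric(x, N, r, n):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binomial(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def negative_binomial(x, k, p):
    # x = number of failures before the k-th success
    return comb(x + k - 1, x) * p**k * (1 - p)**x

def geometric(x, p):
    return p * (1 - p)**x

def poisson(x, mu):
    return mu**x * exp(-mu) / factorial(x)

checks = [
    sum(discrete_uniform(x, 1, 6) for x in range(1, 7)),
    sum(hypergeometric(x, 20, 8, 5) for x in range(0, 6)),
    sum(binomial(x, 10, 0.3) for x in range(11)),
    sum(negative_binomial(x, 3, 0.4) for x in range(200)),
    sum(geometric(x, 0.25) for x in range(200)),
    sum(poisson(x, 4.5) for x in range(100)),
]
print([round(c, 6) for c in checks])
```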


Appendix: $R$ Software

The $R$ software system is a powerful tool for handling probability distributions and data concerning random variables. The following short notes describe basic features of $R$; further information and links to other resources are available on the course web page. Unix and Windows versions of $R$ are available on Math Faculty undergraduate servers, and free copies can be downloaded from the web.

You should make yourself familiar with $R$, since some problems (and most applications of probability) require computations or graphics which are not feasible by hand.

Some R Basics
 
R is a statistical software system that has excellent numerical,
graphical and statistical capabilities. There are Unix and Windows
versions. These notes are a very brief introduction to a few of the
features of R. Web resources have much more information. Links can
be found on the Stat 230 web page. You can also download a Unix or
Windows version of R for free.
 
 
1.PRELIMINARIES
R is invoked on Math Unix machines by typing R. The R prompt
is >. R objects include variables, functions, vectors, arrays, lists and
other items. To see online documentation about something, we use the help
function. For example, to see documentation on the function mean(), type
help(mean). In some cases help.search() is helpful.
 
The assignment symbol is <- : for example,
      x<- 15   assigns the value 15 to variable x.
 
To quit an R session, type  q()
 
2.VECTORS
Vectors can consist of numbers or other symbols; we will consider only
numbers here. Vectors are defined using c(): for example,
   x<- c(1,3,5,7,9)
defines a vector of length 5 with the elements given. Vectors and other
classes of objects possess certain attributes. For example, typing
length(x) will give the length of the vector x. Vectors are a convenient
way to store values of a function (e.g. a probability function or a c.d.f)
or values of a random variable that have been recorded in some experiment
or process.
 
3.ARITHMETIC
The following R commands and responses should explain arithmetic
operations.
 
> 7+3
[1] 10
> 7*3
[1] 21
> 7/3
[1] 2.333333
> 2^3
[1] 8
 
4.SOME FUNCTIONS
Functions of many types exist in R. Many operate on vectors in a
transparent way, as do arithmetic operations. (For example, if x and y
are vectors then x+y adds the vectors element-wise; thus x and y must
be the same length.) Some examples, with comments, follow.
 
> x<- c(1,3,5,7,9)   # Define a vector x
> x            # Display x
[1] 1 3 5 7 9
> y<- seq(1,2,.25)    #A useful function for defining a vector whose
                      elements are an arithmetic progression
> y
[1] 1.00 1.25 1.50 1.75 2.00
> y[2]   # Display the second element of vector y
[1] 1.25
> y[c(2,3)]    # Display the vector consisting of the second and
                   third elements of vector y.
[1] 1.25 1.50
> mean(x)      #Computes the mean of the elements of vector x
[1] 5
> summary(x)    # A useful function which summarizes features of
                   a vector x
 Min. 1st Qu. Median Mean 3rd Qu. Max.
    1       3      5    5       7    9
> var(x)    # Computes the (sample) variance of the elements of x
[1] 10
> exp(1)    # The exponential function
[1] 2.718282
> exp(y)
[1] 2.718282 3.490343 4.481689 5.754603 7.389056
> round(exp(y),2)   # round(y,n) rounds the elements of vector y to
                       n decimals
[1] 2.72 3.49 4.48 5.75 7.39
> x+2*y
[1]  3.0  5.5  8.0 10.5 13.0
 
5. GRAPHS
To open a graphics window in Unix, type x11(). Note that in R, a graphics
window opens automatically when a graphical function is used.
There are various plotting and graphical functions. Two useful ones
are
 
plot(x,y)  # Gives a scatterplot of x versus y; thus x and y must
              be vectors of the same length.
 
hist(x)    # Creates a frequency histogram based on the values in
              the vector x. To get a relative frequency histogram
              (areas of rectangles sum to one) use hist(x,prob=T).
 
Graphs can be tailored with respect to axis labels, titles, numbers of
plots to a page etc. Type help(plot), help(hist) or help(par) for some
information.
 
To save/print a graph in R using UNIX, you generate the graph you would
like to save/print in R using a graphing function like plot() and type:
 
dev.print(device,file="filename")
 
where device is the device you would like to save the graph to (i.e. x11)
and filename is the name of the file that you would like the graph saved
to. To look at a list of the different graphics devices you can save to,
type help(Devices).
 
To save/print a graph in R using Windows, you can do one of two things.
 
a) You can go to the File menu and save the graph using one of several
   formats (i.e. postscript, jpeg, etc.). It can then be printed. You
   may also copy the graph to the clipboard using one of the formats
   and then paste to an editor, such as MS Word. Note that the graph
   can be printed directly to a printer using this option as well.
 
b) You can right click on the graph. This gives you a choice of copying
   the graph and then pasting to an editor, such as MS Word, or saving
   the graph as a metafile or bitmap. You may also print directly to a
   printer using this option as well.
 
6.DISTRIBUTIONS
There are functions which compute values of probability or probability
density functions, cumulative distribution functions, and quantiles for
various distributions. It is also possible to generate (pseudo) random
samples from these distributions. Some examples follow for Binomial and
Poisson distributions. For other distribution information, type
help(rhyper), help(rnbinom) and so on. Note that R does not have any
function specifically designed to generate random samples from a discrete
uniform distribution (although there is one for a continuous uniform
distribution). To generate n random samples from a discrete UNIF(a,b), use
sample(a:b,n,replace=T).
 
> y<- rbinom(10,100,0.25)    # Generate 10 random values from the Binomial
                               distribution Bi(100,0.25). The values are
                               stored in the vector y.
> y    # Display the values
 [1] 24 24 26 18 29 29 33 28 28 28
> pbinom(3,10,0.5)     # Compute P(Y<=3) for a Bi(10,0.5) random variable.
[1] 0.171875
> qbinom(.95,10,0.5)   # Find the .95 quantile (95th percentile) for
[1] 8                    Bi(10,0.5).
 
> z<- rpois(10,10)    # Generate 10 random values from the Poisson
                        distribution Poisson(10). The values are stored in the
                        vector z.
> z   # Display the values
 [1]  6  5 12 10  9  7  9 12  5  9
> ppois(3,10)     # Compute P(Y<=3) for a Poisson(10) random variable.
[1] 0.01033605
> qpois(.95,10)   # Find the .95 quantile (95th percentile) for
[1] 15              Poisson(10).
 
To illustrate how to plot the probability function for a random variable,
a Bi(10,0.5) random variable is used.
 
# Assign all possible values of the random variable, X ~ Bi(10,0.5)
x <- seq(0,10,by=1)
 
# Determine the value of the probability function for possible values of X
x.pf <- dbinom(x,10,0.5)
 
# Plot the probability function
barplot(x.pf,xlab="X",ylab="Probability Function",
names.arg=c("0","1","2","3","4","5","6","7","8","9","10"))


Problems on Chapter 6

  1. Suppose that the probability $p(x)$ a person born in 1950 lives at least to certain ages $x$ is as given in the table below.

    $x$: 30 50 70 80 90
    Females .980 .955 .910 .595 .240
    Males .960 .920 .680 .375 .095

  2. Let $X$ be a non-negative discrete random variable with cumulative distribution function MATH

  3. Two balls are drawn at random from a box containing ten balls numbered $0,1,...,9$. Let random variable $X$ be the larger of the numbers on the two balls and random variable $Y$ be their total.

  4. Let $X$ have a geometric distribution with MATH. Find the probability function of $R$, the remainder when $X$ is divided by 4.

  5. An oil company runs a contest in which there are 500,000 tickets; a motorist receives one ticket with each fill-up of gasoline, and 500 of the tickets are winners.

  6. Jury selection. During jury selection a large number of people are asked to be present, then persons are selected one by one in a random order until the required number of jurors has been chosen. Because the prosecution and defense teams can each reject a certain number of persons, and because some individuals may be exempted by the judge, the total number of persons selected before a full jury is found can be quite large.

  7. A waste disposal company averages 6.5 spills of toxic waste per month. Assume spills occur randomly at a uniform rate, and independently of each other, with a negligible chance of 2 or more occurring at the same time. Find the probability there are 4 or more spills in a 2 month period.

  8. Coliform bacteria are distributed randomly and uniformly throughout river water at the average concentration of one per twenty cubic centimetres of water.

  9. In a group of policy holders for house insurance, the average number of claims per 100 policies per year is $\lambda= 8.0$. The number of claims for an individual policy holder is assumed to follow a Poisson distribution.

  10. Assume power failures occur independently of each other at a uniform rate through the months of the year, with little chance of 2 or more occurring simultaneously. Suppose that 80% of months have no power failures.

  11. Spruce budworms are distributed through a forest according to a Poisson process so that the average is $\lambda$ per hectare.

  12. A person working in telephone sales has a 20% chance of making a sale on each call, with calls being independent. Assume calls are made at a uniform rate, with the numbers made in non-overlapping periods being independent. On average there are 20 calls made per hour.

  13. A bin at a hardware store contains 35 forty watt lightbulbs and 70 sixty watt bulbs. A customer wants to buy 8 sixty watt bulbs, and withdraws bulbs without replacement until these 8 bulbs have been found. Let $X$ be the number of 40 watt bulbs drawn from the bin. Find the probability function, $f(x)$.

  14. During rush hour the number of cars passing through a particular intersection has a Poisson distribution with an average of 540 per hour.

  15. Random variable $X$ takes values 1,2,3,4,5 and has c.d.f.

    $x$ 0 1 2 3 4 5
    $F(x)$ 0 .1$k$ .2 .5$k$ $k$ $4k^{2}$

    Find $k, f(x)$ and MATH. Draw a histogram of $f(x) $.

  16. Let random variable $Y$ have a geometric distribution MATH for $y = 0,1,2,... \ \ $.

  17. Polls and Surveys. Polls or surveys in which people are selected and their opinions or other characteristics are determined are very widely used. For example, in a survey on cigarette use among teenage girls, we might select a random sample of $n$ girls from the population in question, and determine the number $X$ who are regular smokers. If $p$ is the fraction of girls who smoke, then $X \sim B i (n, p)$. Since $p$ is unknown (that is why we do the survey) we then estimate it as $\hat{p} = X / n$. (In probability and statistics a ``hat" is used to denote an estimate of a model parameter based on data.) The binomial distribution can be used to study how ``good" such estimates are, as follows

  18. Telephone surveys. In some ``random digit dialing" surveys, a computer phones randomly selected telephone numbers. However, not all numbers are ``active" (belong to a telephone account) and they may belong to businesses as well as to individual or residences.

    Suppose that for a given large set of telephone numbers, 57% are active residential or individual numbers. We will call these ``personal" numbers.

    Suppose that we wish to interview (over the phone) 1000 persons in a survey.

    (Note: The R functions pnbinom and pbinom give negative binomial and binomial probabilities, respectively.)