6. Discrete Random Variables and Probability Models

Random Variables and Probability Functions

Probability models are used to describe outcomes associated with random processes. So far we have used sets $A,B,C, \dots$ in sample spaces to describe such outcomes. In this chapter we introduce numerical-valued variables $X, Y, \dots$ to describe outcomes. This allows probability models to be manipulated easily using ideas from algebra, calculus, or geometry.

A random variable (r.v.) is a numerical-valued variable that represents outcomes in an experiment or random process. For example, suppose a coin is tossed 3 times; then $X =$ number of heads obtained would be a random variable. Associated with any random variable is a range $A$, which is the set of possible values for the variable. For example, the random variable $X$ defined above has range $A=\{0,1,2,3\}$.

Random variables are denoted by capital letters like $X,Y,\dots$ and their possible values are denoted by $x,y,\dots$. This gives a convenient shorthand notation for outcomes: for example, ``$X=2$'' in the experiment above stands for ``2 heads occurred''.

Random variables are always defined so that if the associated random process or experiment is carried out, then one and only one of the outcomes ``$X = x$'' $(x \in A)$ occurs. In other words, the possible values for $X$ form a partition of the points in the sample space for the experiment. In more advanced mathematical treatments of probability, a random variable is defined as a function on a sample space, as follows:

Definition

A random variable is a function that assigns a real number to each point in a sample space $S$.

To understand this definition, consider the experiment in which a coin is tossed 3 times, and suppose that we used the sample space $$S=\{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}.$$ Then each of the outcomes ``$X=x$'' (where $X$ = number of heads) represents an event (either simple or compound) and so a real number $x$ can be associated with each point in $S$. In particular, the point $HHH$ corresponds to $x=3$; the points $THH,HTH,HHT$ correspond to $x=2$; the points $HTT,THT,TTH$ correspond to $x=1$; and the point $TTT$ corresponds to $x=0$.

As you may recall, a function is a mapping that assigns to each point in a domain a unique point in a range. For example, the function $y = x^{3}$ maps the point $x = 2$ in the domain to the point $y = 8$ in the range. We are familiar with this rule for mapping being defined by a mathematical formula. However, the rule for mapping a point in the sample space (domain) to a real number in the range of a random variable is most often given in words rather than by a formula. As mentioned above, we generally denote random variables, in the abstract, by capital letters ($X, Y$, etc.) and denote the actual numbers taken by random variables by small letters ($x, y$, etc.). You may, in your earlier studies, have seen this distinction made between a function ($X$) and the value of a function ($x$).

Since ``$X=x$'' represents an outcome of some kind, we will be interested in its probability, which we write as $P(X=x)$. To discuss probabilities for random variables, it is easiest if they are classified into two types, according to their ranges:

Discrete r.v.'s take integer values or, more generally, values in a countable set (recall that a set is countable if its elements can be placed in a one-to-one correspondence with a subset of the positive integers).

Continuous r.v.'s take values in some interval of real numbers.


Examples might be:

Discrete                              Continuous
number of people in a car             total weight of people in a car
number of cars in a parking lot       distance between cars in a parking lot
number of phone calls to 911          time between calls to 911

In theory there could also be mixed r.v.'s which are discrete-valued over part of their range and continuous-valued over some other portion of their range. We will ignore this possibility here and concentrate first on discrete r.v.'s. Continuous r.v.'s are considered in Chapter 9.

Our aim is to set up general models which describe how the probability is distributed among the possible values a random variable can take. To do this we define for any discrete random variable $X$ the probability function.

Definition

The probability function (p.f.) of a random variable $X$ is the function $$f(x)=P(X=x),\quad\text{defined for all }x\in A.$$

The set of pairs $\{(x,f(x)):x\in A\}$ is called the probability distribution of $X$. All probability functions must have two properties:

  1. $f(x)\geq0$ for all values of $x$ (i.e. for $x\in A$)

  2. $\sum\limits_{x\in A}f(x)=1$

By implication, these properties ensure that $f(x)\leq1$ for all $x $. We consider a few ``toy'' examples before dealing with more complicated problems.


Example 1: Let $X$ be the number obtained when a die is thrown. We would normally use the probability function $f(x)=1/6$ for $x=1,2,3,\cdots,6$. In fact there probably is no absolutely perfect die in existence. For most dice, however, the 6 sides will be close enough to being equally likely that $f(x)=1/6$ is a satisfactory model for the distribution of probability among the possible outcomes.

Example 2: Suppose a "fair" coin is tossed 3 times, and let $X$ be the number of heads occurring. Then a simple calculation using the ideas from earlier chapters gives $$f(x)=P(X=x)=\binom{3}{x}\left(\tfrac{1}{2}\right)^{3},\quad x=0,1,2,3.$$ Note that instead of writing $f(0)=\frac{1}{8},\ f(1)=\frac{3}{8},\ f(2)=\frac{3}{8},\ f(3)=\frac{1}{8}$, we have given a simple algebraic expression.
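The notes use $R$ for computation, but any language will do for a quick check. The following Python sketch (the names are ours, purely illustrative) confirms the algebraic expression by enumerating the 8 equally likely outcomes:

```python
from itertools import product
from math import comb

# Enumerate all 2^3 = 8 equally likely outcomes of 3 tosses of a fair coin
# and tally the number of heads in each.
counts = {x: 0 for x in range(4)}
for outcome in product("HT", repeat=3):
    counts[outcome.count("H")] += 1

f = {x: counts[x] / 8 for x in counts}       # p.f. by direct enumeration
g = {x: comb(3, x) / 8 for x in range(4)}    # the algebraic expression C(3,x)/2^3
assert f == g  # the closed form matches the enumeration exactly
```

The enumeration gives $f(0)=f(3)=\frac18$ and $f(1)=f(2)=\frac38$, matching the formula.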

Example 3: Find the value of $k$ which makes $f(x)$ below a probability function.

$x$ 0 1 2 3
$f(x)$ $k$ $2k$ $0.3$ $4k$

Using $\sum\limits_{x=0}^{3}f(x)=1$ gives $7k+0.3=1$. Hence $k=0.1$.

While the probability function is the most common way of describing a probability model, there are other possibilities. One of them is by using the cumulative distribution function (c.d.f.).

Definition

The cumulative distribution function of $X$ is the function usually denoted by $F(x)$, $$F(x)=P(X\leq x),$$ defined for all real numbers $x$.

In the last example, with $k=0.1$, we have for $x\in A$

$x$ 0 1 2 3
$F(x)$ 0.1 0.3 0.6 1

since, for instance, $F(2)=P(X\leq2)=f(0)+f(1)+f(2)=0.6$. Similarly, $F(x)$ can be defined for real numbers $x$ not in the range of the random variable; for example, $F(2.5)=P(X\leq2.5)=0.6$ and $F(-1)=P(X\leq-1)=0$. The c.d.f. for this example is plotted in Figure cdf.
[Figure cdf: A simple cumulative distribution function]

In general, $F(x)$ can be obtained from $f(x)$ by the fact that $$F(x)=P(X\leq x)=\sum_{u\leq x,\ u\in A}f(u).$$

A c.d.f. $F(x)$ has certain properties, just as a probability function $f(x)$ does. Obviously, since it represents a probability, $F(x)$ must be between 0 and 1. In addition it must be a non-decreasing function (e.g. $P(X\leq8)$ cannot be less than $P(X\leq7)$). Thus we note the following properties of a c.d.f. $F(x)$:

  1. $F(x)$ is a non-decreasing function of $x.$

  2. $0\leq F(x)\leq1$ for all $x.$

  3. $\lim\limits_{x\rightarrow-\infty}F(x)=0$ and $\lim\limits_{x\rightarrow\infty}F(x)=1$

We have noted above that $F(x)$ can be obtained from $f(x)$. The opposite is also true; for example the following result holds:

If $X$ takes on integer values then for values $x$ such that $x\in A$ and $x-1\in A$, $$f(x)=F(x)-F(x-1).$$ This says that $f(x)$ is the size of the jump in $F(x)$ at the point $x.$

To prove this, just note that $$F(x)-F(x-1)=P(X\leq x)-P(X\leq x-1)=P(X=x)=f(x).$$
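The jump relationship can be illustrated with the distribution of Example 3 (with $k=0.1$). This short Python sketch (names ours, illustrative) builds $F$ from $f$ and then recovers $f$ as the jumps of $F$:

```python
# Example 3's probability function with k = 0.1.
f = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

# Build the c.d.f. F(x) = sum of f(u) for u <= x (rounding guards
# against floating-point accumulation noise).
F = {}
total = 0.0
for x in sorted(f):
    total += f[x]
    F[x] = round(total, 10)

# Recover f(x) as the jump F(x) - F(x-1), with F(-1) = 0.
recovered = {x: round(F[x] - F.get(x - 1, 0.0), 10) for x in F}
assert recovered == f
```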

When a random variable has been defined it is sometimes simpler to find its probability function (p.f.) $f(x)$ first, and sometimes it is simpler to find $F(x)$ first. The following example gives two approaches for the same problem.


Example: Suppose that $N$ balls labelled $1,2,\dots,N$ are placed in a box, and $n$ balls $(n\leq N)$ are randomly selected without replacement. Define the r.v. $$X=\text{largest number selected}.$$ Find the probability function for $X$.


Solution 1: If $X=x$ then we must select the number $x$ plus $n-1$ numbers from the set $\{1,2,\dots,x-1\}$. (Note that this means we need $x\geq n$.) This gives $$f(x)=P(X=x)=\frac{\binom{x-1}{n-1}}{\binom{N}{n}},\quad x=n,n+1,\dots,N.$$


Solution 2: First find $F(x)=P(X\leq x)$. Noting that $X\leq x$ if and only if all $n$ balls selected are from the set $\{1,2,\dots,x\}$, we get $$F(x)=P(X\leq x)=\frac{\binom{x}{n}}{\binom{N}{n}},\quad x=n,n+1,\dots,N.$$ We can now find $f(x)=F(x)-F(x-1)$ as before.
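Both solutions can be checked against brute-force enumeration for a small case. The sketch below (our choice of $N=8$, $n=3$, purely illustrative) lists all $\binom{N}{n}$ equally likely samples and compares:

```python
from itertools import combinations
from math import comb

N, n = 8, 3  # small illustrative case: draw 3 balls from {1,...,8}

# Brute force: distribution of the largest label over all C(N, n) samples.
brute = {x: 0 for x in range(n, N + 1)}
for sample in combinations(range(1, N + 1), n):
    brute[max(sample)] += 1

for x in range(n, N + 1):
    f_x = comb(x - 1, n - 1) / comb(N, n)   # Solution 1
    F_x = comb(x, n) / comb(N, n)           # Solution 2: F(x)
    F_prev = comb(x - 1, n) / comb(N, n)    # F(x-1); comb gives 0 when x-1 < n
    assert abs(f_x - brute[x] / comb(N, n)) < 1e-12
    assert abs(f_x - (F_x - F_prev)) < 1e-12
```

Note that Python's `math.comb(a, b)` returns 0 when $b>a$, which conveniently handles $F(n-1)=0$.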


Remark: When you write down a probability function or a cumulative distribution function, don't forget to give the range of the function (i.e. the possible values of the random variable). This is part of the function's definition.

Sometimes we want to graph a probability function $f(x)$. The type of graph we will use most is called a (probability) histogram. For now, we'll define this only for r.v.'s whose range is some set of consecutive integers $\{0,1,2,\dots\}$. A histogram of $f(x)$ is then a graph consisting of adjacent bars or rectangles. At each $x$ we place a rectangle with base on $(x-.5,x+.5)$ and with height $f(x)$. In the above Example 3, a histogram of $f(x)$ looks like that in Figure probhist.
[Figure probhist: Probability histogram for $f(x)$ in Example 3]

Notice that the areas of these rectangles correspond to the probabilities, so for example $P(1\leq X\leq3)$ is the sum of the area of the three rectangles above the points $1,2,$ and $3.$ In general, probabilities are depicted by areas.


Model Distributions:

Many processes or problems have the same structure. In the remainder of this course we will identify common types of problems and develop probability distributions that represent them. In doing this it is important to be able to strip away the particular wording of a problem and look for its essential features. For example, the following three problems are all essentially the same.

  1. A fair coin is tossed 10 times and the ``number of heads obtained'' $(X)$ is recorded.

  2. Twenty seeds are planted in separate pots and the ``number of seeds germinating'' $(X)$ is recorded.

  3. Twelve items are picked at random from a factory's production line and examined for defects. The number of items having no defects $(X)$ is recorded.

What are the common features? In each case the process consists of ``trials" which are repeated a stated number of times - 10, 20, and 12. In each repetition there are two types of outcomes - heads/tails, germinate/don't germinate, and no defects/defects. These repetitions are independent (as far as we can determine), with the probability of each type of outcome remaining constant for each repetition. The random variable we record is the number of times one of these two types of outcome occurred.

Six model distributions for discrete r.v.'s will be developed in the rest of this chapter. Students often have trouble deciding which one (if any) to use in a given setting, so be sure you understand the physical setup which leads to each one. Also, as illustrated above you will need to learn to focus on the essential features of the situation as well as the particular content of the problem.


Statistical Computing

A number of major software systems have been developed for probability and statistics. We will use a system called $R$, which has a wide variety of features and which has Unix and Windows versions. Appendix 6.1 at the end of this chapter gives a brief introduction to $R$, and how to access it. For this course, $R$ can compute probabilities for all the distributions we consider, can graph functions or data, and can simulate random processes. In the sections below we will indicate how $R$ can be used for some of these tasks.


Problems:

  1. Let $X$ have probability function MATH. Find $c$.

  2. Suppose that 5 people, including you and a friend, line up at random. Let $X$ be the number of people standing between you and your friend. Tabulate the probability function and the cumulative distribution function for $X$.

Discrete Uniform Distribution

We define each model in terms of an abstract ``physical setup", or setting, and then consider specific examples of the setup.

Physical Setup: Suppose $X$ takes values $a,a+1,a+2,\cdots,b$ with all values being equally likely. Then $X$ has a discrete uniform distribution on $[a,b]$.


Illustrations:

  1. If $X$ is the number obtained when a die is rolled, then $X$ has a discrete uniform distribution with $a = 1$ and $b = 6$.

  2. Computer random number generators give uniform $[1,N]$ variables, for a specified positive integer $N$. These are used for many purposes, e.g. generating lottery numbers or providing automated random sampling from a set of $N$ items.


Probability Function: There are $b-a+1$ values $X$ can take, so the probability at each of these values must be $\frac{1}{b-a+1}$ in order that $\sum\limits_{x=a}^{b}f(x)=1$. Therefore $$f(x)=\frac{1}{b-a+1},\quad x=a,a+1,\dots,b.$$

Problem 6.2.1

Let $X$ be the largest number when a die is rolled 3 times. First find the c.d.f., $F(x)$, and then find the probability function, $f(x)$.

Hypergeometric Distribution

Physical Setup: We have a collection of $N$ objects which can be classified into two distinct types. Call one type "success" $(S)$ and the other type "failure" $(F)$. There are $r$ successes and $N-r$ failures. Pick $n$ objects at random without replacement. Let $X$ be the number of successes obtained. Then $X$ has a hypergeometric distribution.


Illustrations:

  1. The number of aces $X$ in a bridge hand has a hypergeometric distribution with $N = 52, ~r = 4$, and $n = 13$.

  2. In a fleet of 200 trucks there are 12 which have defective brakes. In a safety check 10 trucks are picked at random for inspection. The number of trucks $X$ with defective brakes chosen for inspection has a hypergeometric distribution with $N=200,r=12,n=10$.


Probability Function: Using counting techniques we note there are $\binom{N}{n}$ points in the sample space $S$ if we don't consider order of selection. There are $\binom{r}{x}$ ways to choose the $x$ success objects from the $r$ available and $\binom{N-r}{n-x}$ ways to choose the remaining $(n-x)$ objects from the $(N-r)$ failures. Hence $$f(x)=P(X=x)=\frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}.$$ The range of values for $x$ is somewhat complicated. Of course, $x\geq0$. However, if the number, $n$, picked exceeds the number, $N-r$, of failures, the difference, $n-(N-r)$, must be successes. So $x\geq\max\left(0,\,n-(N-r)\right)$. Also, $x\leq r$ since we can't get more successes than the number available, and $x\leq n$ since we can't get more successes than the number of objects chosen. Therefore $x\leq\min(r,n)$.
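The probability function and its range can be sketched in pure Python using `math.comb` (function name ours, illustrative; the course software is $R$). Summing over the range above should give 1:

```python
from math import comb

def hyper_pf(x, N, r, n):
    """P(X = x) for the hypergeometric: x successes when n of N objects
    are drawn without replacement and r of the N are successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# The defective-brakes illustration: N = 200 trucks, r = 12 defective, n = 10.
N, r, n = 200, 12, 10
lo, hi = max(0, n - (N - r)), min(r, n)   # the range derived above
assert abs(sum(hyper_pf(x, N, r, n) for x in range(lo, hi + 1)) - 1) < 1e-12
```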


Example: In Lotto 6/49 a player selects a set of six numbers (with no repeats) from the set $\{1,2,\dots,49\}$. In the lottery draw six numbers are selected at random. Find the probability function for $X$, the number from your set which are drawn.


Solution: Think of your numbers as the $S$ objects and the remainder as the $F$ objects. Then $X$ has a hypergeometric distribution with $N=49$, $r=6$ and $n=6$, so $$f(x)=\frac{\binom{6}{x}\binom{43}{6-x}}{\binom{49}{6}},\quad x=0,1,\dots,6.$$ For example, you win the jackpot prize if $X=6$; the probability of this is $f(6)=1/\binom{49}{6}$, or about 1 in 13.9 million.


Remark: Hypergeometric probabilities are tedious to compute using a calculator. The $R$ functions $dhyper$ and $phyper$ can be used to evaluate $f(x)$ and the c.d.f. $F(x)$. In particular, $dhyper(x,r,N-r,n)$ gives $f(x)$ and $phyper(x,r,N-r,n)$ gives $F(x)$. Using this we find for the Lotto 6/49 problem here, for example, that $f(6)$ is calculated by typing $dhyper(6,6,43,6)$ in $R$, which returns the answer $7.151124\times10^{-8}$ or $1/13{,}983{,}816$.
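The same jackpot probability can be reproduced without $R$; a Python sketch (names ours, illustrative):

```python
from math import comb

# Jackpot: all 6 of your numbers among the 6 drawn from 49.
f6 = comb(6, 6) * comb(43, 0) / comb(49, 6)

assert comb(49, 6) == 13_983_816          # about 1 chance in 13.9 million
assert abs(f6 * 13_983_816 - 1) < 1e-12   # f(6) = 1 / C(49, 6)
```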

For all of our model distributions we can also confirm that $\sum\limits_{x}f(x)=1$. To do this here we use a summation result from Chapter 5 called the hypergeometric identity. Letting $a=r$, $b=N-r$ in that identity we get $$\sum_{x}\frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}=\frac{1}{\binom{N}{n}}\sum_{x}\binom{r}{x}\binom{N-r}{n-x}=\frac{\binom{r+(N-r)}{n}}{\binom{N}{n}}=1.$$


Problems:

  1. A box of 12 tins of tuna contains $d$ which are tainted. Suppose 7 tins are opened for inspection and none of these 7 is tainted.

    1. Calculate the probability that none of the 7 is tainted for $d = 0,1,2,3$.

    2. Do you think it is likely that the box contains as many as 3 tainted tins?

  2. Derive a formula for the hypergeometric probability function using a sample space in which order of selection is considered.

Binomial Distribution

Physical Setup:

Suppose an "experiment" has two types of distinct outcomes. Call these types "success" $(S)$ and "failure" $(F)$, and let their probabilities be $p$ (for $S$) and $1-p$ (for $F$). Repeat the experiment $n$ independent times. Let $X$ be the number of successes obtained. Then $X$ has what is called a binomial distribution. (We write $X\sim Bi(n,p)$ as a shorthand for "$X$ is distributed according to a binomial distribution with $n$ repetitions and probability $p$ of success".) The individual experiments in the process just described are often called "trials", and the process is called a Bernoulli process or a binomial process.


Illustrations:

  1. Toss a fair die 10 times and let $X$ be the number of sixes that occur. Then $X\sim Bi(10,\frac{1}{6})$.

  2. In a microcircuit manufacturing process, 60% of the chips produced work (40% are defective). Suppose we select 25 independent chips and let $X$ be the number that work. Then $X\sim Bi(25,.6)$.


Comment: We must think carefully whether the physical process we are considering is closely approximated by a binomial process, for which the key assumptions are that (i) the probability $p$ of success is constant over the $n$ trials, and (ii) the outcome ($S$ or $F$) on any trial is independent of the outcome on the other trials. For Illustration 1 these assumptions seem appropriate. For Illustration 2 we would need to think about the manufacturing process. Microcircuit chips are produced on "wafers" containing a large number of chips and it is common for defective chips to cluster on wafers. This could mean that if we selected 25 chips from the same wafer, or from only 2 or 3 wafers, that the "trials" (chips) might not be independent.


Probability Function: There are $\binom{n}{x}$ different arrangements of $x$ $S$'s and $(n-x)$ $F$'s over the $n$ trials. The probability for each of these arrangements has $p$ multiplied together $x$ times and $(1-p)$ multiplied $(n-x)$ times, in some order, since the trials are independent. So each arrangement has probability $p^{x}(1-p)^{n-x}$. Therefore $$f(x)=P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\quad x=0,1,2,\dots,n.$$ Checking that $\sum f(x)=1$: by the binomial theorem, $$\sum_{x=0}^{n}\binom{n}{x}p^{x}(1-p)^{n-x}=\left(p+(1-p)\right)^{n}=1^{n}=1.$$ We graph in Figure binomhist the probability function for the binomial distribution with parameters $n=20$ and $p=0.3$. Although the formula for $f(x)$ may seem complicated, this shape is typical: increasing to a maximum value near $np$ and then decreasing thereafter.
[Figure binomhist: The Binomial$(20,0.3)$ probability histogram]
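The probability function and the shape of the histogram can be checked with a short Python sketch (names ours, illustrative; the course software is $R$):

```python
from math import comb

def binom_pf(x, n, p):
    # f(x) = C(n, x) p^x (1-p)^(n-x): the count of arrangements times
    # the probability of any one arrangement.
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.3
probs = [binom_pf(x, n, p) for x in range(n + 1)]

assert abs(sum(probs) - 1) < 1e-12                    # probabilities sum to 1
assert max(range(n + 1), key=lambda x: probs[x]) == 6  # mode sits near np = 6
```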




Computation: Many software packages and some calculators give binomial probabilities. In $R$ we use the function $dbinom(x,n,p)$ to compute $f(x)$ and $pbinom(x,n,p)$ to compute the corresponding c.d.f. $F(x)=P(X\leq x)$.


Example: Suppose that in a weekly lottery you have probability .02 of winning a prize with a single ticket. If you buy 1 ticket per week for 52 weeks, what is the probability that (a) you win no prizes, and (b) you win 3 or more prizes?


Solution: Let $X$ be the number of weeks that you win; then $X\sim Bi(52,.02)$. We find

  1. $P(X=0)=\binom{52}{0}(.02)^{0}(.98)^{52}=(.98)^{52}\doteq0.350$

  2. $P(X\geq3)=1-P(X\leq2)=1-\sum\limits_{x=0}^{2}\binom{52}{x}(.02)^{x}(.98)^{52-x}\doteq0.0859$

(Note that $P(X\leq2)$ is given by the $R$ command $pbinom(2,52,.02)$.)
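The same two probabilities can be reproduced with a Python sketch (names ours, illustrative):

```python
from math import comb

def binom_pf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 52, 0.02
p0 = binom_pf(0, n, p)                                 # (a) no prizes
p3plus = 1 - sum(binom_pf(x, n, p) for x in range(3))  # (b) 3 or more prizes

assert abs(p0 - 0.350) < 0.001
assert abs(p3plus - 0.0859) < 0.0005
```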



Comparison of Binomial and Hypergeometric Distributions:

These distributions are similar in that an experiment with 2 types of outcome ($S$ and $F$) is repeated $n$ times and $X$ is the number of successes. The key difference is that the binomial requires independent repetitions with the same probability of $S$, whereas the draws in the hypergeometric are made from a fixed collection of objects without replacement. The trials (draws) are therefore not independent. For example, if there are $r = 10$ $S$ objects and $N - r = 10$ $F$ objects, then the probability of getting an $S$ on draw 2 depends on what was obtained in draw 1. If these draws had been made with replacement, however, they would be independent and we'd use the binomial rather than the hypergeometric model.

If $N$ is large and the number, $n$, being drawn is relatively small in the hypergeometric setup then we are unlikely to get the same object more than once even if we do replace it. So it makes little practical difference whether we draw with or without replacement. This suggests that when we are drawing a fairly small proportion of a large collection of objects the binomial and the hypergeometric models should produce similar probabilities. As the binomial is easier to calculate, it is often used as an approximation to the hypergeometric in such cases.


Example: Suppose we have 15 cans of soup with no labels, but 6 are tomato and 9 are pea soup. We randomly pick 8 cans and open them. Find the probability 3 are tomato.


Solution: The correct solution uses the hypergeometric distribution, and is (with $X$ = number of tomato soups picked) $$f(3)=\frac{\binom{6}{3}\binom{9}{5}}{\binom{15}{8}}=0.3916.$$ If we incorrectly used the binomial, we'd get $$f(3)=\binom{8}{3}\left(\tfrac{6}{15}\right)^{3}\left(\tfrac{9}{15}\right)^{5}=0.2787.$$ As expected, this is a poor approximation since we're picking over half of a fairly small collection of cans.

However, if we had 1500 cans - 600 tomato and 900 pea - we're not likely to get the same can again even if we did replace each of the 8 cans after opening it. (Put another way, the probability we get a tomato soup on each pick is very close to .4, regardless of what the other picks give.) The exact, hypergeometric, probability is now $$f(3)=\frac{\binom{600}{3}\binom{900}{5}}{\binom{1500}{8}}=0.2794.$$ Here the binomial probability, $$f(3)=\binom{8}{3}(.4)^{3}(.6)^{5}=0.2787,$$ is a very good approximation.
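The small-population and large-population cases can be compared directly in a Python sketch (names ours, illustrative):

```python
from math import comb

def hyper_pf(x, N, r, n):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binom_pf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 15 cans (6 tomato): binomial is a poor approximation here ...
small = hyper_pf(3, 15, 6, 8)
approx = binom_pf(3, 8, 0.4)
# ... but with 1500 cans (600 tomato) the two nearly agree.
big = hyper_pf(3, 1500, 600, 8)

assert round(small, 4) == 0.3916
assert round(approx, 4) == 0.2787
assert abs(big - approx) < 0.001
```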

Problems:

  1. Megan audits 130 clients during a year and finds irregularities for 26 of them.

    1. Give an expression for the probability that 2 clients will have irregularities when 6 of her clients are picked at random,

    2. Evaluate your answer to (a) using a suitable approximation.

  2. The flash mechanism on camera $A$ fails on 10% of shots, while that of camera $B$ fails on 5% of shots. The two cameras being identical in appearance, a photographer selects one at random and takes 10 indoor shots using the flash.

    1. Give the probability that the flash mechanism fails exactly twice. What assumption(s) are you making?

    2. Given that the flash mechanism failed exactly twice, what is the probability camera $A$ was selected?

Negative Binomial Distribution

Physical Setup:

The setup for this distribution is almost the same as for the binomial; i.e., an experiment (trial) has two distinct types of outcome ($S$ and $F$) and is repeated independently with the same probability, $p$, of success each time. Continue doing the experiment until a specified number, $k$, of successes has been obtained. Let $X$ be the number of failures obtained before the $k^{\text{th}}$ success. Then $X$ has a negative binomial distribution. We often write $X\sim NB(k,p)$ to denote this.


Illustrations:

  1. If a fair coin is tossed until we get our $5^{\text{th}}$ head, the number of tails we obtain has a negative binomial distribution with $k = 5$ and $p = \frac{1}{2}$.

  2. As a rough approximation, the number of half credit failures a student collects before successfully completing 40 half credits for an honours degree has a negative binomial distribution. (Assume all course attempts are independent, with the same probability of being successful, and ignore the fact that getting more than 6 half credit failures prevents a student from continuing toward an honours degree.)


Probability Function: In all there will be $x+k$ trials ($x$ $F$'s and $k$ $S$'s) and the last trial must be a success. In the first $x+k-1$ trials we therefore need $x$ failures and $(k-1)$ successes, in any order. There are $\binom{x+k-1}{x}$ different orders. Each order will have probability $p^{k}(1-p)^{x}$ since there must be $x$ trials which are failures and $k$ which are successes. Hence $$f(x)=P(X=x)=\binom{x+k-1}{x}p^{k}(1-p)^{x},\quad x=0,1,2,\dots$$

Note: An alternate version of the negative binomial distribution defines $X$ to be the total number of trials needed to get the $k^{\text{th}}$ success. This is equivalent to our version. For example, asking for the probability of getting 3 tails before the $5^{\text{th}}$ head is exactly the same as asking for a total of 8 tosses in order to get the $5^{\text{th}}$ head. You need to be careful to read how $X$ is defined in a problem rather than mechanically "plugging in" numbers in the above formula for $f(x)$.

Checking that $\sum f(x)=1$ requires somewhat more work for the negative binomial distribution. We first re-arrange the $\binom{x+k-1}{x}$ term: $$\binom{x+k-1}{x}=\frac{(x+k-1)(x+k-2)\cdots(k+1)(k)}{x!}.$$ Factor a $(-1)$ out of each of the $x$ terms in the numerator, and re-write these terms in reverse order: $$\binom{x+k-1}{x}=(-1)^{x}\frac{(-k)(-k-1)\cdots(-k-x+1)}{x!}=(-1)^{x}\binom{-k}{x}.$$ Then (using the binomial theorem with a negative exponent) $$\sum_{x=0}^{\infty}f(x)=p^{k}\sum_{x=0}^{\infty}\binom{-k}{x}\left(-(1-p)\right)^{x}=p^{k}\left(1-(1-p)\right)^{-k}=p^{k}p^{-k}=1.$$
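The convergence of this series can be seen numerically; a Python sketch (names and parameter choices ours, illustrative) sums far into the tail:

```python
from math import comb

def nb_pf(x, k, p):
    # f(x) = C(x+k-1, x) p^k (1-p)^x : x failures before the k-th success
    return comb(x + k - 1, x) * p**k * (1 - p)**x

k, p = 5, 0.5  # illustrative parameters
# The infinite series converges to 1; truncating deep in the tail suffices.
assert abs(sum(nb_pf(x, k, p) for x in range(200)) - 1) < 1e-9
```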


Comparison of Binomial and Negative Binomial Distributions

These should be easily distinguished because they reverse what is specified or known in advance and what is variable.

Binomial: Know the number, $n$, of repetitions in advance. Don't know the number of successes we'll obtain until after the experiment.
Negative Binomial: Know the number, $k$, of successes in advance. Don't know the number of repetitions needed until after the experiment.

Example:

The fraction of a large population that has a specific blood type $T$ is .08 (8%). For blood donation purposes it is necessary to find 5 people with type $T$ blood. If randomly selected individuals from the population are tested one after another, then (a) what is the probability $y$ persons have to be tested to get 5 type $T$ persons, and (b) what is the probability that over 80 people have to be tested?


Solution:

Think of a type $T$ person as a success $(S)$ and a non-type $T$ person as an $F$. Let $Y$ = number of persons who have to be tested and let $X$ = number of non-type $T$ persons tested in order to get 5 $S$'s. Then $X\sim NB(k=5,\,p=.08)$ and $$f(x)=P(X=x)=\binom{x+4}{x}(.08)^{5}(.92)^{x},\quad x=0,1,2,\dots$$ We are actually asked here about $Y=X+5$. Thus $$P(Y=y)=P(X=y-5)=\binom{y-1}{y-5}(.08)^{5}(.92)^{y-5},\quad y=5,6,7,\dots$$ Thus we have the answer to (a) as given above, and
(b) $P(Y>80)=P(X>75)=1-P(X\leq75)=1-\sum\limits_{x=0}^{75}\binom{x+4}{x}(.08)^{5}(.92)^{x}$.

Note: Calculating such probabilities is easy with $R$. To get $f(x)$ we use $dnbinom(x,k,p)$ and to get $F(x)=P(X\leq x)$ we use $pnbinom(x,k,p)$.
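The tail probability in (b) can also be cross-checked against a binomial event: "more than 80 tested" is exactly "at most 4 type $T$ people among the first 80 tests". A Python sketch (names ours, illustrative):

```python
from math import comb

k, p = 5, 0.08

def nb_pf(x):
    return comb(x + k - 1, x) * p**k * (1 - p)**x

# P(more than 80 tested) = P(X > 75), X = failures before the 5th success.
p_over_80 = 1 - sum(nb_pf(x) for x in range(76))

# Cross-check: the first 80 tests yield at most 4 type-T people,
# a binomial probability with n = 80.
check = sum(comb(80, j) * p**j * (1 - p)**(80 - j) for j in range(5))

assert abs(p_over_80 - check) < 1e-12
```

This identity between a negative binomial tail and a binomial c.d.f. holds in general, since both events describe the same set of trial sequences.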


Problems:

  1. You can get a group rate on tickets to a play if you can find 25 people to go. Assume each person you ask responds independently and has a 20% chance of agreeing to buy a ticket. Let $X$ be the total number of people you have to ask in order to find 25 who agree to buy a ticket. Find the probability function of $X$.

  2. A shipment of 2500 car headlights contains 200 which are defective. You choose from this shipment without replacement until you have 18 which are not defective. Let $X$ be the number of defective headlights you obtain.

    1. Give the probability function, $f(x)$.

    2. Using a suitable approximation, find $f(2)$.

Geometric Distribution

Physical Setup:

This is a special case of the negative binomial distribution with $k=1$, i.e., an experiment is repeated independently with two types of outcome ($S$ and $F$) each time, and the same probability, $p$, of success each time. Let $X$ be the number of failures obtained before the first success.


Illustrations:

  1. The probability you win a lottery prize in any given week is a constant $p$. The number of weeks before you win a prize for the first time has a geometric distribution.

  2. If you take STAT 230 until you pass it and attempts are independent with the same probability of a pass each time, then the number of failures would have a geometric distribution. (These assumptions are unlikely to be true for most persons! Why is this?)


Probability Function:

There is only the one arrangement: $x$ failures followed by 1 success. This arrangement has probability $$f(x)=P(X=x)=p(1-p)^{x},\quad x=0,1,2,\dots,$$ which is the same as $f(x)$ for $NB(k=1,\,p)$.

Checking that $\sum f(x)=1$, we will be evaluating a geometric series: $$\sum_{x=0}^{\infty}p(1-p)^{x}=\frac{p}{1-(1-p)}=1.$$
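A useful by-product of the geometric series (worth verifying numerically) is the closed-form tail $P(X\geq n)=(1-p)^{n}$: the event "$X\geq n$" just says the first $n$ trials were all failures. A Python sketch with an illustrative $p$ of our choosing:

```python
p = 0.2  # illustrative success probability

def geom_pf(x):
    # f(x) = p(1-p)^x : x failures before the first success
    return p * (1 - p)**x

# Tail identity: P(X >= n) = (1-p)^n.  The truncation at 400 is far
# enough out that the neglected tail is negligible.
for n in (0, 1, 5, 10):
    tail = sum(geom_pf(x) for x in range(n, 400))
    assert abs(tail - (1 - p)**n) < 1e-12
```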

Note: The names of the models so far derive from the summation results which show $f(x)$ sums to 1. The geometric distribution involved a geometric series; the hypergeometric distribution used the hypergeometric identity; both the binomial and negative binomial distributions used the binomial theorem.


Bernoulli Trials
The binomial, negative binomial and geometric models involve trials (experiments) which:

(1) are independent
(2) have 2 distinct types of outcome ($S$ and $F$)
(3) have the same probability of ``success'' $(S)$ each time.

Such trials are known as Bernoulli trials; they are named after the Swiss mathematician Jacob Bernoulli.


Problem 6.6.1

Suppose there is a 30% chance of a car from a certain production line having a leaky windshield. The probability an inspector will have to check at least $n$ cars to find the first one with a leaky windshield is .05. Find $n$.

Poisson Distribution from Binomial

The Poisson distribution has probability function (p.f.) of the form $$f(x)=\frac{e^{-\mu}\mu^{x}}{x!},\quad x=0,1,2,\dots$$ where $\mu>0$ is a parameter whose value depends on the setting for the model. Mathematically, we can see that $f(x)$ has the properties of a p.f., since $f(x)\geq0$ for $x=0,1,2,\dots$ and since $$\sum_{x=0}^{\infty}f(x)=e^{-\mu}\sum_{x=0}^{\infty}\frac{\mu^{x}}{x!}=e^{-\mu}e^{\mu}=1.$$ The Poisson distribution arises in physical settings where the random variable $X$ represents the number of events of some type. In this section we show how it arises from a binomial process, and in the following section we consider another derivation of the model.

We will sometimes write $X\sim\text{Poisson}(\mu)$ to denote that $X$ has the p.f. above.


Physical Setup: One way the Poisson distribution arises is as a limiting case of the binomial distribution as $n\rightarrow\infty$ and $p\rightarrow0$. In particular, we keep the product $np$ fixed at some constant value, $\mu$, while letting $n\rightarrow\infty$. This automatically makes $p\rightarrow0$. Let us see what the limit of the binomial p.f. $f(x)$ is in this case.


Probability Function: Since $p=\mu/n$, we have $$f(x)=\binom{n}{x}\left(\frac{\mu}{n}\right)^{x}\left(1-\frac{\mu}{n}\right)^{n-x}=\frac{\mu^{x}}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^{x}}\left(1-\frac{\mu}{n}\right)^{n}\left(1-\frac{\mu}{n}\right)^{-x},$$ and letting $n\rightarrow\infty$ with $x$ fixed gives $$f(x)\rightarrow\frac{\mu^{x}}{x!}\cdot1\cdot e^{-\mu}\cdot1=\frac{e^{-\mu}\mu^{x}}{x!},\quad x=0,1,2,\dots$$ (For the binomial the upper limit on $x$ is $n$, but we are letting $n\rightarrow\infty$.) This result allows us to use the Poisson distribution with $\mu=np$ as a close approximation to the binomial distribution $Bi(n,p)$ in processes for which $n$ is large and $p$ is small.


Example: 200 people are at a party. What is the probability that 2 of them were born on Jan. 1?


Solution: Assuming all days of the year are equally likely for a birthday (and ignoring February 29) and that the birthdays are independent (e.g. no twins!) we can use the binomial distribution with $n=200$ and $p=1/365$ for $X$ = number born on January 1, giving $$f(2)=\binom{200}{2}\left(\tfrac{1}{365}\right)^{2}\left(1-\tfrac{1}{365}\right)^{198}\doteq0.0868.$$ Since $n$ is large and $p$ is close to 0, we can use the Poisson distribution to approximate this binomial probability, with $\mu=np=\frac{200}{365}$, giving $$f(2)\doteq\frac{e^{-200/365}\left(\frac{200}{365}\right)^{2}}{2!}\doteq0.0868.$$ As might be expected, this is a very good approximation.
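The closeness of the two answers can be confirmed with a Python sketch (names ours, illustrative):

```python
from math import comb, exp

n, p, x = 200, 1 / 365, 2
binom_f2 = comb(n, x) * p**x * (1 - p)**(n - x)  # exact binomial probability
mu = n * p                                       # Poisson parameter mu = np
poisson_f2 = exp(-mu) * mu**x / 2                # mu^x e^(-mu) / x!, x! = 2

assert abs(binom_f2 - poisson_f2) < 5e-4  # approximation error is tiny
```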


Notes:

  1. If $p$ is close to 1 we can also use the Poisson distribution to approximate the binomial. By interchanging the labels ``success'' and ``failure'', we can get the probability of ``success'' (formerly labelled ``failure'') close to 0.

  2. The Poisson distribution used to be very useful for approximating binomial probabilities with $n$ large and $p$ near 0 since the calculations are easier. (This assumes values of $e^{x}$ to be available.) With the advent of computers, it is just as easy to calculate the exact binomial probabilities as the Poisson probabilities. However, the Poisson approximation is useful when employing a calculator without a built in binomial function.

  3. The $R$ functions $dpois(x,\mu)$ and $ppois(x,\mu)$ give $f(x)$ and $F(x)$.


Problem 6.7.1

An airline knows that 97% of the passengers who buy tickets for a certain flight will show up on time. The plane has 120 seats.

  1. They sell 122 tickets. Find the probability that more people will show up than can be carried on the flight. Compare this answer with the answer given by the Poisson approximation.

  2. What assumptions does your answer depend on? How well would you expect these assumptions to be met?

Poisson Distribution from Poisson Process

We now derive the Poisson distribution as a model for the number of events of some type (e.g. births, insurance claims, web site hits) that occur in time or in space. To this end, we use the ``order'' notation $g(\Delta t)=o(\Delta t)$ as $\Delta t\rightarrow0$ to mean that the function $g$ approaches $0$ faster than $\Delta t$ as $\Delta t$ approaches zero, or that $$\lim_{\Delta t\rightarrow0}\frac{g(\Delta t)}{\Delta t}=0.$$ For example, $(\Delta t)^{2}=o(\Delta t)$, but $(\Delta t)^{1/2}$ is not $o(\Delta t).$


Physical Setup:

Consider a situation in which events are occurring randomly over time (or space) according to the following conditions:

  1. Independence: the number of occurrences in non-overlapping intervals are independent.

  2. Individuality: for sufficiently short time periods of length $\Delta t$, the probability of 2 or more events occurring in the interval is close to zero; i.e., events occur singly, not in clusters. More precisely, as $\Delta t\rightarrow0$ the probability of two or more events in the interval of length $\Delta t$ must go to zero faster than $\Delta t$; that is, $$P(\text{2 or more events in }(t,t+\Delta t))=o(\Delta t).$$

  3. Homogeneity or Uniformity: events occur at a uniform or homogeneous rate $\lambda$ over time, so that the probability of one occurrence in an interval $(t,t+\Delta t)$ is approximately $\lambda\Delta t$ for small $\Delta t$, for any value of $t$. More precisely, $$P(\text{one event in }(t,t+\Delta t))=\lambda\Delta t+o(\Delta t).$$




These three conditions together define a Poisson Process.

Let $X$ be the number of event occurrences in a time period of length $t$. Then it can be shown (see below) that $X$ has a Poisson distribution with $\mu=\lambda t$.
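Before deriving this result, it can be made plausible by simulation. The sketch below (Python here rather than the R used elsewhere in these notes; all parameter values are arbitrary choices) approximates a Poisson process by dividing $(0,t)$ into many short intervals of length $\Delta t$, each containing one event independently with probability $\lambda\Delta t$, exactly as the three conditions suggest:

```python
import random
from math import exp, factorial

# Approximate a Poisson process with rate lam over (0, t) by n_slots short
# Bernoulli intervals of length dt = t/n_slots, each holding one event
# with probability lam*dt (conditions 1-3). Illustrative values only.
random.seed(1)
lam, t, n_slots = 2.0, 3.0, 1500
dt = t / n_slots

def one_run():
    # Total number of events in one realization of the process over (0, t)
    return sum(random.random() < lam * dt for _ in range(n_slots))

counts = [one_run() for _ in range(1000)]
mean_count = sum(counts) / len(counts)   # should be close to mu = lam*t = 6

# Compare the empirical P(X = 6) with the Poisson probability f(6)
mu = lam * t
emp = counts.count(6) / len(counts)
theo = mu**6 * exp(-mu) / factorial(6)
print(round(mean_count, 2), round(emp, 3), round(theo, 3))
```

The empirical mean and the empirical probability of six events both land close to their Poisson values $\mu=\lambda t=6$ and $f(6)$.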

Illustrations:

  1. The emission of radioactive particles from a substance follows a Poisson process. (This is used in medical imaging and other areas.)

  2. Hits on a web site during a given time period often follow a Poisson process.

  3. Occurrences of certain non-communicable diseases sometimes follow a Poisson process.


Probability Function: We can derive the probability function $f(x)=P(X=x)$ from the conditions above. We are interested in time intervals of arbitrary length $t$, so as a temporary notation, let $f_{t}(x)$ be the probability of $x$ occurrences in a time interval of length $t$. We now relate $f_{t}(x)$ and $f_{t+\Delta t}(x)$. From that we can determine what $f_{t}(x)$ is.

To find $f_{t+\Delta t}(x)$ we note that for $\Delta t$ small there are only 2 ways to get $x$ event occurrences by time $t+\Delta t$: either there are $x$ events by time $t$ and no more from $t$ to $t+\Delta t$, or there are $x-1$ by time $t$ and 1 more from $t$ to $t+\Delta t$. (Since $P(\text{2 or more events in } (t,t+\Delta t))=o(\Delta t)$, other possibilities are negligible if $\Delta t$ is small.) This and condition 1 above (independence) imply that $$f_{t+\Delta t}(x)=f_{t}(x)(1-\lambda\Delta t)+f_{t}(x-1)\lambda\Delta t+o(\Delta t).$$ Re-arranging gives $$\frac{f_{t+\Delta t}(x)-f_{t}(x)}{\Delta t}=\lambda\left[f_{t}(x-1)-f_{t}(x)\right]+\frac{o(\Delta t)}{\Delta t}.$$

Taking the limit as $\Delta t\rightarrow 0$ we get $$\frac{d}{dt}f_{t}(x)=\lambda\left[f_{t}(x-1)-f_{t}(x)\right].$$ This ``differential-difference" equation can be ``solved" by using the ``boundary" conditions $f_{0}(0)=1$ and $f_{0}(x)=0$ for $x=1,2,3,\cdots$. You can confirm that $$f_{t}(x)=\frac{(\lambda t)^{x}e^{-\lambda t}}{x!};\quad x=0,1,2,\cdots$$ satisfies these conditions and the equation above, even though you don't know how to solve the equation. If we let $\mu=\lambda t$, we can re-write $f_{t}(x)$ as $f(x)=\frac{\mu^{x}e^{-\mu}}{x!}$, which is the Poisson distribution from Section 6.7. That is:

In a Poisson process with rate of occurrence $\lambda$, the number of event occurrences $X$
in a time interval of length $t$ has a Poisson distribution with $\mu=\lambda t$.
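The claim that $f_{t}(x)=(\lambda t)^{x}e^{-\lambda t}/x!$ satisfies the differential-difference equation can be confirmed numerically rather than analytically. The sketch below (Python; the rate $\lambda=1.7$ and the point $t=2$ are arbitrary choices) compares a finite-difference estimate of $\frac{d}{dt}f_{t}(x)$ with $\lambda\left[f_{t}(x-1)-f_{t}(x)\right]$:

```python
from math import exp, factorial

# Numerical check (not a proof) that f_t(x) = (lam*t)^x e^(-lam*t)/x!
# satisfies d/dt f_t(x) = lam * [f_t(x-1) - f_t(x)],
# using a central finite difference for the time derivative.
lam = 1.7   # arbitrary rate, chosen for illustration

def f(t, x):
    return (lam * t) ** x * exp(-lam * t) / factorial(x)

t, h = 2.0, 1e-6
for x in range(1, 6):
    lhs = (f(t + h, x) - f(t - h, x)) / (2 * h)   # d/dt f_t(x)
    rhs = lam * (f(t, x - 1) - f(t, x))
    print(x, round(lhs, 6), round(rhs, 6))
```

The two columns agree to the accuracy of the finite difference, for every $x$ tried.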




Interpretation of $\mu$ and $\lambda$:

$\lambda$ is referred to as the intensity or rate of occurrence parameter for the events. It represents the average rate of occurrence of events per unit of time (or area or volume, as discussed below). Then $\lambda t=\mu$ represents the average number of occurrences in $t$ units of time. It is important to note that the value of $\lambda$ depends on the units used to measure time. For example, if phone calls arrive at a store at an average rate of 20 per hour, then $\lambda=20$ when time is in hours and the average in 3 hours will be $3\times20$ or 60. However, if time is measured in minutes then $\lambda=20/60=1/3$; the average in 180 minutes (3 hours) is still $(1/3)(180)=60$.
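The unit-invariance of $\mu=\lambda t$ in the phone-call illustration takes only a couple of lines to confirm (a trivial Python check, nothing deep):

```python
# mu = lam * t does not depend on the time units used,
# as long as lam and t are measured in the same units.
lam_per_hour = 20.0                  # 20 calls per hour
t_hours = 3.0

lam_per_min = lam_per_hour / 60.0    # 1/3 call per minute
t_mins = t_hours * 60.0              # 180 minutes

mu_hours = lam_per_hour * t_hours    # average calls in 3 hours
mu_mins = lam_per_min * t_mins       # same quantity, in minute units
print(mu_hours, mu_mins)
```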


Examples:

  1. Suppose earthquakes recorded in Ontario each year follow a Poisson process with an average of 6 per year. The probability that 7 earthquakes will be recorded in a 2 year period is $f(7)=\frac{12^{7}e^{-12}}{7!}\approx 0.0437$. We have used $\lambda=6$ and $t=2$ to get $\mu=12$.

  2. At a nuclear power station an average of 8 leaks of heavy water are reported per year. Find the probability of 2 or more leaks in 1 month, if leaks follow a Poisson process.

Solution: Assuming leaks satisfy the conditions for a Poisson process and that a month is $1/12$ of a year, we'll use the Poisson distribution with $\lambda=8$ and $t=1/12$, so $\mu=8/12$. Thus $$P(X\geq 2)=1-P(X=0)-P(X=1)=1-e^{-8/12}-\frac{8}{12}e^{-8/12}\approx 0.1443.$$
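Both example calculations are easy to verify with a few lines of code. The Python sketch below mirrors what the R calls `dpois` and `ppois` would give:

```python
from math import exp, factorial

def pois_pf(x, mu):
    # Poisson probability function f(x) = mu^x e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

# Example 1: earthquakes, lambda = 6 per year, t = 2 years, so mu = 12
p_quakes = pois_pf(7, 12)                  # P(7 earthquakes in 2 years)
print(round(p_quakes, 4))

# Example 2: leaks, lambda = 8 per year, t = 1/12 year, so mu = 8/12
mu = 8 / 12
p_leaks = 1 - pois_pf(0, mu) - pois_pf(1, mu)   # P(2 or more leaks in a month)
print(round(p_leaks, 4))
```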


Random Occurrence of Events in Space

The Poisson process also applies when ``events'' occur randomly in space (either 2 or 3 dimensions). For example, the ``events'' might be bacteria in a volume of water or blemishes in the finish of a paint job on a metal surface. If $X$ is the number of events in a volume or area in space of size $v$ and if $\lambda$ is the average number of events per unit volume (or area), then $X$ has a Poisson distribution with $\mu=\lambda v$.

For this model to be valid, it is assumed that the Poisson process conditions given previously apply here, with ``time'' replaced by ``volume'' or ``area''. Once again, note that the value of $\lambda$ depends on the units used to measure volume or area.


Example: Coliform bacteria occur in river water with an average intensity of 1 bacteria per 10 cubic centimeters (cc) of water. Find (a) the probability there are no bacteria in a 20cc sample of water which is tested, and (b) the probability there are 5 or more bacteria in a 50cc sample. (To do this assume that a Poisson process describes the location of bacteria in the water at any given time.)


Solution: Let $X$ = number of bacteria in a sample of volume $v$ cc. Since $\lambda=0.1$ bacteria per cc (1 per 10cc) the p.f. of $X$ is Poisson with $\mu=0.1v$: $$f(x)=\frac{(0.1v)^{x}e^{-0.1v}}{x!};\quad x=0,1,2,\cdots$$ Thus we find

  1. With $v=20$, $\mu=2$ and $P(X=0)=e^{-2}\approx 0.1353$.

  2. With $v=50$, $\mu=5$ and $P(X\geq 5)=1-P(X\leq 4)=1-\sum\limits_{x=0}^{4}\frac{5^{x}e^{-5}}{x!}\approx 0.5595$.

(Note: we can use the $R$ command $ppois(4,5)$ to get $P(X\leq4)$.)
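A corresponding check for this example, in Python (the equivalent of the R commands `dpois(0,2)` and `1 - ppois(4,5)`):

```python
from math import exp, factorial

def pois_pf(x, mu):
    # Poisson probability function f(x) = mu^x e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

# (a) v = 20 cc, so mu = 0.1 * 20 = 2
p_a = pois_pf(0, 2)
# (b) v = 50 cc, so mu = 5; P(X >= 5) = 1 - P(X <= 4)
p_b = 1 - sum(pois_pf(x, 5) for x in range(5))
print(round(p_a, 4), round(p_b, 4))
```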


Exercise: In each of the above examples, how well are each of the conditions for a Poisson process likely to be satisfied?

Distinguishing Poisson from Binomial and Other Distributions

Students often have trouble knowing when to use the Poisson distribution and when not to use it. To be certain, the 3 conditions for a Poisson process need to be checked. However, a quick decision can often be made by asking yourself the following questions:

  1. Can we specify in advance the maximum value which $X$ can take?
    If we can, then the distribution is not Poisson. If there is no fixed upper limit, the distribution might be Poisson, but is certainly not binomial or hypergeometric, e.g. the number of seeds which germinate out of a package of 25 does not have a Poisson distribution since we know in advance that $X \leq25$. The number of cardinals sighted at a bird feeding station in a week might be Poisson since we can't specify a fixed upper limit on $X$. At any rate, this number would not have a binomial or hypergeometric distribution.

  2. Does it make sense to ask how often the event did not occur?
    If it does make sense, the distribution is not Poisson. If it does not make sense, the distribution might be Poisson. For example, it does not make sense to ask how often a person did not hiccup during an hour, so the number of hiccups in an hour might have a Poisson distribution; it would certainly not be binomial, negative binomial, or hypergeometric. If a coin were tossed until the $3^{\text{rd}}$ head occurs, it does make sense to ask how often heads did not come up, so the distribution would not be Poisson. (In fact, we'd use the negative binomial distribution for the number of non-heads, i.e. tails.)


Problems:

  1. Suppose that emergency calls to 911 follow a Poisson process with an average of 3 calls per minute. Find the probability there will be

    1. 6 calls in a period of 2$\frac{1}{2}$ minutes.

    2. 2 calls in the first minute of a 2$\frac{1}{2}$ minute period, given that 6 calls occur in the entire period.

  2. Misprints are distributed randomly and uniformly in a book, at a rate of 2 per 100 lines.

    1. What is the probability a line is free of misprints?

    2. Two pages are selected at random. One page has 80 lines and the other 90 lines. What is the probability that there are exactly 2 misprints on each of the two pages?

Combining Models

While we've considered the model distributions in this chapter one at a time, we will sometimes need to use two or more distributions to answer a question. To handle this type of problem you'll need to be very clear about the characteristics of each model. Here is a somewhat artificial illustration. Lots of other examples are given in the problems at the end of the chapter.


Example: A very large (essentially infinite) number of ladybugs is released in a large orchard. They scatter randomly so that on average a tree has 6 ladybugs on it. Trees are all the same size.

  1. Find the probability a tree has $> 3$ ladybugs on it.

  2. When 10 trees are picked at random, what is the probability 8 of these trees have $> 3$ ladybugs on them?

  3. Trees are checked until 5 with $> 3$ ladybugs are found. Let $X$ be the total number of trees checked. Find the probability function, $f(x)$.

  4. Find the probability a tree with $> 3$ ladybugs on it has exactly 6.

  5. On 2 trees there are a total of $t$ ladybugs. Find the probability that $x$ of these are on the first of these 2 trees.


Solution:

  1. If the ladybugs are randomly scattered the most suitable model is the Poisson distribution with $\lambda=6$ and $v=1$ (i.e. any tree has a ``volume" of 1 unit), so $\mu=6$ and $$P(X>3)=1-P(X\leq 3)=1-\sum\limits_{x=0}^{3}\frac{6^{x}e^{-6}}{x!}\approx 0.8488.$$

  2. Using the binomial distribution where ``success'' means $>3$ ladybugs on a tree, we have $n=10$, $p=0.8488$ and $$f(8)=\binom{10}{8}(0.8488)^{8}(1-0.8488)^{2}\approx 0.2772.$$

  3. Using the negative binomial distribution, we need the number of successes, $k$, to be 5, and the number of failures to be $(x-5)$. Then $$f(x)=\binom{x-1}{x-5}(0.8488)^{5}(0.1512)^{x-5};\quad x=5,6,7,\cdots$$

  4. This is conditional probability. Let $A=\{\text{6 ladybugs}\}$ and $B=\{>3 \text{ ladybugs}\}$. Then $$P(A|B)=\frac{P(AB)}{P(B)}=\frac{P(X=6)}{P(X>3)}=\frac{6^{6}e^{-6}/6!}{0.8488}\approx 0.1892.$$

  5. Again we need to use conditional probability. Let $T$ be the total number of ladybugs on the 2 trees and let $X_{1}$ be the number on the first of these 2 trees. Then

    $$P(X_{1}=x\,|\,T=t)=\frac{P(X_{1}=x \text{ and } T=t)}{P(T=t)}=\frac{P(x \text{ on } 1^{\text{st}}\text{ tree})\,P(t-x \text{ on } 2^{\text{nd}}\text{ tree})}{P(T=t)}$$

    Use the Poisson distribution to calculate each, with $\mu=6\times 2=12$ in the denominator since there are 2 trees:

$$P(X_{1}=x\,|\,T=t)=\frac{\frac{6^{x}e^{-6}}{x!}\cdot\frac{6^{t-x}e^{-6}}{(t-x)!}}{\frac{12^{t}e^{-12}}{t!}}=\binom{t}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{t-x};\quad x=0,1,\cdots,t$$

Caution: Don't forget to give the range of $x$. If the total is $t$, there couldn't be more than $t$ ladybugs on the $1^{\text{st}}$ tree.
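The numerical values in parts (a), (b) and (d), and the claim in part (e) that the conditional distribution is binomial with $p=1/2$, can all be checked with a short script (Python here, though the R functions `ppois`, `dbinom`, etc. would serve equally well; the values $t=9$, $x=4$ in the part (e) check are arbitrary):

```python
from math import comb, exp, factorial

def pois_pf(x, mu):
    # Poisson probability function f(x) = mu^x e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

mu = 6.0
# (a) P(a tree has > 3 ladybugs)
p = 1 - sum(pois_pf(x, mu) for x in range(4))
# (b) Bi(10, p): probability exactly 8 of 10 trees have > 3 ladybugs
b = comb(10, 8) * p**8 * (1 - p)**2
# (d) P(exactly 6 ladybugs | > 3 ladybugs)
d = pois_pf(6, mu) / p
print(round(p, 4), round(b, 4), round(d, 4))

# (e) check at one point: P(x on tree 1 | total t) equals Bi(t, 1/2)
t, x = 9, 4
lhs = pois_pf(x, 6) * pois_pf(t - x, 6) / pois_pf(t, 12)
rhs = comb(t, x) * 0.5**t
print(round(lhs, 6), round(rhs, 6))
```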


Exercise: The answer to (e) is a binomial probability function. Can you reach this answer by general reasoning rather than using conditional probability to derive it?


Problems:

  1. In a Poisson process the average number of occurrences is $\lambda$ per minute. Independent 1 minute intervals are observed until the first minute with no occurrences is found. Let $X$ be the number of 1 minute intervals required, including the last one. Find the probability function, $f(x)$.

  2. Calls arrive at a telephone distress centre during the evening according to the conditions for a Poisson process. On average there are 1.25 calls per hour.

    1. Find the probability there are no calls during a 3 hour shift.

    2. Give an expression for the probability a person who starts working at this centre will have the first shift with no calls on the $15^{\text{th}}$ shift.

    3. A person works one hundred 3 hour evening shifts during the year. Give an expression for the probability there are no calls on at least 4 of these 100 shifts. Calculate a numerical answer using a Poisson approximation.

Summary of Single Variable Discrete Models

Name                Probability Function
Discrete Uniform    $f(x)=\frac{1}{b-a+1};\quad x=a,a+1,\cdots,b$
Hypergeometric      $f(x)=\frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$
Binomial            $f(x)=\binom{n}{x}p^{x}(1-p)^{n-x};\quad x=0,1,\cdots,n$
Negative Binomial   $f(x)=\binom{x+k-1}{x}p^{k}(1-p)^{x};\quad x=0,1,2,\cdots$
Geometric           $f(x)=p(1-p)^{x};\quad x=0,1,2,\cdots$
Poisson             $f(x)=\frac{\mu^{x}e^{-\mu}}{x!};\quad x=0,1,2,\cdots$
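As a check on this summary, each probability function can be coded and summed over its range; every sum should be 1. A Python sketch (the parameter values are arbitrary; the negative binomial and geometric are written in the failures-before-the-$k^{\text{th}}$-success convention used in these notes, and the infinite ranges are truncated at a large $x$, so those sums are only approximately 1 up to rounding):

```python
from math import comb, exp, factorial

# One function per row of the summary table, with illustrative parameters.
def discrete_uniform(x, a, b):
    return 1 / (b - a + 1)

def hypergeometric(x, N, r, n):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binomial(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def negative_binomial(x, k, p):
    # x = number of failures before the k-th success
    return comb(x + k - 1, x) * p**k * (1 - p)**x

def geometric(x, p):
    return p * (1 - p)**x

def poisson(x, mu):
    return mu**x * exp(-mu) / factorial(x)

checks = [
    sum(discrete_uniform(x, 1, 6) for x in range(1, 7)),
    sum(hypergeometric(x, 20, 8, 5) for x in range(0, 6)),
    sum(binomial(x, 10, 0.3) for x in range(11)),
    sum(negative_binomial(x, 3, 0.4) for x in range(200)),
    sum(geometric(x, 0.25) for x in range(200)),
    sum(poisson(x, 4.5) for x in range(100)),
]
print([round(c, 6) for c in checks])
```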


Appendix: $R$ Software

The $R$ software system is a powerful tool for handling probability distributions and data concerning random variables. The following short notes describe basic features of $R$; further information and links to other resources are available on the course web page. Unix and Windows versions of $R$ are available on Math Faculty undergraduate servers, and free copies can be downloaded from the web.

You should make yourself familiar with $R$, since some problems (and most applications of probability) require computations or graphics which are not feasible by hand.

Some R Basics
 
R is a statistical software system that has excellent numerical,
graphical and statistical capabilities. There are Unix and Windows
versions. These notes are a very brief introduction to a few of the
features of R. Web resources have much more information. Links can
be found on the Stat 230 web page. You can also download a Unix or
Windows version of R for free.
 
 
1.PRELIMINARIES
R is invoked on Math Unix machines by typing R. The R prompt
is >. R objects include variables, functions, vectors, arrays, lists and
other items. To see online documentation about something, we use the help
function. For example, to see documentation on the function mean(), type
help(mean). In some cases help.search() is helpful.
 
The assignment symbol is <- : for example,
      x<- 15   assigns the value 15 to variable x.
 
To quit an R session, type  q()
 
2.VECTORS
Vectors can consist of numbers or other symbols; we will consider only
numbers here. Vectors are defined using c(): for example,
   x<- c(1,3,5,7,9)
defines a vector of length 5 with the elements given. Vectors and other
classes of objects possess certain attributes. For example, typing
length(x) will give the length of the vector x. Vectors are a convenient
way to store values of a function (e.g. a probability function or a c.d.f)
or values of a random variable that have been recorded in some experiment
or process.
 
3.ARITHMETIC
The following R commands and responses should explain arithmetic
operations.
 
> 7+3
[1] 10
> 7*3
[1] 21
> 7/3
[1] 2.333333
> 2^3
[1] 8
 
4.SOME FUNCTIONS
Functions of many types exist in R. Many operate on vectors in a
transparent way, as do arithmetic operations. (For example, if x and y
are vectors then x+y adds the vectors element-wise; thus x and y must
be the same length.) Some examples, with comments, follow.
 
> x<- c(1,3,5,7,9)   # Define a vector x
> x            # Display x
[1] 1 3 5 7 9
> y<- seq(1,2,.25)    #A useful function for defining a vector whose
                      elements are an arithmetic progression
> y
[1] 1.00 1.25 1.50 1.75 2.00
> y[2]   # Display the second element of vector y
[1] 1.25
> y[c(2,3)]    # Display the vector consisting of the second and
                   third elements of vector y.
[1] 1.25 1.50
> mean(x)      #Computes the mean of the elements of vector x
[1] 5
> summary(x)    # A useful function which summarizes features of
                   a vector x
 Min. 1st Qu. Median Mean 3rd Qu. Max.
    1       3      5    5       7    9
> var(x)    # Computes the (sample) variance of the elements of x
[1] 10
> exp(1)    # The exponential function
[1] 2.718282
> exp(y)
[1] 2.718282 3.490343 4.481689 5.754603 7.389056
> round(exp(y),2)   # round(y,n) rounds the elements of vector y to
                       n decimals
[1] 2.72 3.49 4.48 5.75 7.39
> x+2*y
[1]  3.0  5.5  8.0 10.5 13.0
 
5. GRAPHS
To open a graphics window in Unix, type x11(). Note that in R, a graphics
window opens automatically when a graphical function is used.
There are various plotting and graphical functions. Two useful ones
are
 
plot(x,y)  # Gives a scatterplot of x versus y; thus x and y must
              be vectors of the same length.
 
hist(x)    # Creates a frequency histogram based on the values in
              the vector x. To get a relative frequency histogram
              (areas of rectangles sum to one) use hist(x,prob=T).
 
Graphs can be tailored with respect to axis labels, titles, numbers of
plots to a page etc. Type help(plot), help(hist) or help(par) for some
information.
 
To save/print a graph in R using UNIX, you generate the graph you would
like to save/print in R using a graphing function like plot() and type:
 
dev.print(device,file="filename")
 
where device is the device you would like to save the graph to (i.e. x11)
and filename is the name of the file that you would like the graph saved
to. To look at a list of the different graphics devices you can save to,
type help(Devices).
 
To save/print a graph in R using Windows, you can do one of two things.
 
a) You can go to the File menu and save the graph using one of several
   formats (i.e. postscript, jpeg, etc.). It can then be printed. You
   may also copy the graph to the clipboard using one of the formats
   and then paste to an editor, such as MS Word. Note that the graph
   can be printed directly to a printer using this option as well.
 
b) You can right click on the graph. This gives you a choice of copying
   the graph and then pasting to an editor, such as MS Word, or saving
   the graph as a metafile or bitmap. You may also print directly to a
   printer using this option as well.
 
6.DISTRIBUTIONS
There are functions which compute values of probability or probability
density functions, cumulative distribution functions, and quantiles for
various distributions. It is also possible to generate (pseudo) random
samples from these distributions. Some examples follow for Binomial and
Poisson distributions. For other distribution information, type
help(rhyper), help(rnbinom) and so on. Note that R does not have any
function specifically designed to generate random samples from a discrete
uniform distribution (although there is one for a continuous uniform
distribution). To generate n random samples from a discrete UNIF(a,b), use
sample(a:b,n,replace=T).
 
> y<- rbinom(10,100,0.25)    # Generate 10 random values from the Binomial
                               distribution Bi(100,0.25). The values are
                               stored in the vector y.
> y    # Display the values
 [1] 24 24 26 18 29 29 33 28 28 28
> pbinom(3,10,0.5)     # Compute P(Y<=3) for a Bi(10,0.5) random variable.
[1] 0.171875
> qbinom(.95,10,0.5)   # Find the .95 quantile (95th percentile) for
[1] 8                    Bi(10,0.5).
 
> z<- rpois(10,10)    # Generate 10 random values from the Poisson
                        distribution Poisson(10). The values are stored in the
                        vector z.
> z   # Display the values
 [1]  6  5 12 10  9  7  9 12  5  9
> ppois(3,10)     # Compute P(Y<=3) for a Poisson(10) random variable.
[1] 0.01033605
> qpois(.95,10)   # Find the .95 quantile (95th percentile) for
[1] 15              Poisson(10).
 
To illustrate how to plot the probability function for a random variable,
a Bi(10,0.5) random variable is used.
 
# Assign all possible values of the random variable, X ~ Bi(10,0.5)
x <- seq(0,10,by=1)
 
# Determine the value of the probability function for possible values of X
x.pf <- dbinom(x,10,0.5)
 
# Plot the probability function
barplot(x.pf,xlab="X",ylab="Probability Function",
names.arg=c("0","1","2","3","4","5","6","7","8","9","10"))


Problems on Chapter 6

  1. Suppose that the probability $p(x)$ a person born in 1950 lives at least to certain ages $x$ is as given in the table below.

    $x$: 30 50 70 80 90
    Females .980 .955 .910 .595 .240
    Males .960 .920 .680 .375 .095

  2. Let $X$ be a non-negative discrete random variable with cumulative distribution function MATH

  3. Two balls are drawn at random from a box containing ten balls numbered $0,1,...,9$. Let random variable $X$ be the larger of the numbers on the two balls and random variable $Y$ be their total.

  4. Let $X$ have a geometric distribution with MATH. Find the probability function of $R$, the remainder when $X$ is divided by 4.

  5. An oil company runs a contest in which there are 500,000 tickets; a motorist receives one ticket with each fill-up of gasoline, and 500 of the tickets are winners.

  6. Jury selection. During jury selection a large number of people are asked to be present, then persons are selected one by one in a random order until the required number of jurors has been chosen. Because the prosecution and defense teams can each reject a certain number of persons, and because some individuals may be exempted by the judge, the total number of persons selected before a full jury is found can be quite large.

  7. A waste disposal company averages 6.5 spills of toxic waste per month. Assume spills occur randomly at a uniform rate, and independently of each other, with a negligible chance of 2 or more occurring at the same time. Find the probability there are 4 or more spills in a 2 month period.

  8. Coliform bacteria are distributed randomly and uniformly throughout river water at the average concentration of one per twenty cubic centimetres of water.

  9. In a group of policy holders for house insurance, the average number of claims per 100 policies per year is $\lambda= 8.0$. The number of claims for an individual policy holder is assumed to follow a Poisson distribution.

  10. Assume power failures occur independently of each other at a uniform rate through the months of the year, with little chance of 2 or more occurring simultaneously. Suppose that 80% of months have no power failures.

  11. Spruce budworms are distributed through a forest according to a Poisson process so that the average is $\lambda$ per hectare.

  12. A person working in telephone sales has a 20% chance of making a sale on each call, with calls being independent. Assume calls are made at a uniform rate, with the numbers made in non-overlapping periods being independent. On average there are 20 calls made per hour.

  13. A bin at a hardware store contains 35 forty watt lightbulbs and 70 sixty watt bulbs. A customer wants to buy 8 sixty watt bulbs, and withdraws bulbs without replacement until these 8 bulbs have been found. Let $X$ be the number of 40 watt bulbs drawn from the bin. Find the probability function, $f(x)$.

  14. During rush hour the number of cars passing through a particular intersection has a Poisson distribution with an average of 540 per hour.

  15. Random variable $X$ takes values 1,2,3,4,5 and has c.d.f.

    $x$ 0 1 2 3 4 5
    $F(x)$ 0 .1$k$ .2 .5$k$ $k$ $4k^{2}$

    Find $k, f(x)$ and MATH. Draw a histogram of $f(x) $.

  16. Let random variable $Y$ have a geometric distribution MATH for $y = 0,1,2,... \ \ $.

  17. Polls and Surveys. Polls or surveys in which people are selected and their opinions or other characteristics are determined are very widely used. For example, in a survey on cigarette use among teenage girls, we might select a random sample of $n$ girls from the population in question, and determine the number $X$ who are regular smokers. If $p$ is the fraction of girls who smoke, then $X \sim B i (n, p)$. Since $p$ is unknown (that is why we do the survey) we then estimate it as $\hat{p} = X / n$. (In probability and statistics a ``hat" is used to denote an estimate of a model parameter based on data.) The binomial distribution can be used to study how ``good" such estimates are, as follows

  18. Telephone surveys. In some ``random digit dialing" surveys, a computer phones randomly selected telephone numbers. However, not all numbers are ``active" (belong to a telephone account) and they may belong to businesses as well as to individual or residences.

    Suppose that for a given large set of telephone numbers, 57% are active residential or individual numbers. We will call these ``personal" numbers.

    Suppose that we wish to interview (over the phone) 1000 persons in a survey.

    (Note: The R functions pnbinom and pbinom give negative binomial and binomial probabilities, respectively.)