\documentstyle[psfig,fullpage]{article}
\title{Scientific Method, Statistical Method, and the Speed of Light.}
\author{R.J. Mackay and R.W. Oldford \thanks{Research supported by the Natural
Sciences and Engineering Research Council of Canada}\\
Department of Statistics and Actuarial Science\\
University of Waterloo}
\begin{document}
\bibliographystyle{plain}
\maketitle
\begin{abstract}
What is ``statistical method''?
Is it the same as ``scientific method''?
This paper answers the first question by specifying the elements and procedures
common to all statistical investigations and organizing these into a single structure.
This structure is illustrated by careful examination of the first scientific
study carried out by A.A. Michelson in 1879.
Our answer to the second question is negative.  To understand this,
a history of the speed of light up to the time of Michelson's study is presented.
The larger history and the details of a single study allow us to place the method
of statistics within the larger context of science.
\end{abstract}
\section{Introduction.}
%{\small
%\input{aristotle-know}
%}
{\small
\input{pearson}
}
{\small
\input{kendall-long}
}
The view that statistics entails the quantitative expression of scientific
method has been around since its birth as a discipline.
Yet statisticians have typically shied away from articulating the relationship between
statistics and scientific method.  Perhaps with very good reason.
For centuries great minds have debated what constitutes
science and its method without resolution (e.g. see \cite{Madden:methods}).
%And in this century historical examination of scientific episodes  (e.g. \cite{Kuhn:rev})
%has shown most definitions of scientific method to be found wanting.
And in this century historical examinations of scientific episodes  (e.g. \cite{Kuhn:rev})
have cast doubt on method in scientific discovery.
One radical position, established by examination of the work of Galileo, is that of the
philosopher Paul Feyerabend who writes of method in science:
{\small
\input{feyerabend1}
}
\noindent Feyerabend then proposed, somewhat facetiously, that the only universal method to
be found in science is ``anything goes.''
Whether Feyerabend's view holds for science in general is debatable;
that it does not hold for statistics is the primary thesis of this paper.

By examining in some detail one particular scientific study, namely A.A. Michelson's
1879 determination of the speed of light \cite{aamich:1880}, we illustrate what we consider to be
the common scientific structure of statistics.
There are several reasons for this choice.

First, physical science is sometimes regarded as presenting a greater challenge to the explication of statistical
method than, say, medical or social science where populations of individuals are obvious.
An early instance is Edgeworth's hesitation in 1884 to describe statistics as the ``Science of Means in general
(including physical observations)'', preferring instead the less ``philosophical'' compromise that
it is the science ``of those Means which are presented by social phenomena'' (\cite{Edge:methods}).


Second, the speed of light in vacuum is a fundamental constant whose
value has become ``known''; in 
1983,
it was {\em defined}
\footnote{
By that time
the determinations had so little variability that it
was considered known to 1 part in $10^9$, and the standard metre could
not be measured to that great a precision.
The second is similarly defined; it is the time taken for
9,192,631,770 periods of the radiation corresponding to the transition
between two hyperfine levels of the atom Cesium-133.
By {\em defining} these two quantities all uncertainty was shifted
to the unit of distance, a metre, now defined to be
the distance travelled by light through a vacuum in 1/299792458
second! See \cite{metre:def}.
}
to be 299,792.458 km/s.
So we are in the extremely rare inferential position of ``knowing the answer.''

Third, Michelson wrote up his study at a time when it was possible to publish significant
amounts of detail, permitting others insight into the difficulties he faced and the solutions
he found.

Fourth, the determination of the speed of light has been
(and continues to be) important to science and to technology.
Consequently its history is rich enough to 
provide a backdrop against which large scale questions of the nature
of science can be discussed.

Fifth, the determinations are known in the statistical literature,
first appearing in Stigler's paper (\cite{Stigler:robust}) on robust
estimates of location.  Since then, the data have been incorporated into
some important introductory textbooks (e.g. \cite{MooreMcCabe:text}) and will no doubt
become more commonplace.

Finally, and most importantly, a historical study has the characteristic of being
based entirely on public material.  Information gathered together into
a single source is information that can be checked against common sources,
that can be improved as new historical material becomes available,
and that can be a common test bed for others to use.
To these ends, we have tried to present the history without reference to method.

The structure which we propose as defining statistical method is described in Section 4.
Scientific method is examined in Section 5 and contrasted with statistical method in
Section 6.
These discussions require separate contexts of differing detail.
A broad historical sweep is necessary to appreciate what can be meant by scientific method.
It is provided in Section 2, where we give a history of the
determination of the speed of light from antiquity to the late 1800s.
The stage thus set, the optics, apparatus, and method of Michelson's first determinations of the speed of
light are described in Section 3.
These provide the details necessary for discussion
of statistical method.
A final section explores what we consider to be important ramifications of our approach.
\section{Historical background.}
The thought of Aristotle (384-322 BC) dominated western science for
nearly two millennia.
So powerful is his cosmology that it compels him to declare that
``$\ldots$ light is due to the presence of something,
but it is not a movement'' (\cite{Aristotle:sense}$446^b25-447^a10$).
No movement, no speed.
And if that were not enough, the argument for finite speed is easily dismissed:
{\small
\input{aristotle.tex}
}
{\noindent This view was echoed by many thinkers in
western history: Augustine (ca 354-430), John Pecham (ca 1230-1292),
Albert the Great (ca 1200-1280),
Thomas Aquinas (ca 1225-1274), and Witelo (ca 1230-ca 1275) to name a few.
So too, the opposite view was argued by some, notably Ibn Al-Haytham
(ca 965-1040) and Roger Bacon (ca 1219-1292).
But without empirical demonstration to the contrary, the case for instantaneous perception
of the source could always be made.
In the absence of data, arguments pro and con were forced to be based on
the contemporary theory of light, or on interpretation of the conflicting views
of ancient authorities, or on established religious doctrines, or on
mathematical arguments that demonstrated the necessity or absurdity of
one of the alternatives \cite{Lindberg:medieval}.}

The debate continued into the beginning of the ``scientific revolution''
of the seventeenth century.\footnote{D.C. Lindberg presents preliminary
evidence of the debate in medieval Europe \cite{Lindberg:medieval}.}
Such giants
as Francis Bacon\footnote{Bacon had doubts about the infinite
speed when considering the great distances that light must travel
from the stars to Earth but found such speed easier to swallow
given the already fantastic speeds at which stars must travel in their
daily orbit about the Earth! See Aphorism 46 of Book II of the Novum Organum
e.g. \cite{Bacon:Novum}}
 (1561-1626), Johannes Kepler (1571-1630),
and Ren\'{e} Descartes (1596-1650), believed the speed to be infinite.

Descartes, for example, likened the transmission of light to that of pushing
on a stiff stick  -- the instant one end (the source) was pushed the other end (the
perception) moved (pp. 258-9 of \cite{Gaukroger:Descartes}).
The analogy is powerful; there is no perceptible movement anywhere
along the stick, no matter how long a stick is used!
Descartes strongly held this view;
when his colleague and scientific mentor, Isaac Beeckman
(1588-1637), claimed to have performed an experiment
which demonstrated the speed was finite,
Descartes dismissed the claim, saying that if it were true, then
he knew nothing of philosophy and his whole theory would
be refuted!\footnote{From \cite{Descartes:speed} page 307: {\em ``Contra ego,
si quae talis mora sensu perciperetur, totam meam Philosophiam
funditus eversam fore inquiebam.''} A rough translation, due to our
classically trained colleague G.W. Bennett, is
``On the contrary, I would be worried that my entire Philosophy would be
on the point of being completely overturned if any delay of this sort
were to be perceived by the senses.''}
Beeckman and Descartes could not agree on an experiment to resolve the
issue.\footnote{It is doubtful that Beeckman's 1629 experiment \cite{Beeckman:1629}
was successful.  The experiment involved firing a mortar and observing
its flash in a mirror situated some 1851.85 metres away; the movement of a clock
situated at the side of the mortar would measure the time elapsed.
With today's value, the time for the flash to reach the mirror
and return would be about $\frac{1}{100,000}$ of a second!
Descartes argues that even if Beeckman could detect a delay of $\frac{1}{24}$ of
a pulse beat (or about $\frac{1}{24}$ of a second yielding
a speed of only around 89 km/s), then it should be possible to detect a delay
between the occurrence and perception of a lunar eclipse of about one hour.
The flaws in this argument are discussed in detail in \cite{Descartes:speed}.}

%At least since Aristotle (384-322 BC), many thinkers
%including Johannes Kepler (1571-1630) and Ren\'{e} Descartes (1596-1650)
%believed light's speed to be infinite.
%Galileo Galilei (1564-1642) disagreed:
Among these giants, Galileo Galilei (1564-1642) stands alone
in his disagreement;
he writes
{\small
%\input{/usr/people/rwoldford/admin/courses/st231/notes/cases/light/galileo.tex}
\input{galileo.tex}
}
{\noindent
In the same book, Galileo proposed a demonstration to determine whether light was instantaneous.
It was essentially the same as the one Beeckman had proposed earlier, and it drew similar fire from Descartes.
In a letter to the great experimental scientist Marin Mersenne (1588-1647),
dated 11 October 1638, Descartes gave a scathing review\footnote{E.g. ``... his fashion of
writing in dialogues, where he introduces three persons who do nothing but exalt
each of his inventions in turn, greatly assists in [over]pricing his merchandise.''
Page 388 of \cite{Drake:sci-bio}. The substantive criticisms are generally
directed at Galileo's not having identified the causes of the phenomena he investigated.
For most scientists at this time, and particularly for Descartes, that is the whole point of science.}
of Galileo's book; of the proposed demonstration Descartes wrote
``His experiment to know if light is
transmitted in an instant is useless, since eclipses of the moon, related so closely to
calculations made of them, prove this incomparably better than anything that could be tested on earth.''
\footnote{
Page 389 of \cite{Drake:sci-bio}.
This appears to be based on the argument he gave to Beeckman as described in note 5.}
Nevertheless, the demonstration was tried in 1667 by members of the Florentine Academy,
%
%A method was proposed by Galileo in 1638 \cite{Galileo:1638}
%and subsequently tried by the Florentine Academy in 1667,
but without success
\cite{cohen:1940}
-- light was either instantaneous, or so fast that
the demonstration could not detect any delay.
}

In 1676 the first empirical evidence of a finite speed was presented.
The Danish astronomer Ole R\"{o}mer (1644-1710), while working on
something entirely different, gathered data and found a discrepancy
which led to the discovery.
Interestingly, this important and purely
scientific discovery came about in the course of what we would today call
a very applied problem.

\subsection{Longitude.}
One of the great practical problems of that time was the
determination of longitude, particularly at sea.
This could be done by comparing the local time at sea with the time
at a fixed reference point --- the prime meridian.
If, for example, the local time is determined to be
two hours earlier than the time at the
prime meridian, the location must be 360 $\times$ 2/24 = 30 degrees
longitude west of the prime meridian.
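In general, if the local time is found to be $\Delta t$ hours behind that of the
prime meridian, the position is
\[
360 \times \frac{\Delta t}{24} \; = \; 15 \, \Delta t
\]
degrees of longitude west of the prime meridian (east, if the local time is ahead).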

These times can be determined astronomically.
For example local time zero can be defined to be that time when
some star, say Arcturus, is observed to cross the imaginary line
of longitude running directly north-south through the local position;
the corresponding standard time zero would be
that time when the same star crosses the prime meridian.
Stars are far enough away from us
that these two crossings will occur at
different moments of time.
Carefully determined tables of prime meridian crossing times
of various stars would allow navigators
to set their local clock.
To determine the difference between the local clock and the standard
clock, closer astronomical events like an eclipse or occultation
of the moon or a planet can be used.
These events are observed at essentially the same moment
of time whatever the observer's location on Earth.
So comparison of the local time of the close event with its tabulated
standard time would give the time difference necessary to calculate
longitude. 

In 1609, after hearing Flemish reports of a spyglass constructed from
two lenses that would enlarge the image of distant objects,
Galileo set about the design and construction of the first astronomically
useful telescope.\footnote{According to Stillman Drake
(\cite{Drake:disc} page 29), Hans Lipperhey,
a lens grinder from the Netherlands, is generally assigned credit for the
telescope's invention and applied for its patent in 1608.}
In March of the next year Galileo reported his discovery of the
four principal moons of Jupiter \cite{Galileo:starry}.
For the first time,
here was an orbital system that was demonstrably not centred about
the Earth.
Galileo argued that this was compelling evidence against
the Ptolemaic system (all celestial
bodies revolve around a fixed Earth) and in favour of the
Copernican sun-centred system.
His public support of the Copernican system as a true
representation of the movement of the planets (as opposed to a convenient
calculational model)
brought Galileo into conflict with those who would interpret certain
Biblical passages literally \cite{Galileo:duchess}.
Some of these people wielded considerable influence
within the Catholic church of Rome;
by order of Pope Urban VIII he was banned from further publication 
and placed under house arrest from 1633 until his death in 1642.
This did not prevent him from continuing his
scientific work.\footnote{Today's visitor to Florence's Museum of Science can find
a glass and ivory case displaying an ironic relic
-- Galileo's bony middle finger pointing heavenward.}

But this momentous scientific
discovery also had commercial potential ---
King Philip III of Spain offered a handsome prize
to anyone who could come up with
a practical method of determining a ship's position
when out of sight of land.
Galileo hit upon the idea of using the predicted times of the eclipses
of Jupiter's moons to provide the common celestial clock
necessary to determine longitude.
In November of 1616 he began negotiations with
Spain for navigational uses of his astronomical discoveries
and in 1617 worked on developing a telescope for use at sea while
continuing his negotiations with Spain \cite{Drake:disc}.
Unfortunately the tables he produced were not accurate enough
for their intended purpose --- the theory at the time
did not account for the perturbations of the moons due to their
mutual interaction \cite{nauthist:1968}.

Although many writers advocated the use of Jupiter's satellites at sea,
those who appreciated the practical difficulty of directing a
very long telescope at Jupiter while aboard a lively ship
were skeptical and undoubtedly amused by the proposed
method.
It was never to become successful at sea.
But on land, very accurate determinations of
longitude could be obtained this way and resulted in
a substantial reform of geography in the 17th and 18th centuries.

\subsection{The first evidence.}
In 1671 R\"{o}mer went to Hven, an island community near Copenhagen,
to help redetermine the longitude of the observatory located there.
With others he began observing a series
of eclipses of Io, the innermost of Jupiter's four principal moons.
In the end they
had eight months of observations or, since Io makes one revolution
of Jupiter in 42 hours,
timings on about 140 eclipses over 2/3 of the
year.
The time intervals between these eclipses
were not regular but depended on where the Earth
was in its orbit.
The length of the
interval was shorter when the Earth approached Jupiter than it was when
the Earth moved away from the planet.
The mathematically predicted time of an eclipse was too early if the
Earth was near Jupiter and too late if the Earth was far from Jupiter.
This systematic lack of fit allowed R\"omer to announce in Paris
in September 1676 that the eclipse predicted for November 9 that year
would actually occur 10 minutes later.
The observation bore him out and R\"omer argued that
the discrepancy was due to the finite speed of light --- 
the light takes longer to reach us the farther we are from its source.

From his observations, R\"{o}mer estimated that light takes about twenty-two
minutes to cross the full diameter of Earth's orbit or about eleven minutes
for light from the sun to reach us on Earth.
From this he estimated its speed to be about 214,000 kilometres per
second.\footnote{For more on R\"{o}mer see \cite{Romer:bio}.  For more detail
on this study see \cite{cohen:1940}.}
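Taken together, these two figures imply a value for the diameter of the Earth's
orbit: twenty-two minutes at 214,000 km/s corresponds to
\[
214{,}000 \times 22 \times 60 \approx 2.8 \times 10^{8} \mbox{ km},
\]
somewhat short of the modern value of about $3.0 \times 10^{8}$ km; both the
light time and the orbital dimensions available to R\"{o}mer were in error.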

R\"{o}mer's ``proof'' was not immediately accepted by all.
Alternative explanations were provided by Gian Domenico Cassini (1625-1712)
then also an astronomer at the newly formed Acad\'{e}mie des Sciences in Paris.
He had observed inequalities in time intervals that depended on the location
of Jupiter in its own elliptical orbit.
And in 1666 Cassini had published tables on the eclipses of the satellites
of Jupiter from which work he also noticed the discrepancy.
He had briefly considered a finite speed
of light in 1675 but soon rejected it for a more traditional explanation.
Cassini, and later his nephew Giacomo Filippo Maraldi (1665-1729),
suggested that Jupiter's orbit and the motion of its satellites
might explain the observed inequalities
(\cite{Cassini:bio}, \cite{Newcomb:1882} and \cite{Romer:bio}).
Many astronomers continued to hold the view that
light was instantaneous.

It was not until a study by James Bradley (1693-1762)\footnote{See
\cite{Bradley:bio} and \cite{Romer:bio}.}
was reported in 1729 that nearly all agreed that the speed is finite.
Bradley had been studying the parallax of the stars and discovered an annual
variation in the position of stars that could not be explained by the parallax
effect.
However, it could be explained by the motion of the Earth if light's
speed were finite.
Based on careful observations, he estimated that light took 
eight minutes and twelve seconds to reach the Earth from the sun
resulting in a value for light's speed of 301,000 km/sec.

In 1809, based on observations on the eclipses of Jupiter's moons for 150
years, Jean-Baptiste Joseph Delambre (1749-1822) estimated the time
taken by light to travel from the sun to Earth to be eight minutes and
13.2 seconds resulting in a speed of about 300,267.64 $\approx$ 300,300 km/sec.\footnote{
The time here is as reported in \cite{Newcomb:1882}.
To calculate the speed, the distance between the Earth and sun must be known.
In the estimate reported here, the distance used was 148,092,000 km as derived from
Bradley's figures above.}

The results of these early astronomical estimates are summarized in Table
\ref{table:astronomy}.
\begin{table}[ht]
\scriptsize{
\begin{center}
\begin{tabular}{|lllc|}
\hline
Year & Authors & Observational Source & Speed (km/sec) \\
\hline
1676 & R\"{o}mer & Jupiter satellites & 214~000 \\
1726 & Bradley & Aberration of stars & 301~000 \\
1809 & Delambre & Jupiter satellites & 300~300 \\
\hline
\end{tabular}
\end{center}
\caption{Studies based on astronomical observation.}
\label{table:astronomy}
}
\end{table}

Unfortunately, measurements of the speed made in this way depended on the
astronomical theory and observations used.
Simon Newcomb (1835-1909) tells of an 1875 inaugural dissertation by Glasenapp
in which observations of the eclipses of Io from 1848 to 1870
show that widely ranging values for the speed
``could be obtained from different classes
of these observations by different hypotheses'' (\cite{Newcomb:1882} page 114).
It was shown that values for the sun-to-Earth light travel time between 496 and
501 seconds could be produced, resulting in
speeds between 295,592.8 $\approx$ 295,600 and 298,572.6 $\approx$ 298,600 km/s.
\footnote{Again, using Bradley's Earth to sun distance.}

Better determinations of the speed might be made if both
source and observer were terrestrial.
Because the entire system would then be accessible, greater control could be exerted
over the study and hence the observations.
But this brings us back to the age old problem:
how could the speed be measured terrestrially?

\subsection{Terrestrial determinations.}
Imagine two people standing at either end of a very long track.
The first uncovers a powerful light source at an appointed time and
the second records the time at which the light is seen.
The length of the track divided by the difference between the start time
and the time the light is perceived would give a
measurement of the speed of light.\footnote{This is essentially the experiment proposed by Isaac
Beeckman to Descartes in 1629, described in an earlier footnote.}
The trouble, of course, is that light is so fast that the distance must either be
very large or the time taken very small.
Extremely large distances and extremely short time intervals
are very difficult to measure directly.
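Today's value of the speed makes the difficulty plain.  Even with a track 10 km
long, the one-way travel time would be only
\[
\frac{10 \mbox{ km}}{299{,}792.458 \mbox{ km/s}} \approx 3.3 \times 10^{-5}
\mbox{ second},
\]
thousands of times smaller than the tenth of a second or so to which human
observers with clocks might hope to agree.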

Matters can be improved somewhat if both observers have light sources
which they cover with a screen.
Time measurement begins when the first observer removes the screen
sending light to the second.
The second light source is uncovered when the
second observer sees the first.
Now when the first observer sees the second light source
he again screens his source.
The time between uncovering and covering the first light source
is a measure of the time light takes to travel twice the
distance between the two observers.
The improvements are obvious: the distance is doubled and a single clock
has replaced two supposedly synchronized clocks.
Here was Galileo's proposed study of 1638; over 200 years would
pass before it was improved sufficiently to produce results.

The necessary innovations were introduced by Hippolyte Fizeau (1819-1896).
One innovation was to replace the second person by a fixed flat mirror
whose surface is perpendicular to the beam of light from the source.
If this could be done, then the light beam would be reflected directly
back at its origin and so remove completely
one human source of variation from the system.
The second innovation was to automate the covering and uncovering of
the source, thereby further reducing the variation from the first human source. 
Together, these allowed Fizeau to replace the direct measurement of time with
an indirect measurement of speed.

Rather than measure time between uncovering and covering, Fizeau
could measure the minimum speed that the screen must travel in order to
cover the source at the exact time the light returns.
The trick was to use an accurately machined toothed wheel
placed spinning in front of the source to act as the moving screen.
The teeth would screen the source while the gaps would uncover it
and so the wheel acted just as Galileo's observer.
Any light returning would strike either a tooth or a gap,
causing a flashing image to be seen at the source.
The faster the wheel spun, the dimmer this return image would appear.
If the wheel was set spinning fast enough that every beam sent out 
struck a tooth on its way back, no image would be observed.
Twice this speed should produce a continuous beam as the beam sent out
returned through the next available gap.
Three times the speed would again produce no light, and so on.
This speed of rotation, coupled with the distance travelled
(twice 8,633 metres in Fizeau's setup),
could be transformed into a measure of the speed of light.
In this way, Fizeau produced the
first terrestrial determination of the speed of light in 1849.
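The calculation implicit in this arrangement is simple.  If the wheel has $N$
teeth (and hence $N$ gaps) and the first eclipse occurs at $f$ rotations per
second, then during the light's round trip over the distance $d$ to the mirror
the wheel advances half of one tooth-and-gap period, giving a travel time of
$\frac{1}{2Nf}$ second and a speed of
\[
c \; = \; \frac{2d}{1/(2Nf)} \; = \; 4Nfd .
\]
Fizeau's wheel is usually reported as having had 720 teeth, with a first eclipse
near 12.6 turns per second; with $d = 8{,}633$ metres these figures give
$c \approx 313{,}000$ km/s, close to the value he announced.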

Others were quick to build on this monumental achievement.
Only two years later L\'{e}on Foucault (1819-1868), a former collaborator of Fizeau,
produced more accurate measurements based on a rotating mirror rather than
a toothed wheel.
\section{Michelson's 1879 determinations of the speed of light}
In November of 1877 Albert Abraham Michelson (1852-1931),
then a twenty-four-year-old
ensign in the US Navy and an instructor in physics at the
U.S. Naval Academy in Annapolis, Maryland,
hit upon the means to improve Foucault's rotating mirror approach.
Even then, he needed to conduct many preliminary studies before being
confident of an improved value for the speed of light.
In his own words (\cite{aamich:1880} page 115) ``Between this time and March
of the following year a number of preliminary experiments were performed
in order to familiarize myself with the optical arrangements.
Thus far the only apparatus used was such as could be adapted from the
apparatus in the laboratory of the Naval Academy.''

In April he initiated contact with Professor Simon Newcomb (1835-1909)
of the US Navy
(\cite{swenson:1972} page 38)
who was then superintendent of the navy's {\em Nautical Almanac}
and renowned in the navy and the scientific community as an astronomer.
Michelson discussed his work and methods with Newcomb.
At this point however, Michelson was still an unknown who would not
be funded by the US Navy for such specialized research.
Fortunately, having married Margaret McLean Heminway in the spring of 1877,
he could turn to a wealthy father-in-law for financial support.
His father-in-law\footnote{Referred to in \cite{aamich:1880} only as
a ``private gentleman''.}
had become deeply interested in
Michelson's preliminary results
and in July of 1878 provided him the \$2000 necessary to purchase the fine
optical instruments to carry out his measurements.
So began a lifelong quest for the speed of light.

\subsection{Optical theory.}
One of the difficulties with having great distances between the
source and the mirror is that the intensity of the light will decrease
with distance.
So as to keep the image as bright as possible, a lens is placed
between the source of the light and the mirror.
If, as in the diagram below,
\begin{figure}[htp]
\centerline{\psfig{figure=point-source.ps,height=.75in,width=5in}}
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/point-source.ps,height=.75in,width=5in}}
\caption{S and M are placed at the point-source focus of each other.}
\label{fig:point-source}
\end{figure}
the source, S, and the mirror, M, are placed so that a point-source light from
one is focused precisely on the other,
then the return image will be as bright and as crisp as possible.

Note that the distance between the lens L and the mirror M
is not equal to that between L and S.
As M moves farther from the lens, S will need to be moved closer in
order for both points to remain at the focus of the other's point source.
This is true provided both points
are beyond the focal length of the lens (that point where
beams of light parallel on one side of the lens
would meet on the other side).

By moving S and M farther apart, all the while keeping each at the other's
point focus, we increase the distance the light must travel and therefore
the time it will take.
Even so, the time taken is exceedingly short and difficult to measure.

Instead of Fizeau's wheel, Foucault
used a rotating mirror interposed between S and L
as in the next diagram.\footnote{According to Newcomb (\cite{Newcomb:1882} page 117) this had been
suggested much earlier by Charles Wheatstone (1802-1875)
and tried without success
by Dominique Fran\c{c}ois Jean Arago (1786-1853) in 1838.}
\begin{figure}[htbp]
\centerline{\psfig{figure=focus.ps,height=1.5in,width=5in}}
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/focus.ps,height=1.5in,width=5in}}
\caption{Interposing a mirror, R, between the source S and the lens L.}
\label{fig:focus}
\end{figure}
Light rays from the source that strike R and proceed through the lens L
will strike M and return to the source S.
If after the light beam first strikes R outbound from S, R can be rotated
\begin{figure}[htp]
\centerline{\psfig{figure=mirror.ps,height=2.0in,width=2in}}
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/mirror.ps,height=2.0in,width=2in}}
\caption{Rotating the mirror R causes the returning beam to be deflected.}
\label{fig:deflect}
\end{figure}
before it is struck again by the beam returning from M, then the
returning beam will no longer return exactly to the source S but
will instead be deflected away from S in the direction of the rotation.
The amount of this deflection is exactly twice the angle that R was
rotated.
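This doubling is a consequence of the law of reflection.  If the mirror, and
hence its normal, is rotated through an angle $\alpha$ while the incident beam
is held fixed, the angle of incidence increases by $\alpha$; since the
reflected beam lies at an equal angle on the opposite side of the normal, its
direction changes by $2\alpha$.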

The solution then is to rotate the mirror fast enough that it changes its
orientation before the light strikes it on return from M.
By rotating it at a constant speed, the amount of deflection will be the
same for all light beams that go through L, strike M and return.
Then for a continuous beam of light from S and a constant high speed of rotation
of R, an image of the source will appear beside S instead of coincident
\begin{figure}[htp]
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/displacement.ps,height=1.0in,width=2.6in}}
\centerline{\psfig{figure=displacement.ps,height=1.0in,width=2.6in}}
\caption{The return image I is displaced from the source S by the
rotating mirror R.}
\label{fig:displacement}
\end{figure}
upon it (as shown in Figure
\ref{fig:displacement}).
The faster R rotates the farther the returned image, I, will be displaced from
the source, S.

By carefully measuring the amount of displacement from S to I (see Figure
\ref{fig:displacement}),
and the distance from
S to R, the tangent of the angle of deflection can be determined as
$|$IS$|$/$|$SR$|$.
Together with the fixed speed of rotation, this angle can be used to
measure the time it took light to travel the distance from R to M and back.
Dividing distance by time gives a measurement of the speed of light.

In this arrangement the distances $|$IS$|$ and $|$SR$|$ should be as large
as possible. The distance $|$IS$|$ is maximized by
maximizing the speed of rotation of R and by maximizing the distance $|$RM$|$.
Michelson's principal innovation in Foucault's design allowed
$|$RM$|$ to be very large.
In Foucault's setup, M was spherical with centre at R.
The greatest distance $|$RM$|$ achieved by Foucault
was 20 metres
(page 117 \cite{aamich:1880})
which produced a displacement $|$IS$|$
of only 0.7mm
(page 118 \cite{Newcomb:1882}).
Michelson chose to place the rotating
mirror at the focal point of the lens
which allowed him to
use a flat mirror for M.
That is, R should be placed
at that point where {\em parallel} light beams passing through
the lens from M meet on the other side as in Figure \ref{fig:parallel}.
\begin{figure}[htp]
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/parallel.ps,height=0.75in,width=5in}}
\centerline{\psfig{figure=parallel.ps,height=0.75in,width=5in}}
\caption{R at the focal point of L.}
\label{fig:parallel}
\end{figure}
Then if the diameter of M was as large as that of L
any single beam passing from R through L would {\em necessarily} strike
M {\em and return} through L to R {\em whatever the distance between L and M}.
This permitted M to be placed very far away.
The only difficulty is that the farther away M is from L, the closer the
point-source focus S will
be to the focal point R which conflicts
with maximizing the distance between S and R.
This can be remedied somewhat by using a lens of large focal length.

Suppose the speed of rotation is such that a single distinct image is produced on
return (that is, the rotating mirror intercepts the returning beam at the same
orientation on every revolution). If the rotational speed fluctuates, successive
returns are deflected by different amounts and the image is smeared; only when the
mirror turns steadily, and fast enough, does a single sharp displaced image appear.
Let $\theta$ denote the angle of deflection; the angle through which the mirror
has rotated during the light's round trip from R to M and back is then $\theta / 2$.
The angle $\theta$ in degrees is $\arctan(|IS|/|SR|)$.
If the speed of rotation is $f$, measured in revolutions per second, then the mirror
turns through $360f$ degrees per second, so the time taken for the light beam to
travel from R to M and back is $\frac{1}{f} \times \frac{\theta /2}{360}$ of a second.
During this time the light covers the distance $2|RM|$, so the speed of light
transmitted under the conditions of the study is
\[
\frac{2 |RM|}{\frac{1}{f} \times \frac{\theta /2}{360}} ~=~ \frac{1440 \, f \, |RM|}{\theta}.
\]
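The deflection geometry can be checked numerically. The following sketch is illustrative only: the input values are round numbers of the same order as those reported elsewhere in the paper (a displacement near 115 mm, a radius near 28.6 feet, roughly 257 revolutions per second, and a mirror separation near 1985 feet), not Michelson's recorded data.

```python
import math

FT_TO_M = 0.3048   # feet to metres
MM_TO_M = 0.001    # millimetres to metres

def speed_of_light(displacement_mm, radius_ft, rev_per_sec, mirror_dist_ft):
    """Speed of light (m/s) from the rotating-mirror geometry.

    The deflection angle theta satisfies tan(theta) = |IS|/|SR|, and the
    mirror turns through theta/2 while the light makes the round trip
    from R to M and back, a distance of 2|RM|.
    """
    theta_deg = math.degrees(math.atan(displacement_mm * MM_TO_M
                                       / (radius_ft * FT_TO_M)))
    round_trip_time = (theta_deg / 2.0) / (360.0 * rev_per_sec)  # seconds
    return 2.0 * mirror_dist_ft * FT_TO_M / round_trip_time

# Illustrative (assumed) inputs:
c_est = speed_of_light(displacement_mm=115.0, radius_ft=28.6,
                       rev_per_sec=257.0, mirror_dist_ft=1985.0)
# c_est comes out near 3e8 m/s, the right order for the speed of light.
```
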

\subsection{Physical apparatus}

These details are taken from Michelson's description of his study
\cite{aamich:1880}.

``The study would take place on a clear, almost level, stretch along the north
sea-wall of the Naval Academy.  A frame building was erected at the western
end of the line, a plan of which is represented
in''\footnote{\cite{aamich:1880} page 118 } Figure
\ref{fig:room}.
\begin{figure}[htp]
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/light-path.ps,height=2.0in}}
\centerline{\psfig{figure=light-path.ps,height=2.0in}}
\caption{Room showing experimental setup.}
\label{fig:room}
\end{figure}
``The building was 45 feet long and 14 feet wide, and raised so that the line
along which the light travelled was about 11 feet above the ground.
A heliostat at H reflected the sun's rays through the slit at S to the revolving
mirror R, thence through a hole in the shutter, through the lens, and to
the distant mirror.''\footnote{{\em Ibid.}}
The heliostat is an instrument used to focus the sun's rays and direct them
in a narrow beam.
Because it is easier to adjust than the heliostat,
a small mirror, F, directs the beam from the heliostat to the slit.

``The lens was mounted in a wooden frame, which was placed on a support moving
on a slide, about 16 feet long, placed about 80 feet from the building.
... The fixed mirror was ... about 7 inches in diameter, mounted in a brass
frame capable of adjustment in a vertical and horizontal plane by screw motion.
.... To facilitate adjustment, a small telescope furnished with cross-hairs was
attached to the mirror by a universal joint.
The heavy frame was mounted on a brick pier, and the whole surrounded by a
wooden case to protect it from the sun.''\footnote{{\em Ibid} page 122.}

Unlike Foucault, Michelson used a flat mirror as the fixed mirror and
a lens of long focal length to focus the light
(an eight inch non-achromatic lens with a 150 foot focus).
The lens was placed in position about 80 feet from the building
and the fixed mirror a distance of about 1920 feet from the building.
Each needed to be placed perpendicular to a common central axis
as in Figure \ref{fig:focus}.

Michelson gives no account in \cite{aamich:1880}
of how the lens came to be positioned, but he does
describe the positioning of the mirror in some detail.
First it was placed in position with the reflective surface
facing the hole in the building.
``A theodolite\footnote{A land surveying instrument used to measure
angles.} was placed at about 100 feet in front of the mirror,
and the latter was moved about by the screws till the observer at the theodolite
saw the image of his telescope reflected in the center of the mirror.
Then the telescope attached to the mirror was pointed (without moving
the mirror itself) at a mark on a piece of card-board attached to the
theodolite.''\footnote{{\em Ibid}, page 122.}
In this way the telescope atop the mirror was placed at right angles
to its reflective surface.
``The theodolite was then moved to 1,000 feet, and, if found necessary,
the adjustment\footnote{To the telescope.} repeated.''\footnote{{\em Ibid.}}
With the mirror thus placed, a final adjustment was made by having someone
focus a telescope at the fixed mirror from inside the building.
The mirror was then moved until the observer saw the image of his
telescope reflected centrally in the mirror.
This last adjustment had to be repeated before every series of observations
as the mirror would change its position between morning and evening.

The rotating mirror was a 1.25 inch circular disc (0.2 in. thick)
silvered on one side.
It was held on a vertical spindle that was in turn held in a cast iron frame.
This frame could be tilted side to side and forwards
and backwards by means of small cords.
The spindle had pointed ends which pivoted in
conical sockets in the frame; these were the only contact points between the
frame and the spindle.
The top part of the spindle passed through the centre of a small wheel
inside a circular enclosure attached to the frame.
This wheel held the spindle by friction.
Forcing air into the enclosure, over the surface of the wheel, and out
again in a circular fashion would cause the wheel, and hence the spindle,
to turn.
The spindle would have to be carefully balanced so that it turned smoothly
without wobbling.
The air to power this small turbine came
from a steam-powered pump located in the basement
of the building.
A tube connected the pump to the turbine.
Because
the mirror's rotational speed remains constant only while the pressure from
the pump is constant,
a valve was installed to adjust the pressure and hence the speed.

So as to further increase the distance $|$SR$|$,
the rotating mirror was placed slightly closer to the lens 
than at the focal point of the lens ({\em i.e.} its parallel beam focus).
This would make for a slightly less clear image than having R at the
focus as fewer rays strike and are returned from M.
``A limit is soon reached, however, for the quantity of light received
diminishes rapidly as the revolving mirror approaches the lens.''\footnote{{\em Ibid} page 118.}
This limit is about 15 feet closer to L than is its focal point.
Michelson's previous studies showed that
if R rotates at about 258 revolutions per second, and
the distance $|$SR$|$, or {\em radius},\footnote{Names of variates, like ``radius,''
whose values Michelson recorded 
are italicized here when first mentioned.}
 is about 28.6 feet, then the
deflection should be around 115 mm.

\subsection{Direct measurements}

Michelson made use of several pieces of measurement equipment.

Distances $|$SR$|$ and $|$RM$|$ were measured using a steel tape, nominally 100
feet long.

The {\em displacement} $|$IS$|$ was measured by means of a calibrated
micrometer as shown in Figure \ref{fig:micrometer}.
\begin{figure}[htbp]
%\centerline{\psfig{figure=/usr/people/rwoldford/admin/courses/st231/notes/cases/light/micrometer.ps,height=2.0in,width=2.0in}}
\centerline{\psfig{figure=micrometer.ps,height=2.0in,width=2.0in}}
\caption{Micrometer measures the displacement $|$IS$|$.}
\label{fig:micrometer}
\end{figure}
The source of the light was a narrow vertical {\em slit} that was
fixed in place on the micrometer.
The micrometer had a small telescope that could be moved left to right
using a dial at the right.
Each turn, or {\em screw}, of the dial
would move the telescope some small amount -- in Figure
\ref{fig:micrometer} the horizontal scale shown marks the amount turned.
At the focus of the telescope
lens (about 2 inches), and in nearly the same plane as
the slit, S, was a single vertical silk fibre that served as a vertical
cross-hair for alignment purposes.

By turning the dial the telescope could be positioned so that this fibre
was centred on the returning {\em image} I of the slit.
The amount the telescope had to be moved from its initial position at the slit,
to the position of the image would be the {\em displacement} $|$IS$|$.

The speed of rotation $n$, {\em number of revolutions per second}, of the
revolving mirror was set using an electric tuning fork. The valve from the pump
was opened to rotate the mirror R and make its speed in revolutions per second 
match the frequency of the electric tuning fork in vibrations per second.
The speed and frequency were matched by having a small mirror attached to one arm
of the tuning fork placed so that
some light reflected from the revolving mirror was in turn
reflected by the tuning fork's mirror to produce an image
of the disk of the revolving mirror on a piece of plane glass located
near the lens of the eyepiece of the micrometer.
If the tuning fork and the revolving mirror were not matched,
the image of the revolving mirror would be indistinct on the surface of the glass.
If the tuning fork frequency and the speed of the revolving mirror were the same, 
then the final image appearing on the glass would be distinct.
If the frequency of the fork was half that of the revolving mirror, then
two distinct images were produced, and so on. 

The frequency of the electric tuning fork was measured by counting the
{\em beats per second} between it and a standard tuning fork with known
frequency. A 60 second count period was used. The {\em temperature} was recorded
so as to later determine a correction for temperature of the frequency of the
standard fork. (The results were corrected to a 65 degrees Fahrenheit standard.)

The final result for the speed of the revolving mirror in revolutions
per second is the sum
of the recorded electric tuning fork value, the adjustment for the beat,
and the {\em correction} for temperature.
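The frequency determination can be sketched as follows. This is a minimal illustration, not Michelson's procedure in detail: the nominal standard frequency, the beat count, and the temperature correction used below are invented for the example, and whether the electric fork is faster or slower than the standard must be established separately in practice.

```python
def fork_frequency(standard_hz, temp_correction_hz, beat_count, count_seconds,
                   electric_slower=True):
    """Infer the electric fork's frequency from beats against a standard
    fork.  beat_count beats in count_seconds means the two frequencies
    differ by beat_count / count_seconds Hz; whether the electric fork is
    the slower of the two must be known from a separate check."""
    standard = standard_hz + temp_correction_hz  # corrected standard frequency
    delta = beat_count / count_seconds           # magnitude of the difference
    return standard - delta if electric_slower else standard + delta

# Illustrative (assumed) values: 78 beats counted over a 60 second period
f_electric = fork_frequency(256.0, -0.012, 78, 60.0)
```
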


\subsection{Producing one determination}

The distance $|$RM$|$ from the rotating mirror to the fixed mirror was measured
five times, each time allowing for temperature,  and the average used as the
``true distance'' between the mirrors for all determinations. 

On each occasion that the apparatus was to be used, the fire for the pump was
started about a half
hour before measurement began --- after this time there was sufficient
pressure to begin the determinations. 

 The mirror was set revolving until two
distinct images of it were reflected from the electric tuning fork on the glass
surface  near the micrometer eyepiece.

The fixed mirror M was adjusted and the heliostat placed and adjusted. 

The revolving mirror was then adjusted in two different axes.
First it was inclined to the right or left so that the direct reflection of the
light from the slit fell above or below the eyepiece of the micrometer.
Michelson
found that he had to tilt the revolving mirror as ``Otherwise this light would
overpower that which forms the image to be observed.''\footnote{{\em Ibid}.} The
revolving mirror was then ``adjusted by being moved about, and inclined forward and
backward, till the light was seen reflected back from the distant
mirror.''\footnote{{\em Ibid}, page 124.} Some adjustment in the calculation
would need to be made for the tilting of the mirror.

The distance $|$SR$|$ from the revolving mirror to the cross-hair of the eyepiece
was measured using the steel tape.
 
The vertical cross-hair of the eyepiece of the micrometer
was centred on the slit and its position recorded in terms of the position of
the screw.

The electric tuning fork was started.  The  frequency of the fork was measured
two or three times for each set of observations, by recording 
the number of  beats per second difference from the standard fork.

The temperature was recorded.

The revolving mirror was started and if the image did not appear in the eyepiece,
the mirror was inclined forward or back until it came into sight. The speed of
rotation of the mirror was adjusted until the image of the revolving mirror came
to rest.

The micrometer eyepiece was moved by turning the screw until its
vertical cross-hair was centred on the return image of the slit.
The number of turns of the screw was recorded.
The displacement is the difference in the two positions.
To express this as the distance $|$IS$|$ in millimetres
the measured number of turns was multiplied by the calibrated number of mm.
per turn of the screw.
The movement of the eyepiece from slit to location of the returned image
was repeated until ten determinations of the distance were made.
The average of the ten measurements and their range were recorded.
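The conversion from screw turns to millimetres and the per-set summary can be sketched as follows; the calibration factor and the ten readings below are invented for illustration, not taken from Michelson's records.

```python
def displacement_mm(slit_turns, image_turns, mm_per_turn):
    """Displacement |IS| in mm from the two micrometer screw positions."""
    return abs(image_turns - slit_turns) * mm_per_turn

def summarize_set(determinations):
    """Average and range of a set of repeated determinations, the two
    quantities recorded for each set of ten."""
    average = sum(determinations) / len(determinations)
    spread = max(determinations) - min(determinations)
    return average, spread

# Ten illustrative (assumed) displacement determinations in mm:
readings = [114.6, 114.9, 115.1, 114.8, 115.0,
            114.7, 115.2, 114.9, 115.0, 114.8]
set_average, set_range = summarize_set(readings)
```
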

The rotating mirror was stopped, the temperature noted and the beats counted.

\section{Statistical Method and Michelson's 1879 Study}

Using the above apparatus and measuring equipment, Michelson began the first
of his many studies to determine the speed of light.
The study was conducted in 1879 and the results were published in 1880.
Here we examine the detail of the study, which is interesting in itself, 
to illustrate what we mean by {\em statistical method}. 

Statistical method, unlike Scientific Method as we will see in the next section,
can be usefully represented as a series of stages - {\em Problem, Plan, Data,
Analysis, Conclusion}. One stage leads to the next and is dependent on previous
stages. Looking back, this means that each stage is carried out and legitimized
(or not) in the context of the stages which precede it (e.g. there is little
value in a Plan that does not address the Problem). In such a case, one of the
two stages must be modified. Looking ahead at any stage, decisions can be made
that will simplify actions taken in a later stage (e.g. a well specified Problem
can be addressed by a simple Plan).

The structure for Statistical Method is useful in two ways: first to provide a
template for actively solving problems empirically and second to review
critically completed studies. All empirical studies implicitly or explicitly
involve all five stages. Below, we use the structure in the second manner to
examine Michelson's 1879 study.

Each stage of Statistical Method comes with its own issues to be understood
and addressed. In the context of  Michelson's study, we introduce language
appropriate to describe these issues.

\subsection{The Problem}
The purpose of this stage is to provide a clear statement of what is to be
learned from the study. 
To do so, it is important to translate the contextual problem under study into
a language that can guide the design and implementation of the subsequent stages
of Statistical Method. Understanding what is to be learned from the study is so
important that it is surprising that it is rarely treated in any introduction to 
Statistical Method. The cost of this
omission is incalculable.

To execute the Problem stage, issues must be addressed using the following
terminology. 
\begin{enumerate}
\item
{\em Units and Target Population}-these specify the collective to which we are
interested in applying our learning.
\item
{\em Variates} - numerical or categorical values attached to every unit in the
target population (values may differ from unit to unit). 
\item
{\em Population Attributes} - functions that apply to the entire target
population calculated through the variate values on individual units.
\item
{\em Problem Aspect}  - either {\em causative} or {\em descriptive}, according
to whether interest lies in investigating a causative relationship
between two or more variates in the target population or in describing
attributes of that population.
\end{enumerate}
In 1879, Michelson was keen to determine the speed of white light as it travels
between any two relatively stationary points in a vacuum.  A unit is one
transmission of such light between a source and destination, both located in a
vacuum. The target population is all such transmissions, before, during and
after 1879. The primary variate of interest, which we call the {\em response
variate}, is the speed of light associated with each transmission.  There are
many other variates attached to  each unit  such as the distance between the two
points, the motion of the points with respect to each other, properties of the
source and so on.  In Michelson's problem, there is no interest in these other
variates. 

The attribute of interest is the speed averaged across all units in the target
population. This example is unusual in that it is believed that there is no
variation in the value of the response variate from unit to unit.

The problem here is descriptive. The aim is to describe a population attribute.
If Michelson had been attempting to show that the speed of light can be changed
by, for example, having the source move towards the destination, then the Problem
would have had a causative aspect. It is important to decide the aspect at the Problem stage
because of the special requirements of the Plan needed to establish causation.

Careful description of the units and target population,  variates, attributes
and problem aspect provides input to the Plan.

\subsection {Plan}
The purpose of this stage is to develop a plan for the collection
of the data. We propose to break the planning into several sub-stages,  some of
which inevitably overlap. In an active use of PPDAC (Problem, Plan, Data,
Analysis, Conclusion), some iteration may be
required within the stage before a satisfactory plan is developed.

\subsubsection{Study Units and Population}

The study population is the collective of study units for which the values of
the variates of interest could possibly be determined. The study units may or may
not be part of the target population; in Michelson's study they were not.
Because the distances required to measure the speed of light were so large, it
was not practical to have the light travel through even a partial vacuum.
All of the units in Michelson's study involved the transmission
of light through air at a particular location. The source and destination were a
fixed distance apart and both remained stationary over the course of the study.
Michelson also decided to look at transmission of light at one hour before
sunset  or one hour after sunrise during a few days in June 1879. Within these
constraints, he was free to choose the units on which he would determine the
speed of light. 

The study population and the study units are very different from the target
in this instance.  Michelson recognized that measuring the speed of light in air
would not allow a direct determination of the speed in a vacuum. He planned to
correct the measured values by a factor based on the refractive index of air.

\subsubsection{Selection of Variates To Be Measured}

The Plan must include a step in which we decide what variates we will measure
on each unit selected in the study. 

\noindent{\bf Choosing the Response Variates}

\noindent
{\em Response variates}, corresponding as much as possible to those used to
define attributes of interest in the target population, must be clearly defined. 

Michelson could not measure the speed of light on a unit directly with his
apparatus. Instead, for each determination,  he measured the following response
variates to calculate the speed of light.
\begin{enumerate}
\item
the displacement $d$ of the image of the slit. This was measured on each unit.
\item
the radius $r$, the distance between the cross-hairs of the slit and the front
face of the rotating mirror (this value was not always determined for units
measured in the same time period but was measured each morning or evening when
units were sampled.)
\item
the number of beats $B$ per second (the beats were counted for 60 seconds)
between the electric $Ut_2$ fork and the standard $Ut_3$. This variate was
determined once for each set of 10 determinations of $d$.
\item
the temperature $T$ measured once for each set of 10 determinations of $d$.
\end{enumerate}

The values of the response variates were combined with several constants
according to the formulae (3) and (4)  on page 133 to produce a value for the
speed of light in air at temperature $T$.

\noindent{\bf Dealing With Explanatory Variates}
\\
\noindent
There are many other variates associated with each unit in the study population.
We call these {\em explanatory} variates that can be used to explain differences
in the response variates from unit to unit in the study population.

It  is important to decide how explanatory variates will be dealt with during
the planning stage. There are three choices: first the study population can be
redefined by holding an explanatory variate fixed or by deliberately setting its
value.  Second, the explanatory variate can be measured for each unit included in
the study and its value possibly utilized in the analysis or third, the
explanatory variate can be ignored completely. The third course of action is
taken if it is known in advance that the explanatory variate is unimportant (i.e.
it does not explain variation in the response variates) or out of ignorance, not
recognizing the presence of the variate.

Reviewing Michelson's apparatus and proposed method, there are many explanatory
variates in the study population that may explain why the measured speed of
light  varies from unit to unit. We have constructed a fishbone diagram (Figure
8),  a useful tool for this task, that lists many of these explanatory variates.
Michelson recognized that it was important to consider these variates and in his
Plan dealt with them in all three ways. For example, he fixed the distance from
the rotating to the fixed mirror, thus further defining the study population. He
also deliberately varied the angle of inclination of the plane of rotation of the
revolving mirror from $\tan^{-1}0.02$ in the early determinations to
$\tan^{-1}0.015$ in the final twelve sets. He measured a large number of
explanatory variates such as the observer, the day, the quality of the image and
so on. He ignored barometric pressure because ``... error due to neglecting
barometric height is exceedingly small''.

\subsubsection{Measurement}
A key element of the Plan is to decide how to measure the selected response
and explanatory variates. To determine the value of any variate on a unit, we
call the measuring devices, methods and individuals involved the {\em measurement
process}. Once a measurement process is specified, it is important to understand
its properties. 

We call {\em measurement error} the difference between the value of the variate
determined by the measurement process and the ``true'' value. We define the
properties of the measurement process in terms of repeatedly measuring the same
unit. Two concepts are {\em bias}, an attribute of the measurement process
describing systematic measurement error, and {\em variability}, an attribute of
the measurement process describing the change in the error from one determination
to the next. Bias in a measurement process may lead to conclusions that are
incorrect. Standard statistical practice such as increasing the sample size
provides no remedy. Variability in a measurement process contributes to the
uncertainty in the conclusions. In many applications, an iteration  of PPDAC is
applied to investigate these attributes of the measurement process within the
overall study.
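The distinction between bias and variability can be illustrated with a small simulation. The true value, bias, and standard deviation below are arbitrary choices for the sketch; the point is only that averaging more repeated measurements shrinks the random component while the systematic component survives intact.

```python
import random

def simulate_measurements(true_value, bias, sd, n, gen):
    """n simulated measurements of one unit: a fixed systematic bias
    plus independent Gaussian measurement variability."""
    return [true_value + bias + gen.gauss(0.0, sd) for _ in range(n)]

gen = random.Random(42)
few = simulate_measurements(true_value=100.0, bias=2.0, sd=1.0, n=10, gen=gen)
many = simulate_measurements(true_value=100.0, bias=2.0, sd=1.0, n=10000, gen=gen)

mean_few = sum(few) / len(few)
mean_many = sum(many) / len(many)
# mean_many settles near 102.0, not 100.0: increasing the sample size
# removes variability but leaves the bias untouched.
```
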

Michelson paid careful attention to the measurement processes he had specified
for his study. Consider, for example, the measurement of the distance between the
two mirrors. To avoid bias, he calibrated a steel tape against a Wurdeman copy of
the standard yard. The calibration used a comparator with two microscopes, one
fixed and one that can be moved towards or away from the fixed microscope by
turning a screw. The distance between the microscopes was set to 1 standard yard.
Then the tape was placed in the comparator so that .1 ft corresponded to the
cross-hairs of the fixed microscope and the length of the first yard of the tape
was determined by rotating the screw until the cross-hairs of the movable
microscope corresponded to 3.1 ft on the tape. This procedure was repeated 33
times to determine the cumulative number of turns of the screw corresponding to
the length of the tape from .1 ft to 99.1 ft. The temperature was recorded so
that an adjustment (unexplained) could be made.

Next, he carried out a separate study to determine the distance corresponding
to 1 turn of the screw of the movable microscope. This was accomplished by
measuring 20 times the number of turns that correspond to 1 mm and then
averaging. It is clear that Michelson appreciated the power of averaging to
reduce variability in measurement. Combining the results of the two studies and
adjusting  for temperature, the corrected length of the 100 ft steel tape was
100.006 ft. 

To measure the distance between the two mirrors (approximately 2000 ft), the
plan was to place lead markers along the ground and use the tape to measure the
distance from one to the next following a carefully defined standard procedure.
The tape was to be placed along the (nearly) level ground and stretched using a
constant weight of 10 lbs. This led Michelson to investigate the stretch of the
tape.

 To adjust for stretch, another small study was conducted in which the tape was
stretched using a 15 lb force and the stretch in mm at 20 ft intervals was
measured.  The data are shown below.
\begin{center}
\begin{tabular}{c c}
Length&Amount of Stretch \\
100&8.0 \\
80&5.0 \\
60&5.0 \\
40&3.5\\
20&1.5 \\
\end{tabular}
\end{center}
The correction, in mm,  for stretch in the tape to measure the distance between
the mirrors is then
\[
\mbox{correction} ~=~ \frac{8.0+5.0+5.0+3.5+1.5}{300} \times 100 \times \frac{10}{15}
\]

Converted to feet and multiplied by 20, the overall correction for stretch was
+0.33 feet.
In the language we have introduced, for this small study, the study population
using a 15 lb force is different from the target population, which requires a 10 lb
stretching force. Note also the curious weighted average for estimating the
amount of stretch per foot of tape.
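The stretch correction arithmetic can be verified directly. The only assumption beyond the text is that the factor of 20 applied below corresponds to the roughly twenty 100-foot tape lengths in the distance between the mirrors.

```python
# Stretch (mm) observed under a 15 lb pull, read at 20 ft intervals
stretch_mm = [8.0, 5.0, 5.0, 3.5, 1.5]

# Correction per 100 ft tape length, rescaled from the 15 lb test pull
# to the 10 lb working pull, following the formula in the text:
per_tape_length_mm = sum(stretch_mm) / 300.0 * 100.0 * (10.0 / 15.0)

MM_PER_FT = 304.8
total_correction_ft = per_tape_length_mm * 20.0 / MM_PER_FT
# total_correction_ft comes out near the +0.33 ft reported in the text
# (the exact figure depends on the measured distance used).
```
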

The goal of introducing the corrections for stretch and length of the tape is
to reduce bias in the final measurement of the distance between the two mirrors.
To reduce the variability of the distance measurement, the procedure was repeated
5 times (with corrections for temperature on each). The temperature corrected
measurements varied from 1984.93 to 1985.17 ft. Michelson used the average of the
5 determinations and then corrected for stretch and error in the tape to get his
final measure of distance between the two mirrors. 

The case study is an excellent example of a careful scientist eliminating bias
from his measurement processes by calibration and correction for known systematic
errors and reducing variability by averaging. At the conclusion of his paper,
Michelson provides a careful discussion of the effects of possible bias on his
estimate of the speed of light. It is frightening to realize how often modern
data are produced and analyzed with little consideration for the properties of
the measurement system. 

\subsubsection{The Sampling Protocol}
The {\em sampling protocol} is the procedure used to select units from the
study population to be measured. The goal of the sampling protocol is to select
units that are representative of the study population with respect to the
attribute(s) of interest. The sampling protocol deals with how and when the units
are selected, who makes the selection and how many units are selected.

Michelson decided to sample a number of units one hour after sunrise and one
hour before sunset for a number of days between June 13 and July 2. The units
were selected in groups of 10 with from one to six groups taken per time period.
Units were selected by Michelson and, on two occasions, by his assistants
Lieutenant Nazro and Mr. Clason.  In all, 1000 units were sampled. Over the
course of the sampling, other explanatory variates were manipulated (speed of
rotation of the mirror, the angle of inclination of the rotating mirror etc.)
Michelson recognized the importance of selecting units with different values for
these explanatory variates so that he could verify that they did not affect the
measured velocity of light. Consider, for example, his discussion of observer
bias in the final section of the paper. To deal with this issue, additional sets
of measurements were taken by another observer who was blind to Michelson's
results. There was no systematic difference in the two sets of values. 

We call {\em sampling error} the difference between the attribute of interest
and the corresponding attribute in the sample. There is bias and variability
associated with the sampling protocol. These are properties of the protocol and
not of any particular sample of units. As with the measurement process, bias and
variability are defined in terms of the properties of the sampling error when
repeatedly applying the sampling protocol. These replications are almost always
hypothetical which means that we can describe sampling bias and variability only
through a representation of the sampling protocol by a mathematical model. We
postpone discussion of  this model to the Analysis section although in the active
use of PPDAC, mathematical  models for the potential sampling protocol (and
measurement processes) are used to help with issues such as sample size.

\subsubsection{The Data Collection Protocol}

The {\em data collection protocol} is the procedure for collecting and recording
the data. The goal is to avoid mistakes.  Michelson gives us no indication of how
he planned to record his data. However, the meticulous care he showed elsewhere
in the planning of his study suggests that he would have been especially careful
to ensure that the data were recorded as measured.

In today's context, this step will include consideration of data entry,
file structures, analysis software, and so on, especially for studies in which a
large amount of data is accumulated. 

\subsection{Data}

The primary purpose of the Data stage is to execute the Plan, noting any
deviations or exceptional occurrences as you proceed. Once the data are collected
and stored, we propose to search for anomalies and to cleanse the data set when
appropriate. This is likely to be more profitable in an
active use of PPDAC as questions about the validity of any particular value can
be answered directly by the individuals making and recording the measurements.

As far as we can tell, Michelson  used all of the measurements on the 1000
units that he collected. Unfortunately, he did not report all 1000 data points
but instead gave the average value of the displacement $d$  for the 10
determinations in each set. All recorded explanatory variates were treated as
constant over the set. The values for the measured speed of light in air for each
set and the associated response and explanatory variates are given in Table 2. 

[Tables 2, 3 and 4 near here]

Michelson did not question the validity of his data in print.
[Include paragraph from top of page 21 here] 


\subsection{Analysis}

The purpose of the analysis stage is to use the collected data and information
from the Plan to deal with the questions formulated in the problem step. The form
and formality of the Analysis depends on the complexity of Problem and Plan, the
skill of the analyst and the audience for the documentation of the study, the
amount of variability induced by the Plan, and other factors. We propose the
following general breakdown of the stage:
\begin{enumerate}
\item
graphical and numerical summaries
\item
modeling of the Plan and data
\item
model fitting and assessment
\item
formal statistical procedures
\end{enumerate}

All sub-stages are directed to addressing the Problem. Michelson limited his
analysis to the calculation of the average of the 100 measured velocities in air,
a numerical summary, and an estimate of possible error, a formal procedure. The
error is based on a worst-case scenario, combining probable errors, based on the
estimated standard deviations of replicate determinations, with maximal systematic
error, based on Michelson's knowledge of his apparatus and the functions used to
calculate the speed of light from the measured response variates. For more
discussion on the use of probable error, see Stigler.

After adjusting for temperature (in air) and correcting to a vacuum, Michelson
concludes his analysis by reporting the speed of light in vacuo (kilometres per
second) to be
\[
299944 \pm 51
\]

Although Michelson did not formally propose a model, he carried out numerous
checks that are equivalent to aspects of model assessment (see page 139). For
example, to see whether the measured speed of light was systematically influenced
by the distinctness of the image, an explanatory variate, he calculated and
compared the average velocities stratified by distinctness of image.

Today, we can contemplate any number of ways to analyze the data. For example,
we might construct a histogram and calculate a 5-number summary of the 100
reported values (see figure xx). A simple, tentative model to describe the
hypothetical replication of the Plan is to let
\[
Y_i = \mu + R_i, \qquad i = 1, \ldots, 100
\]
where the residuals $R_i$ are assumed to be independent with a gaussian
distribution having mean $0$ and unspecified standard deviation $\sigma$. In the
model, $\mu$ represents the attribute of interest, the speed of light in air,
and $R_i$ the variability in the $i$th determination if the Plan were repeated.
This variability is due to the measurement processes and the sampling protocol,
which would deliver different units in repeated application. The standard
deviation $\sigma$ quantifies the size of the variability. Note that the model
explicitly excludes the possibility of bias.
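Before any model is fit, the graphical and numerical summaries of the first
sub-stage can be computed. The following is an illustrative sketch only, not
part of Michelson's analysis: a five-number summary using Tukey's hinges, one
of several quartile conventions in use (others give slightly different values).

```python
def five_number_summary(y):
    """Return (min, lower hinge, median, upper hinge, max).

    Hinges follow Tukey's convention: when the sample size is odd,
    the middle value is shared by both halves.  Other quartile
    conventions give slightly different answers.
    """
    ys = sorted(y)
    n = len(ys)

    def median(vals):
        m = len(vals)
        mid = m // 2
        return vals[mid] if m % 2 else (vals[mid - 1] + vals[mid]) / 2

    lower = ys[: (n + 1) // 2]   # lower half, middle value included if n odd
    upper = ys[n // 2:]          # upper half, middle value included if n odd
    return ys[0], median(lower), median(ys), median(upper), ys[-1]
```

Applied to the 100 reported determinations, such a summary gives a quick check
on centre, spread, and possible outlying values before any formal modelling.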

This model is fit using least squares, which gives $\widehat{\mu} = 299852.3$
and $\widehat{\sigma} = 79.06$. The gaussian assumption can be assessed, among
many other ways, using a quantile plot.
There is no evidence in the quantile plot against the gaussian assumption.
Further checks can be carried out by plotting the estimated residuals against
various explanatory variates (a modern equivalent to Michelson's approach);
as we now show, these checks are less reassuring.
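The fit and the quantile-plot check just described can be sketched in a few
lines of code. This is a sketch under stated assumptions, not a reconstruction
of any historical computation: the function names are ours, and the argument
`y` stands in for the 100 reported determinations.

```python
import statistics

def fit_location_model(y):
    """Least-squares fit of the model Y_i = mu + R_i.

    For this model the least-squares estimate of mu is the sample
    mean, and sigma is estimated by the sample standard deviation
    (n - 1 denominator).
    """
    mu_hat = statistics.fmean(y)
    sigma_hat = statistics.stdev(y)
    return mu_hat, sigma_hat

def quantile_plot_points(y):
    """(theoretical, observed) pairs for a gaussian quantile plot.

    A roughly straight plot is consistent with the gaussian
    assumption.  The plotting positions (i - 0.5)/n are one common
    convention among several.
    """
    n = len(y)
    observed = sorted(y)
    theoretical = [statistics.NormalDist().inv_cdf((i - 0.5) / n)
                   for i in range(1, n + 1)]
    return list(zip(theoretical, observed))
```

Plotting the pairs returned by `quantile_plot_points`, or the estimated
residuals against each explanatory variate in turn, reproduces the style of
check discussed above.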

Perhaps the speed depends on some of the explanatory variates that are
not part of its calculation.
For example, has the effect of temperature been successfully removed from
the determinations?
A plot of speed versus temperature is shown in Figure
\ref{fig:speed-temp}.
\begin{figure}[htp]
\centerline{\psfig{figure=speed-temp.ps,height=2.0in}}
\caption{Adjusted speed of light (jittered) versus temperature.}
\label{fig:speed-temp}
\end{figure}
Because so many values were recorded as identical, the plotted values had
uniform random noise in the range $-4$ to $4$ added to them; this has the
desired visual effect of spreading the points out in the plot.
A fairly weak increasing trend is discernible in the plot.
However, even this trend depends heavily on the three points in the lower
left corner and so is not likely to alter the result significantly.
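The jittering just described is simple to implement. A minimal sketch follows;
the `seed` argument is our addition, included only so that a jittered plot can
be reproduced exactly.

```python
import random

def jitter(values, half_width=4.0, seed=None):
    """Add uniform noise in [-half_width, half_width] to each value.

    Purely a visual device for plotting: it spreads out points that
    were recorded as identical so that all of them can be seen.
    The underlying data are left unchanged.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-half_width, half_width) for v in values]
```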

In a similar fashion, we should consider a plot of the speed versus
any of the other explanatory variates measured to look for unsuspected
relationships.
Two such relationships that were not considered by Michelson, but would be
routinely considered today are the relationship between the determined
speed and the date of measurement, and that between
the speed and the time of measurement
(morning or evening).
If our model holds, there should be no relationship between the determinations
of light's speed and these explanatory variates.

Figure \ref{fig:speed-day}
\begin{figure}[hbp]
\centerline{\psfig{figure=speed-day.ps,height=2.0in}}
\caption{Adjusted speed of light (jittered) versus day.}
\label{fig:speed-day}
\end{figure}
shows the first, again with the values slightly jittered to allow all
points to be seen as in the previous figure.
There is an apparent decreasing relationship that only strengthens
if outlying values are ignored.
The noticeable exceptions to this relationship appear to be the values
obtained on the last three days.
However, checking the data as presented in the table, we see that on
the third last day Michelson inverted the rotating mirror R.
After two days in this position, he inverted it again to restore the original
position.
Arguably, these changes affected the process; prior to that time the
study process seemed to be drifting downwards.
This also holds for morning and evening measurements separately.
Clearly, the date is acting as a surrogate for some other lurking variable.
What that could be is not known.

Curiously, in his comparisons of group averages, Michelson
did not compare morning and evening measurements
nor attempt to relate the measurement to the date.
There are other curious and interesting relationships to be found in these
data; we leave further exploration and modelling to the reader.


A confidence interval is an example of a formal statistical procedure, based
on the assumed model, that can be used to summarize the uncertainty in the
estimation of $\mu$. Here a $95\%$ confidence interval is
\[
299852.3 \pm 15.7
\]
Correcting for temperature, following Michelson, and converting to a vacuum,
a $95\%$ confidence interval for the speed of light (km/s) in vacuo is
\[
299944.3 \pm 15.7
\]
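The interval for the speed in air can be reproduced directly from the
least-squares estimates reported earlier. In this sketch the $t$ quantile
$t_{0.975,\,99} \approx 1.9842$ is entered by hand (it would ordinarily come
from a table or a statistical library) so that the code is dependency-free.

```python
def confidence_interval(mu_hat, sigma_hat, n, t_quantile):
    """Two-sided interval mu_hat +/- t * sigma_hat / sqrt(n)."""
    half_width = t_quantile * sigma_hat / n ** 0.5
    return mu_hat - half_width, mu_hat + half_width

# Estimates from the fit above; t_{0.975, 99} is entered by hand
# (about 1.9842) to keep the sketch free of external libraries.
low, high = confidence_interval(299852.3, 79.06, 100, 1.9842)
# The half-width is about 15.7, matching the interval in the text.
```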

Note that the confidence interval is much shorter than that reported by Michelson,
who included both variability and possible bias in his calculation. Other more
complex analyses and model assessments can be made. The above suffices to
demonstrate the sub-stages within the Analysis stage of PPDAC. Again, it is
evidence of Michelson's precision as a scientist that his analysis so carefully
parallels what can be done today.

\subsection{Conclusion}

The purpose of the Conclusion stage is to report the results of the study in the
language of the Problem. As well, it provides an opportunity to discuss the
strengths and weaknesses of the Plan, especially in regards to bias and
variability. 

Michelson concludes his study by reporting the speed of light (km/s) in vacuo
as
\[
299944 \pm 51
\]
He then discusses possible ``Objections'', including, among others not mentioned
above, uncertainty in the laws of reflection and refraction in media in rapid
rotation, retardation caused by reflection, imperfections in the lens, periodic
variation in friction at the pivots of the rotating mirror, and change in the
speed of rotation. In each case, he refers back to the Plan and the model
assessment to demonstrate that the objection would have little effect on the
estimate of the speed of light.

In our language, we would start with the reported speed of light based on the
confidence interval. To the discussion given by Michelson, we would add
the difference between the target and study populations as a source of possible
bias.

\section{On method in science.}
When examining the writings of those who have thought long and hard about the nature of science
one finds the same difficulties appearing again and again.\footnote{John
Losee's book \cite{Losee:intro} provides a
reasonable starting point.}
There is, for the most part, a great enthusiasm that science is progressing in some sense,
that we are learning ever more about the world around us, that we are continually solidifying that
knowledge, that our increasingly sophisticated technology is testament to the power of science.
Yet, when pressed, not only can we not agree on the method of science,
we can't quite agree on what science
is, or even whether what it talks about is real!
Looking over the history described in this paper we can get some inkling as to why this state
of affairs persists.

The progress seems real enough, from the question of light's speed being meaningless, to
discussion of whether it is finite or not, to increasing evidence for finite speed, to
ever `better' estimates of its value.
It might seem that scientific knowledge is the conjunction of the facts accumulated so far,
that theories live or die according to their verification or falsification by these facts,
and that, eventually, the truth will be inferred from the collection of facts.

Kuhn's work \cite{Kuhn:rev} describes a framework for this progress --
within a scientific `paradigm' normal science is pursued as a puzzle-solving activity,
this eventually produces anomalies, anomalies accumulate until a crisis is reached, a new paradigm
is somehow introduced, normal science proceeds again, and so on.
For example, normal science was pursued within a paradigm where light was without speed,
astronomical anomalies began to appear, leading ultimately to a theory where light had
a finite speed, whereupon normal science set about solving problems to establish its value.
In a more elaborate history, many such Kuhnian cycles would have been detectable. 

But what about method?
Long ago Aristotle wrote that knowledge, being ``a state of capacity to demonstrate'',
required the teaching of the principles of demonstration and so
the teaching of science necessarily ``$\ldots$
proceeds sometimes through induction and sometimes by deduction'' (\cite{Aristotle:Nicomachean}
1139$^b$19 - 36).
But each is tricky to apply -- Francis Bacon, that strongest of proponents
of inductive method, allowed his perception of the incredible speed at which
stars move in their orbit about the Earth to form his inductive base and so concluded that
an infinite speed of light was reasonable;
no lesser talents
than Aristotle and Descartes by pure deduction demonstrated that light could
not possibly have finite speed.
Using induction and deduction in combination as in the
hypothetico-deductive approach is no easier.
It appears explicitly only twice in the above history
-- once by Aristotle to dismiss the argument of Empedocles, and once
by Descartes to dismiss that of Beeckman -- and it was wrong in both cases!
At various times each of these has been suggested as {\em the} method of science.
 
A slightly different tack is to take one such method and raise it to the status of
a criterion to distinguish science from non-science.
Karl Popper did this in 1934 with the hypothetico-deductive approach.
Contemptuous of the widely held view that the use of inductive methods
distinguished science from non-science, Popper proposed instead that
``it must be possible for an empirical scientific system to be refuted by experience.''
\footnote{\cite{Popper:logic}, page 41. }
That is, to merit the name scientific a theory must be falsifiable;\footnote{In a
paper meant to be a general resource \cite{Good:science},
I.J. Good gives partial prior credit to R.A. Fisher since tests of significance
\cite{Fisher:methods} predate Popper.
This credit seems misplaced -- Popper uses falsifiability as a {\em demarcation criterion}
for science, Fisher does nothing of the sort.}
a decisive experiment which refutes the theory is a crucial falsifying experiment.
By this criterion, the geocentric theory of the universe is scientific, being falsifiable
by any orbital system not centred about the Earth; Galileo's discovery of the moons of
Jupiter refuted this theory.
Similarly the scientific theories of light held by Aristotle and Descartes were refuted by
R\"{o}mer's determination of the speed of light.
This criterion is turned into method by having scientists focus on trying to refute theory;
theories are corroborated only by surviving the most stringent of testing.

But normal science is conservative. 
Crucial experiments are typically only recognized as such long after the fact
-- Cassini et al
showed at the time that R\"{o}mer's observations could be accommodated by existing
theory.\footnote{See \cite{Lakatos:meth} pages 71 - 90 for further examples and discussion.}
If theories were thrown out when first refuted, the result would
be chaos.  Instead normal science motors along, sometimes fine tuning its theory
to accommodate the new information,
sometimes patching the theory with auxiliary hypotheses, and sometimes just
tossing the information into the back seat
where Popper's refutations become Kuhn's anomalies.
As the anomalies accumulate, the ride gets rougher and some members of the scientific community
become increasingly uneasy that a crisis is around the corner.

It is here that Kuhn's work is most interesting and most troublesome.
Kuhn likens the transition from one paradigm to the next to that of a gestalt
shift in visual perception.
Like a gestalt shift, a paradigm shift is sudden and without reason.
Unlike a gestalt shift, a paradigm shift does not allow the scientist to switch
between paradigms; no neutral third viewpoint exists from which both paradigms can be seen
-- if there were then this would be the new paradigm.
This is not to say that the new paradigm cannot be reasoned about and justified to some
satisfaction, but rather that it may not be possible to do so by comparing it to the old.
For once the transition is complete, the convert's view of the
field will have changed -- its methods, its concepts, its questions, even its data --
and the old paradigm can only be viewed from the perspective of the new.
In a word, the two paradigms are incommensurate.  Concepts, theory, methods, and data that
are meaningful according to one might not be according to the other.

Consider the concept of light.
According to Aristotle, light required an intervening transparent substance (like air or water);
it could not exist in a vacuum.
Things are transparent, of course, only because they contain a `certain substance' which is `also
found in the eternal upper body' (possibly aether? itself a concept Aristotle tells us he has
changed from that of Anaxagoras\footnote{\cite{Aristotle:heaven} 270$^b$20-25.}).
`Of this substance, light is the activity.' But it is not movement.
Moreover, the visibility in the dark
of bioluminescent plants and animals does {\em not} depend upon light! 
\footnote{See \cite{Aristotle:soul} 418$^a$26 to
419$^b$2 for most of the points made here.}
From this Aristotle says he has explained light.
Not only is Aristotle's concept different from ours, but to really understand what he
means by light we would need to become immersed in his paradigm.
Scientific concepts like light change in irreversible ways; some like aether disappear
altogether -- even after thousands of years of service.

Nor are concepts alone determined by the paradigm. 
So too are the `empirical facts' --
Francis Bacon's data included fantastic speeds for the movement of the stars about the Earth;
Glasenapp demonstrated that different theory produced different `observed' speeds of light.
Even relatively raw `sense data' can be dependent upon theory.
Soon after Galileo announced the discovery of Jupiter's moons, he had others verify his
observations using his telescopes.
Many could not see the satellites;
those who could see multiple lighted spots could not be certain that these were not
artefacts of the new instrument. 
Only once the optics of telescopes was developed could there be confidence in the verity of the
observations.\footnote{See chapter 9 of \cite{Feyerabend:method}.}
Modern instruments produce observations that are irrevocably `theory laden.'

Paradigm shifts, incommensurability, and theory laden data have all contributed
to what Ian Hacking \cite{Hacking:phil} calls ``a crisis in rationality''  -- at least for
philosophers of science.  Is there such a thing as scientific reasoning?
Are the entities with which science deals real or are they human constructs?
Does it make sense to think that there is in fact an ideal truth to which science might
converge?

\section{And what of statistics?}
When statisticians look at the nature of science, they
see reflected the nature of statistics.\footnote{A notable exception is Pearson's
{\em The Grammar of Science} \cite{pearson:grammar}.}
Deduction becomes probability theory, induction, statistical theory (e.g. 
pp 6-7 of \cite{Barnett:comparative});
scientific method is hypothetico-deductive
(e.g. \cite{Box:science}, \cite{Durbin:pres-rss}, \cite{Nelder:pres-rss}),
self-evident in statistics through
formal hypothesis testing and model criticism; put it together and you have,
reminiscent of Aristotle,
what George Box has called ``the advancement of learning'' \cite{Box:science}.
But, as the previous section has shown, science is not really like that.
Neither should be our understanding of statistics.\footnote{
Indeed, John Tukey's long battle for the legitimacy of exploratory data analysis might have
been easier if there had been greater sympathy in the statistical research community
for separate contexts for discovery and for justification in science.
E.g. see \cite{Tukey:both}.}

To be sure, statistical investigation meets the same issues raised in the previous
section, but it can deal with them more easily. This is because it has a
considerably more focussed domain of application. For example,
consider the two old chestnuts of the philosophy of science -- the realist/anti-realist debate and the problem of
induction.

The realist/anti-realist debate concerns whether the entities of science are real or
mere theoretical constructs.
The primary entities of statistical investigation are the units of the {\em study} population
and the values of variates measured on them.
The units and their collective must be determined with sufficient care for it to be
possible to select any individual from the collective.
Sometimes considerable effort must be put into ensuring that measurement systems
return reliable values of the variates they purport to measure.
Within this context, statisticians become scientific realists in Hacking's sense --
if we can select them and take measurements on them, they are real \cite{Hacking:phil};
if we cannot, then statistical investigation ceases.
Whether future scientific study shows the units to be composites of other more `fundamental'
units or that the variates measured are to be interpreted differently
is beside the point.

As regards induction, for statistics the problem can be neatly separated into two pieces.
Ultimately, interests lies in the {\em target} population, as it is nearest
to the broad scientific concerns of the problem.
This population may be infinite, possibly uncountably so, and its definition can
involve phrases like `all units now and {\em in the future}.'
Drawing conclusions about this population will often require
arguments that are extra-statistical for they will be based on the similarities of, and
differences between, the {\em target} population and the {\em study} population.
Such arguments may ultimately be unable to avoid assuming
Hume's `uniformity of nature' principle (\cite{Hume:treatise} page 89) and hence what
philosophers mean by the `problem of induction.'
Such weighty problems dissipate when focus shifts to drawing
conclusions about the {\em study} population.
Such is its definition that
all study populations are finite in size and random selection of units
to form a sample is possible.
Random selection provides the strongest grounds for inductive inference.
When, for whatever reason, random selection has not been employed, the case must
be made either that it has been closely enough approximated, or that the sample
is itself similar in its attributes of interest to the study (or target) population.
The latter is much like
making the case for the transfer of conclusions from the {\em study} to the
{\em target} population and so can be just as difficult.
In either case, the arguments will to a large extent be extra-statistical.

The critical reader might suppose that the structure we propose is designed
to relegate all the difficult problems to the realm of the `extra-statistical.'
But this is not sweeping them under the rug.  Just the opposite. They are exposed
as potentially weak links in the chain of inference about which statistics has nothing to say.\footnote{This does not
preclude further statistical studies being carried out to address some of these problems
(e.g. further investigation of study error).}
The five stage structure is a template for any statistical investigation
and so its applicability could be regarded as a demarcation criterion for statistics.
Post-hoc, the structure allows us to identify the strengths and weaknesses in the
statistical argument; in some investigations, even weak arguments may be all that
are available.
In prospect, it provides a useful strategy for finding out about populations.


\section{Conclusions}
Statistics is not about the method of science with its paradigm shifts and incommensurability;
it is about investigating phenomena as they relate to populations of units.
As fascinating as the questions raised in Section 5 might be, they are not our questions.
That is a good thing; the empirical evidence to date suggests that they may not be
resolvable.


The five stage PPDAC process with the associated language and sub-stages
provides a good framework for describing investigations such as Michelson's,
especially for people learning the intricacies of Statistics. More importantly,
in actively planning and executing an empirical investigation, we believe that
the framework is very valuable to ensure that important issues are at least
considered.  And this is the case for every statistical investigation.

Karl Pearson had it almost right.  Whatever the case for science, we can say that
the unity of Statistics consists alone in its method, not in its material.
And it is this method that should be given the broadest dissemination.

\section*{Acknowledgements}
Thanks are due to many people for many helpful discussions.
They include our colleagues Greg Bennett and
Winston Cherry of the Department of Statistics and Actuarial
Science,
astronomers Judith Irvin of Queen's University
and Dieter Brookner of Kingston who pointed out Cotter's book
\cite{nauthist:1968} to us,
and Stephen Stigler of the University of Chicago for his
helpful comments on early drafts of this paper.

All quantitative graphics were produced using the Quail statistical software
environment now available on the world-wide web.

\bibliography{research}
\end{document}


