
And what of statistics?

When statisticians look at the nature of science, they see reflected the nature of statistics.66 Deduction becomes probability theory; induction becomes statistical theory (e.g. pp. 6-7 of [8]); scientific method is hypothetico-deductive (e.g. [11], [17], [41]), self-evident in statistics through formal hypothesis testing and model criticism; put it together and you have, reminiscent of Aristotle, what George Box has called ``the advancement of learning'' [11]. But, as the previous section has shown, science is not really like that. Nor should our understanding of statistics be.67

Certainly, statistical investigation meets the same issues raised in the previous section, but it can deal with them more easily because it has a considerably more focussed domain of application. For example, consider the two old chestnuts of the philosophy of science: the realist/anti-realist debate and the problem of induction.

The realist/anti-realist debate concerns whether the entities of science are real or mere theoretical constructs. The primary entities of statistical investigation are the units of the study population and the values of variates measured on them. The units and their collective must be determined with sufficient care for it to be possible to select any individual from the collective. Sometimes considerable effort must be put into ensuring that measurement systems return reliable values of the variates they purport to measure. Within this context, statisticians become scientific realists in Hacking's sense - if we can select them and take measurements on them, they are real [27]; if we cannot, then statistical investigation ceases. Whether future scientific study shows the units to be composites of other more `fundamental' units or that the variates measured are to be interpreted differently is beside the point.


  
Figure 13: Induction from the set of measured values to the target population.

As regards induction, for statistics the problem can be neatly separated into two pieces (see Figure 13). Ultimately, interest lies in the target population, as it is nearest to the broad scientific concerns of the problem. This population may be infinite, possibly uncountably so, and its definition can involve phrases like `all units now and in the future.' Drawing conclusions about this population will often require arguments that are extra-statistical, for they will be based on the similarities of, and differences between, the target population and the study population. Such arguments may ultimately be unable to avoid assuming Hume's `uniformity of nature' principle ([30] page 89), and hence unable to escape what philosophers mean by the `problem of induction.'

Such weighty problems dissipate when focus shifts to drawing conclusions about the study population. Such is its definition that all study populations are finite in size and random selection of units to form a sample is possible. Random selection provides the strongest grounds for inductive inference. When, for whatever reason, random selection has not been employed, a case must be made either that it has been approximated closely enough or that the sample is itself similar, in its attributes of interest, to the study (or target) population. The latter is much like making the case for the transfer of conclusions from the study to the target population and so can be just as difficult. In either case, the arguments will to a large extent be extra-statistical.
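The role of random selection from a finite study population can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the study population is a hypothetical list of 1000 simulated variate values, the attribute of interest is taken to be the population average, and the function name `srs_estimate` is invented here. The sketch draws a simple random sample without replacement and attaches a standard error using the usual finite population correction.

```python
import math
import random
import statistics


def srs_estimate(population, n, rng):
    """Estimate the study-population average from a simple random sample.

    Hypothetical helper: `population` is a finite list of variate values,
    `n` is the sample size, `rng` is a random.Random instance.
    """
    sample = rng.sample(population, n)  # n units drawn without replacement
    N = len(population)
    mean = statistics.fmean(sample)
    # Standard error with the finite population correction (1 - n/N).
    se = math.sqrt((1 - n / N) * statistics.variance(sample) / n)
    return mean, se


rng = random.Random(2000)  # fixed seed so the sketch is reproducible

# Hypothetical study population: 1000 units, one measured variate each.
study_population = [rng.gauss(50.0, 10.0) for _ in range(1000)]

estimate, std_error = srs_estimate(study_population, 100, rng)
true_average = statistics.fmean(study_population)
print(f"sample estimate {estimate:.2f} +/- {std_error:.2f}; "
      f"study population average {true_average:.2f}")
```

Because the selection mechanism is known, the standard error is justified by the sampling design alone; no argument about the similarity of sample and study population is needed. When the sample is the whole study population, the correction factor makes the standard error zero.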

The critical reader might suppose that the structure we propose is designed to relegate all the difficult problems to the realm of the `extra-statistical.' But this is not sweeping them under the rug. Just the opposite. They are exposed as potentially weak links in the chain of inference, links about which statistics has nothing to say.68 The five-stage structure is a template for any statistical investigation and so its applicability could be regarded as a demarcation criterion for statistics. Post-hoc, the structure allows us to identify the strengths and weaknesses in the statistical argument; in some investigations, even weak arguments may be all that are available. Ad hoc, it provides a useful strategy for finding out about populations and their attributes.



2000-05-24