Discussion

Next: On method in science. Up: Statistical Method and Michelson's Previous: The Conclusion

Too often, statistics has been presented solely as a set of analysis tools. But as the above structure makes explicit, the analysis is but the fourth stage in a series of five which constitute the statistical method. The three stages which precede the analysis are critical to the enterprise; the entire structure forces the proper balance. Seen as a whole, statistical method is not only ubiquitous in empirical investigations but unavoidable.

Nowhere is the need for this balance more apparent than in the teaching of statistics. Over the past seven years we have taught a variety of courses at different levels with the PPDAC structure at the core of each course. Besides giving balance to method, the structure, we have found, compels discussion of substantive problems which can be drawn from a wide variety of application areas: industrial, scientific, technological, social, and commercial. The statistical method can be taught at almost any level of mathematical sophistication. Substantive and interesting problems can be addressed without resort to complex analysis tools, large data sets, or even significant computational resources. What is required is a rich context for each example in order to describe the details within the structure; these examples tend to grow into case studies.

In our introductory courses we have found over time that the complexity of analysis methods has been reduced as more and more time is devoted to the stages other than Analysis. On final examinations, for example, only about one third of the marks are assigned to questions directly related to the Analysis stage. The major goals of our introductory course are first to understand the universal need for empirical methods and second to understand and be able to use the statistical method in a variety of contexts.

The structure and language introduced can also be used to clarify some statistical issues which have provoked controversy in the past. Here we give three examples.

Deming [15] characterized studies as either enumerative or analytic; Hahn and Meeker [28] describe the concepts in detail. Deming was particularly interested in contrasting the use of formal statistical procedures in sample surveys with their use in studies of industrial processes⁵⁷ which include units not yet produced. Deming claimed that standard statistical inference procedures (e.g. confidence intervals) would not apply in analytic studies.
In our language, a study is enumerative if the target population can be listed so that a probabilistic sampling protocol giving every unit a positive inclusion probability can be used. Otherwise it is analytic. Deming's concern is essentially the possibility of study error, which is not captured by the uncertainty expressed by the formal statistical procedures.
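
Deming's point can be illustrated with a small simulation. The following Python sketch is ours and purely hypothetical: the process mean, the linear drift in units not yet produced, and all sizes are invented for illustration, and study error is taken to be the study population attribute minus the target population attribute. The units already produced can be listed and sampled with positive inclusion probability, so the interval speaks to them; it carries no information about the drift.

```python
import random
import statistics


def simulate(seed=0, n_produced=1000, n_future=1000, drift=0.01, n_sample=100):
    """Contrast a confidence interval with the study error in an analytic study.

    All numbers are hypothetical: units are N(10, 1) measurements, and the
    process mean of units not yet produced drifts upward by `drift` per unit.
    """
    rng = random.Random(seed)

    # Study population: units already produced, which can be listed.
    produced = [rng.gauss(10.0, 1.0) for _ in range(n_produced)]
    # Units not yet produced; they belong to the target population only.
    future = [rng.gauss(10.0 + drift * (i + 1), 1.0) for i in range(n_future)]
    target = produced + future

    # Simple random sample without replacement: every produced unit has
    # inclusion probability n_sample / n_produced > 0.
    sample = rng.sample(produced, n_sample)

    mean = statistics.fmean(sample)
    # Usual standard error (finite population correction ignored for brevity).
    se = statistics.stdev(sample) / n_sample ** 0.5

    study_mean = statistics.fmean(produced)
    target_mean = statistics.fmean(target)
    return {
        "ci_low": mean - 1.96 * se,
        "ci_high": mean + 1.96 * se,
        "study_mean": study_mean,
        "target_mean": target_mean,
        "study_error": study_mean - target_mean,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the interval is narrow and centred near the average of the produced units, while the target average lies far outside it. The width of the interval reflects sample error only.
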
Tukey [51] characterized analyses as either exploratory or confirmatory. Confirmatory analysis is the assessment of pre-specified questions and is the traditional domain of inferential statistics. Tukey describes exploratory data analysis (EDA) more as an attitude than as a bundle of techniques. According to Tukey, the five-stage PPDAC method⁵⁸ is well suited to confirmatory analysis but not to exploratory analysis (nor to science at large). However, by fleshing out the stages as we have above, we can see where exploratory analysis fits in.
The attitude and tools of EDA are clearly important to meet the goals of the monitoring and examination tasks of the Data stage. These tasks amount to carrying out a small PPDAC investigation where the sample of the larger study is now regarded as identical to a target population within this smaller PPDAC. The Problem is to examine many attributes (typically graphical) looking for unexpected values of these attributes.
Alternatively, EDA applies to those investigations where the sample is the entire study population. For example, when presented with a massive dataset, the investigator is often interested in examining the attributes of that dataset as if it constituted the entire population. In these instances the target population is still something different from the study population (however large that might be), and so the difficulty of study error remains, even for data miners.
Statistics is sometimes criticized as applying only to a single study whereas scientific progress demands replication. The statistical method described above would seem to reinforce that view. However, multiple studies can and should be examined within the PPDAC framework. There the difficulties inherent in 'meta-analysis' are clarified. For example, one major issue is the inclusion or exclusion of studies from the analysis. One feature of this issue can be discussed by comparing the study population to the target population for each investigation considered for inclusion. Alternatively, the set of possible studies can be taken as the target population and the set of realized studies taken as the study population. Then the sampling protocol determines which studies are included.
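
The second reading of the meta-analysis problem can be made concrete with a toy calculation. The Python sketch below is only an illustration under our own assumptions: the study labels, the effect estimates, the set of realized studies, and the inclusion rule are all invented. Study error is taken as the study population attribute minus the target population attribute, and sample error as the sample attribute minus the study population attribute.

```python
from statistics import fmean

# Target population: every study that could be considered (hypothetical
# labels and effect estimates, invented for illustration).
possible = {"A": 0.10, "B": -0.05, "C": 0.30, "D": 0.02, "E": 0.45, "F": -0.12}

# Study population: the studies actually realized (hypothetical subset).
realized = {name: possible[name] for name in ("A", "C", "D", "E")}


def include(estimate):
    """Hypothetical sampling protocol: keep estimates of at least 0.10."""
    return estimate >= 0.10


# Sample: the realized studies admitted to the meta-analysis.
included = {name: est for name, est in realized.items() if include(est)}

# Attribute of interest: the average effect estimate.
study_error = fmean(realized.values()) - fmean(possible.values())
sample_error = fmean(included.values()) - fmean(realized.values())

if __name__ == "__main__":
    print("included studies:", sorted(included))
    print(f"study error:  {study_error:.4f}")
    print(f"sample error: {sample_error:.4f}")
```

Separating the two differences shows which part of the discrepancy comes from the studies that happened to be carried out and which part comes from the inclusion rule applied to them.
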


2000-05-24