Nowhere is the need for this balance more apparent than in the teaching of statistics. Over the past seven years we have taught a variety of courses at different levels with the PPDAC structure at the core of each course. Besides giving balance to method, the structure compels discussion of substantive problems, which can be drawn from a wide variety of application areas: industrial, scientific, technological, social, and commercial. The statistical method can be taught at almost any level of mathematical sophistication. Substantive and interesting problems can be addressed without resorting to complex analysis tools, large data sets, or even significant computational resources. What is required is a rich context for each example so that the details can be described within the structure; such examples tend to grow into case studies.
In our introductory courses we have found that, over time, the complexity of the analysis methods has decreased as more and more time is devoted to the stages other than Analysis. On final examinations, for example, only about one third of the marks are assigned to questions directly related to the Analysis stage. The major goals of our introductory course are, first, to understand the universal need for empirical methods and, second, to understand and be able to use the statistical method in a variety of contexts.
The structure and language introduced can also be used to clarify some statistical issues which have provoked controversy in the past. Here we give three examples.
In our language, a study is enumerative if the target population can be listed, so that a probabilistic sampling protocol giving every unit a positive inclusion probability can be used; otherwise it is analytic. Deming's concern is essentially the possibility of study error that is not captured by the uncertainty expressed by the formal statistical procedures.
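The distinction can be illustrated with a minimal sketch. It assumes simple random sampling without replacement as the probabilistic protocol, and the function names and invoice labels are hypothetical, chosen only for illustration. A study is classed as enumerative when every unit of the target population receives a positive inclusion probability, and as analytic otherwise.

```python
from typing import Dict, Hashable, Iterable


def srs_inclusion_probabilities(
    frame: Iterable[Hashable], sample_size: int
) -> Dict[Hashable, float]:
    """Inclusion probabilities under simple random sampling without
    replacement: every listed unit has probability n / N."""
    units = list(frame)
    if not 0 < sample_size <= len(units):
        raise ValueError("sample_size must be between 1 and the frame size")
    p = sample_size / len(units)
    return {unit: p for unit in units}


def classify_study(
    target_units: Iterable[Hashable], inclusion_prob: Dict[Hashable, float]
) -> str:
    """Return 'enumerative' if every target unit has a positive inclusion
    probability under the protocol, and 'analytic' otherwise."""
    covered = all(inclusion_prob.get(unit, 0.0) > 0.0 for unit in target_units)
    return "enumerative" if covered else "analytic"


# Hypothetical frame: ten invoices that can actually be listed.
frame = [f"invoice-{i}" for i in range(10)]
probs = srs_inclusion_probabilities(frame, sample_size=3)

# Target population equal to the list: every unit can be sampled.
listed_target = frame
# Target population that also includes a unit that cannot be listed yet.
open_ended_target = frame + ["invoice-not-yet-issued"]

print(classify_study(listed_target, probs))      # enumerative
print(classify_study(open_ended_target, probs))  # analytic
```

In the second case no sampling protocol applied to the list can reach the unlisted unit, so any conclusion about the full target population rests on an extrapolation that the formal procedures do not quantify.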
The attitude and tools of EDA are clearly important for meeting the goals of the monitoring and examination tasks of the Data stage. These tasks amount to carrying out a small PPDAC investigation in which the sample of the larger study is now regarded as identical to a target population within this smaller PPDAC. The Problem is to examine many attributes (typically graphical ones), looking for unexpected values of these attributes.
Alternatively, EDA applies to those investigations where the sample is the entire study population. For example, when presented with a massive dataset, the investigator is often interested in examining the attributes of that dataset as if it constituted the entire population. In these instances the target population is still something different from the study population (however large that might be), and so the difficulty of study error remains, even for data miners.
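A small numerical sketch may make the point concrete. It assumes, for illustration only, that the attribute of interest is a mean, that study error is the difference between the attribute in the study population and in the target population, and that sample error is the difference between the attribute in the sample and in the study population. The values and function names are hypothetical.

```python
import statistics
from typing import Sequence


def study_error(target: Sequence[float], study: Sequence[float]) -> float:
    """Attribute (here the mean) of the study population minus that of the
    target population. Assumed definition, for illustration only."""
    return statistics.fmean(study) - statistics.fmean(target)


def sample_error(study: Sequence[float], sample: Sequence[float]) -> float:
    """Attribute of the sample minus that of the study population."""
    return statistics.fmean(sample) - statistics.fmean(study)


# Hypothetical values: the dataset at hand covers only part of the target.
target_population = [10, 12, 14, 16, 18]
study_population = [10, 12, 14]      # the "massive dataset" in miniature
sample = list(study_population)      # the sample is the entire study population

print(sample_error(study_population, sample))            # 0.0
print(study_error(target_population, study_population))  # -2.0
```

Examining the whole dataset removes the sample error but leaves the study error untouched, whatever the size of the dataset.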