I read the "Wheeler's Workshop" column in your magazine each month with keen interest. However, in the May 2001 issue (p. 26), I found a disconcerting comment in the "April Brain Teaser" section. The comment concerned the rounding of data values and how excessive rounding can lead to an incorrect standard deviation. I have found that the data presented in the questions in this section frequently contain a certain number of significant figures, usually reflecting some measurement process, whether of length, area, weight or whatever. But when the average of the data is computed, followed by the standard deviation, the answer is presented with many more significant figures than any of the individual data points. This is incorrect, because it implies a level of knowledge that you don't have. By this, I mean that if you were to measure 100,000 items for length to the nearest 0.1 inch, the average should be reported to the nearest 0.1 inch, not to 5 or more decimal places. This stands to reason, because all of the individual measurements are to the nearest 0.1 inch; smaller units, such as 0.01 inch or 0.001 inch, are not gathered at all. If you were to measure a population of 10,000,000 of these items, you would expect the most frequent value of length to be the average. You would further expect that if the population is normally distributed (that is, follows a Gaussian distribution), about 99.73 percent of all measurements would fall within plus or minus 3 standard deviations of the average. Again, you can see that the standard deviation cannot really be expressed with more decimal places than the smallest unit measured, in this case 0.1 inch, because there is no knowledge of any significant figure beyond the first decimal place. In this fashion, the length could be confidently stated as xx.y ± 0.z inches, where 0.z is 3 standard deviations to the nearest 0.1 inch, again reflecting measurement limitations, with 99.73 percent certainty.
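For reference, the fraction of a normal distribution lying within ±k standard deviations of its mean is erf(k/√2). The short Python sketch below evaluates this with the standard library's `math.erf`. For k = 3 it gives about 99.73 percent. The helper name `normal_coverage` is hypothetical and is used only for illustration.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Prints 68.27%, 95.45% and 99.73% for k = 1, 2, 3.
for k in (1, 2, 3):
    print(f"+/-{k} sd: {100 * normal_coverage(k):.2f}%")
```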
My degree is a Bachelor of Science in Chemistry, and as part of data handling, we were taught how to treat significant figures before further processing, and how to present averages and standard deviations, as well as how to correctly calculate these items.
Also, I hasten to point out that the May 2001 "Brain Teaser" has data tabulated to the nearest 0.1 gram, yet gives a specification of 28.375 grams ± 1 gram per slice. This is incorrect on two counts. First, if the specification is stated to 3 decimal places, all weighings must also be made to at least 3, and preferably 4, decimal places. Second, with a tolerance of ± 1 gram, you could legitimately weigh to the nearest gram, for example, 28 grams or 29 grams. Tolerance statements must agree with limit statements, or confusion and mistakes will rule the day. Too often, pure statisticians miss some of the items I mentioned above because they haven't had hands-on experience with the activities or measurements involved; they see numbers and don't necessarily understand what generates them. I look forward to continued reading of this column.
William C. Wright III
Response to Wheeler's Comment
I appreciate readers following the "Wheeler's Workshop" column and raising questions such as these. The issues raised are the correct rounding of values computed from data, and the agreement between the resolution of stated specifications and the resolution of the data values.
Please rest assured that I have spent 20 years working in industry, including the chemical industry. I recognize and appreciate the varied teachings about the practical use of numbers, and I know that many people have only "textbook" experience with numbers. Also, these "Brain Teasers" come from actual situations in which I have worked with the data and the processes. My Ph.D. in statistics might make me a "pure statistician," but I am familiar with all of these data and situations. The answers to the "Brain Teasers" are the same answers I gave my clients so that they could understand and improve their processes.
Regarding the rounding issue, there are two considerations. The first is the level of detail selected for recording the data. As was stated in the article, "Excessive data rounding leads to an underestimate of the process' standard deviation." In the real world, not the textbook world, many people have a hand in deciding what instrumentation to select and how to set it up for the purpose of making measurements. Often, these people are unaware of the need for a particular level of detail in the recorded data values. The data for the April "Brain Teaser" came from a factory where I worked, and they are recorded exactly the way the people in the factory make the measurements. The problem is that these data suffer from inadequate measurement units: the values are not recorded in enough detail to give an appropriate estimate of the standard deviation. It is well known in statistics that you can virtually eliminate the variability by rounding the recorded data values too much. An absurd example is recording the height of all adults to the nearest meter, or yard. You would see almost no variability, and the standard deviation you calculated would be tiny and unusable. For further explanation and examples, see Chapter 9, Section 1 of "Understanding Statistical Process Control" or Chapter 1 of "Evaluating the Measurement Process" by Donald J. Wheeler.
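The Python sketch below gives a rough illustration of the inadequate-measurement-units problem. It simulates a hypothetical process and then computes the standard deviation of the same items recorded to three different measurement units. The process values (average 10.0, true standard deviation 0.1) are assumptions, not factory data, and the helper `record` is hypothetical. When the unit is ten times the true standard deviation, almost every value is recorded as 10, and the computed standard deviation collapses toward zero.

```python
import random
import statistics

def record(values, unit):
    """Round each value to the nearest multiple of `unit`, the smallest
    increment a (hypothetical) instrument records."""
    return [round(v / unit) * unit for v in values]

random.seed(1)
# Hypothetical process: true average 10.0, true standard deviation 0.1.
true_values = [random.gauss(10.0, 0.1) for _ in range(1000)]

# Standard deviation computed from the same items recorded at three resolutions.
results = {unit: statistics.stdev(record(true_values, unit)) for unit in (0.01, 0.1, 1.0)}

for unit, s in results.items():
    print(f"unit = {unit:<4}  computed sd = {s:.4f}")
```

This is the same effect as recording adult heights to the nearest meter. The coarse unit discards the information needed to see the variation, so the estimate shrinks.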
The second consideration is rounding after calculations have been made from the data. The average (or, in the theoretical treatment, the expected value) does not always come out at the same level of detail as the data values. The easiest example is rolling a die. A conventional die has six possible values, typically 1, 2, 3, 4, 5 and 6. The expected value of a roll is 3.5, which can be verified by theoretical computation or empirically through a large number of trials; most introductory statistics texts have examples. So what should we do in the practical world? When we do calculations by hand, the question arises quickly. When we leave them to the computer, we can defer the problem until the end. Once we have analyzed a set of data and have the final estimates of the quantities that will characterize a process, such as the average and standard deviation, we are ready to round the answers. Until then, we must record intermediate results to a sufficient level of detail, beyond that of the data, so that we do not unduly alter the final estimates. In the case of statistical process control and process behavior charts, the final estimates are the grand average and the average range, which are the central lines on the charts. These values need only one, or at most two, extra digits beyond the resolution of the data. The software package I use has the unfortunate habit of recording these values to many more digits than is reasonable. In the future, I will edit the graphs to eliminate that problem.
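The Python sketch below illustrates both points. First, it computes the die's expected value exactly. Second, it computes center lines for an average-and-range chart, keeping full precision throughout and rounding only the final estimates, to one extra digit beyond the data's resolution. The function name `center_lines` and the subgroup weights are hypothetical, made up for illustration rather than taken from the "Brain Teaser."

```python
from fractions import Fraction
import statistics

# Expected value of one roll of a fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
expected = sum(Fraction(face, 6) for face in faces)
print(expected)  # prints 7/2, i.e. 3.5, which is not itself a possible face value

def center_lines(subgroups, data_decimals, extra_digits=1):
    """Return (grand average, average range) for a hypothetical average-and-range chart.

    Intermediate results stay at full floating-point precision; only the final
    estimates are rounded, to `extra_digits` beyond the data's resolution.
    """
    averages = [statistics.mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    digits = data_decimals + extra_digits
    return round(statistics.mean(averages), digits), round(statistics.mean(ranges), digits)

# Hypothetical weights recorded to the nearest 0.1 gram, in subgroups of three.
subgroups = [[28.3, 28.5, 28.4], [28.6, 28.2, 28.4], [28.5, 28.5, 28.3]]
print(center_lines(subgroups, data_decimals=1))  # prints (28.41, 0.27)
```

Setting `extra_digits=2` gives the "at most two extra digits" case. In neither case is anything rounded before the final step.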
Finally, some comments are in order regarding the resolution of specifications and the resolution of the data. In practice, those who set specifications are often not in contact with those who choose, install and use measuring devices. Nor are these people necessarily aware of the behavior of the process that produces the item for which the specifications were written. With these disconnects, it is common to find that specifications are written to one level of detail, the data are recorded to another, and the process has a behavior that would require data at a third level of detail. The problem is not what is correct or incorrect; the problem is the lack of communication, and the lack of knowledge of all three of these components, in dealing with data from a process. In the case you cite, one of the problems was the conversion of values from ounces to grams: the packages state weight in ounces, while the measurements made in the production area are in grams. You are quite right that this can, and often does, lead to confusion and mistakes.
I do hope you will continue to read the "Brain Teasers" and that they will prove useful to you.
Readers are encouraged to e-mail their comments to Quality magazine at firstname.lastname@example.org, or send them via mail to: Editor, c/o Quality, 1050 IL Route 83, Suite 200, Bensenville, IL 60106 or fax them to the editor at (630) 227-0204. Letters must include name and daytime phone number for verification.