Measurement

Why Measurement System Analysis Student Capstone Projects Lead to Lasting Benefits

In Search of Trustworthy Data

Firefly magnifying glass on a circuit board. — Image Source: Generative AI image created via Adobe Firefly

Over the last few years, students in our capstone program at Ohio State have led some very successful projects with a wide variety of sponsoring organizations. Projects in manufacturing, service, healthcare, not-for-profit, and educational settings have produced win-win outcomes: the sponsors help reinforce students’ learning in live environments, and the students drive positive changes for their sponsors. No two projects are exactly alike, but all of them rely on data to diagnose problems and drive decisions.

One technique that I strongly urge all students to deploy is measurement system analysis (MSA), which is the assessment of how operators measure their work. This is not the same as how operators perform their work, which is all-important and is usually what we are trying to improve. These studies matter because a process or system is easier to understand and improve if we know that we are using accurate data. The reason I insist that students conduct an MSA is so that they will have the confidence to challenge data after graduation, and the skill to lead a study of data integrity.

As an example of unreliable data in the workplace, suppose that you ask one operator whether a work order is complete and ready for action, and that operator says it is ready. Then you ask a second operator about the same work order, and that person says it is incomplete. This mismatch reveals an inconsistency in the way the process is being measured, not necessarily in the way the work is being done. At this stage, we are not asking about job performance.

The MSA is conducted early in a project, during what the popular DMAIC (Define, Measure, Analyze, Improve, Control) framework would call the Measure phase, so that the project team can be confident that all operators are measuring the process consistently. The setup of an MSA is deliberate. It must be challenging enough to reveal inconsistencies without being unrealistic. In other words, the MSA must reflect the range of conditions that operators typically experience while doing the work.

MSA Setup

In this section I will describe a practical application of MSA. The deeper statistical calculations will not be shared, as my focus is on setting up a project-based study and using the results to help the organization make better decisions. A good MSA will present operators with opportunities to measure a significant aspect of their work accurately. There are three types of mismatches we are trying to uncover in an MSA:

Repeatability is the ability of one operator to measure something exactly the same way more than once. If we ask an operator to measure the outside diameter of a pipe or to evaluate the completeness of an information packet, and then ask them to evaluate the same item again, will they record the exact same result?
Reproducibility is the ability of two or more operators to evaluate the same item consistently.
Accuracy is the ability of each operator to record the correct measurement, that is, the result that matches the known correct answer for the item.
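For readers who want to see the arithmetic behind these three checks, here is a minimal Python sketch that scores each one as a simple percent agreement for a pass/fail study. It assumes a hypothetical data layout, `ratings[operator] = (round_1_calls, round_2_calls)`, with every list in the same item order, plus a `standard` list of the known correct calls; the data and function names are illustrative, not taken from any particular tool. Reproducibility here counts an item as agreed only if every operator gave the same call in every round, which is one common, strict convention.

```python
# Hypothetical pass/fail study: two operators, two rounds, four items.
# ratings[operator] = (round 1 calls, round 2 calls), all in the same item order.
ratings = {
    "A": (["pass", "fail", "pass", "fail"], ["pass", "fail", "fail", "fail"]),
    "B": (["pass", "fail", "pass", "fail"], ["pass", "fail", "pass", "fail"]),
}
# Known correct call for each item (the reference answer).
standard = ["pass", "fail", "pass", "pass"]


def repeatability(ratings, operator):
    """Share of items the operator called the same way in both rounds."""
    first, second = ratings[operator]
    return sum(a == b for a, b in zip(first, second)) / len(first)


def reproducibility(ratings):
    """Share of items on which every operator gave the same call in every round."""
    rounds = [calls for pair in ratings.values() for calls in pair]
    n_items = len(rounds[0])
    agreed = sum(len({calls[i] for calls in rounds}) == 1 for i in range(n_items))
    return agreed / n_items


def accuracy(ratings, operator, standard):
    """Share of items the operator called correctly in both rounds."""
    first, second = ratings[operator]
    hits = sum(a == b == s for a, b, s in zip(first, second, standard))
    return hits / len(standard)


for op in ratings:
    print(op, repeatability(ratings, op), accuracy(ratings, op, standard))
print("all operators", reproducibility(ratings))
```

With this made-up data, operator A gives the same call on 3 of 4 items but is correct in both rounds on only 2 of 4, which shows how an operator can be fairly repeatable and still inaccurate.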

I always encourage the students and their project teams to select actual items from recent history to include in the MSA. For example, if the operators are going to evaluate inbound goods in a retail warehouse to determine whether they are assembled correctly, we will request recent items that successfully passed such an inspection as well as items that failed.

A typical MSA textbook recommends using 10 different items and three operators, with each operator evaluating every item twice. My general suggestion is to use nine or 12 parts, so that they divide evenly into thirds. One third of the parts should have a defect that is easy to identify, one third should have a defect that is tricky to identify, and one third should have no defects at all. The clean parts are there to detect false positives, cases in which an operator fails a part that was actually acceptable. To help the operators forget how they scored each item in round one, the coordinator must allow sufficient time before round two.

Figure 1: Setting up a measurement study. *Source: Tracy Owens*
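The part mix suggested above can be laid out ahead of time. The sketch below is one way to do it in Python, under stated assumptions: the function name and labels are hypothetical, and shuffling the presentation order in each round is an extra measure added here (the text itself only calls for allowing enough time between rounds). The coordinator keeps the answer key; operators see only neutral labels.

```python
import random
from collections import Counter


def build_study_plan(n_parts, operators, rounds=2, seed=None):
    """Lay out a pass/fail study with parts split evenly into thirds.

    Returns (key, runs). `key` maps a neutral label such as "P4" to its hidden
    category and is for the coordinator only. `runs` lists
    (round, operator, label) tuples in the order the parts are presented.
    """
    if n_parts % 3:
        raise ValueError("use a multiple of three parts, such as 9 or 12")
    rng = random.Random(seed)
    third = n_parts // 3
    categories = (["easy defect"] * third + ["tricky defect"] * third
                  + ["no defect"] * third)
    rng.shuffle(categories)  # labels P1, P2, ... reveal nothing about category
    key = {f"P{i + 1}": category for i, category in enumerate(categories)}

    runs = []
    for rnd in range(1, rounds + 1):
        for operator in operators:
            order = list(key)
            rng.shuffle(order)  # fresh order per operator and round (an added assumption)
            runs.extend((rnd, operator, label) for label in order)
    return key, runs


key, runs = build_study_plan(12, ["Operator 1", "Operator 2", "Operator 3"], seed=1)
print(Counter(key.values()))  # four parts in each category
print(runs[:3])               # first few presentations in round 1
```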

MSA Results

After all operators have evaluated all parts twice, it is time to conduct the analysis and learn how well the measurement system is working today. In a pass/fail study, like the completeness of a work order or information packet, there are three numeric methods for evaluating repeatability, reproducibility, and accuracy. The first is the percent agreement from the study that was just conducted. This will usually be a round number, like 80% or 90%, because so few items are being studied: with 10 items, each one is worth 10 percentage points. But does that mean that, if we gave the operators 100 more parts to evaluate, they would agree on exactly 80 of them? No. Instead, we calculate the second metric, which is the confidence interval.

Figure 2: Three numeric methods for evaluating a pass/fail measurement system. *Source: Tracy Owens*

The confidence interval for a proportion is calculated using a constant from the standard normal table, the sample size, and the observed proportion of agreement from the actual study. The interval gives the range in which the true percent agreement is likely to fall, that is, what we would expect to see if we did, indeed, give the operators 100 more samples to measure.
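The calculation just described is the normal-approximation (Wald) interval, p ± z·sqrt(p(1 − p)/n), where p is the observed proportion of agreement, n is the number of items, and z comes from the standard normal table (about 1.96 for 95% confidence). A minimal Python sketch follows. It also includes the Wilson score interval, an alternative not mentioned in the text, because the Wald interval behaves poorly with samples this small: at 10 of 10 agreements it collapses to a zero-width interval at 100%. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def wald_interval(agreed, n, confidence=0.95):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    p = agreed / n
    half_width = z_value(confidence) * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def wilson_interval(agreed, n, confidence=0.95):
    """Wilson score interval; stays sensible for small n and p near 0 or 1."""
    z = z_value(confidence)
    p = agreed / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Example: operators agreed on 8 of 10 items (80% agreement).
print(wald_interval(8, 10))    # roughly (0.55, 1.0); the upper end is clipped
print(wilson_interval(8, 10))  # roughly (0.49, 0.94)
```

Either way, the interval around 80% agreement from only 10 items is wide, which is exactly why the point estimate alone should not be trusted.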

The third metric used to evaluate a pass/fail measurement system is the Kappa score. Kappa goes beyond simply checking whether responses matched, such as scoring “pass” in round one and “fail” in round two for the same item. It uses the full pattern of responses, including the direction of each mismatch (saying “pass” when it should have been “fail”, or “fail” when it should have been “pass”), to compare the observed agreement with the agreement that would be expected by chance alone. A Kappa score of 1.0 indicates perfect agreement, while a score of 0.0 means the agreement is no better than chance, as if an operator said, “if it’s heads, I’ll say ‘pass’”. A desirable Kappa score is between 0.9 and 1.0, and a Kappa score between 0.7 and 0.9 can be acceptable if the process is monitored closely.

Figure 3: Kappa calculator built by the author in Excel. *Source: Tracy Owens*
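The author’s calculator is built in Excel. As a rough stand-in, here is a minimal Python sketch of Cohen’s kappa for two sets of pass/fail calls, such as one operator’s round one versus round two, or an operator’s calls versus the known standard. This is an assumption about method: with three or more operators, statistical packages typically report Fleiss’ kappa instead, and the spreadsheet in Figure 3 may use a different formula. The interpretation bands come from the text; treating scores below 0.7 as needing improvement is inferred from them.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa: (observed - expected) / (1 - expected).

    Ranges from 1.0 (perfect agreement) through 0.0 (chance level);
    negative values mean agreement worse than chance.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    count_a, count_b = Counter(calls_a), Counter(calls_b)
    # Chance agreement: how often both sides would pick the same category at
    # random, given how often each side actually used each category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return float("nan")  # both sides used one identical category: undefined
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Bands from the text; 'needs improvement' below 0.7 is an inference."""
    if kappa >= 0.9:
        return "desirable"
    if kappa >= 0.7:
        return "acceptable with close monitoring"
    return "needs improvement"


# Hypothetical: one operator's two rounds on 10 items, differing on item 8.
round_1 = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
round_2 = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail", "fail", "pass"]
kappa = cohens_kappa(round_1, round_2)  # 0.8: 90% observed vs 50% chance agreement
print(round(kappa, 2), interpret_kappa(kappa))
```

Note how 90% raw agreement shrinks to a kappa of 0.8 once chance agreement is removed, which lands this operator in the “monitor closely” band.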

I regard the MSA as a project within the project: if the data cannot be trusted, we must pause the improvement project and improve our ability to measure the process. Figure 4 shows a process with an unacceptable amount of measurement variation, and Figure 5 shows the same process after improvements were made to the way operators measure their work. The graphs in Figure 5 show noticeably less variation.

Figure 4: Gage study with continuous data – prior to improvements, built in EngineRoom. *Source: Tracy Owens*

Figure 5: Gage study with continuous data – after improvements, built in EngineRoom. *Source: Tracy Owens*

Note that these two MSA results do not come from a pass/fail evaluation. They are based on a continuous measure, like the outside diameter of the pipe in the example above. For this type of variable, a Gage R&R study is conducted; the letters R&R stand for repeatability and reproducibility. Figure 6 shows the graphical output of a Gage R&R study, with each operator’s measurements of all 10 parts. Success in such a study looks like a flat horizontal line, as in the frames for parts 5 and 9. Any diagonal line indicates a mismatch and a lack of repeatability, and staggered clusters indicate a lack of reproducibility among the three operators.

Figure 6: Gage run chart built in Minitab. *Source: Tracy Owens*
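For a continuous measure, the numbers behind a Gage R&R are commonly estimated with a two-way ANOVA, with parts crossed with operators. The Python sketch below is a minimal version of that ANOVA method, assuming a balanced study in which every operator measures every part the same number of times; the data layout, example values, and function name are hypothetical. It always keeps the operator-by-part interaction term, whereas statistical packages often offer to pool a non-significant interaction into the repeatability error, so their results can differ slightly.

```python
from statistics import fmean


def gage_rr(data):
    """Crossed Gage R&R by the ANOVA method, interaction term always kept.

    data[part][operator] is a list of repeat measurements. The design must be
    balanced (every operator measures every part the same number of times),
    with at least two parts, two operators, and two repeats.
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    if r < 2:
        raise ValueError("each operator must measure each part at least twice")
    cells = [[fmean(data[i][j]) for j in range(o)] for i in range(p)]
    grand = fmean(x for row in data for cell in row for x in cell)
    part_means = [fmean(cells[i]) for i in range(p)]
    oper_means = [fmean(cells[i][j] for i in range(p)) for j in range(o)]

    # Sums of squares for parts, operators, their interaction, and repeat error.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_means)
    ss_int = r * sum((cells[i][j] - part_means[i] - oper_means[j] + grand) ** 2
                     for i in range(p) for j in range(o))
    ss_err = sum((x - cells[i][j]) ** 2
                 for i in range(p) for j in range(o) for x in data[i][j])

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))

    # Variance components from the expected mean squares; negatives are set to 0.
    repeat = ms_err
    interaction = max(0.0, (ms_int - ms_err) / r)
    operator = max(0.0, (ms_oper - ms_int) / (p * r))
    part = max(0.0, (ms_part - ms_int) / (o * r))
    grr = repeat + operator + interaction
    return {
        "repeatability": repeat,
        "reproducibility": operator + interaction,
        "gage_rr": grr,
        "part_to_part": part,
        # Gage R&R as a percentage of total study variation (ratio of std devs).
        "pct_study_var": 100 * (grr / (grr + part)) ** 0.5,
    }


# Hypothetical diameters: 3 parts x 3 operators x 2 repeat readings each.
diameters = [
    [[10.0, 10.1], [10.3, 10.2], [9.9, 10.0]],
    [[12.0, 12.2], [12.4, 12.3], [12.1, 11.9]],
    [[11.0, 11.1], [11.2, 11.4], [10.9, 11.0]],
]
result = gage_rr(diameters)
print(f"%Study Var for gage R&R: {result['pct_study_var']:.1f}%")
```

A smaller percentage means more of the observed spread comes from real part-to-part differences and less from the measurement system, which is what the improved study in Figure 5 looks like graphically.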

Showing and explaining these graphs to the project team and to a steering committee helps them understand the opportunity for improvement. Once the study is complete and the measurement system is improved, the project team can continue with confidence in the data.

A Bonus Outcome

One unexpected benefit of the MSAs that our students have conducted in the last year is the ongoing applicability of this method in the workplace. Several of the project sponsors were so impressed by the MSA setup and interpretation of results that they have incorporated the measurement study into their new hire orientation. Figure 7 is an example of an executive presentation where an MSA has been added to new hire training. For me, this is a thrilling outcome – the use of statistical tools to help people do their jobs more effectively. The win-win relationship between capstone sponsors and their students continues to grow stronger through the detailed and relevant deployment of trusted analytical tools.


As a founding member of ASQ’s Innovation Division, Tracy Owens has promoted a growing body of knowledge on innovation management. His latest book, F-Notes: Facilitation for Quality, was released in late 2020.