Measurement

Cohen’s Kappa: Measuring Agreement Beyond Chance

Given how each inspector tends to classify items, how much agreement would we expect purely by chance?

By Ray Harkins
January 21, 2026

Attribute inspection is one of the most widespread, yet difficult-to-control, measurement methods in manufacturing. Whether inspecting machined surfaces for cosmetic defects, checking weld quality, reviewing molded parts, evaluating assembly completeness, or verifying diameters with go/no-go gages, many operations depend on human inspectors making these pass-fail, subjective judgments.

To evaluate these inspection systems, many organizations still rely on observed agreement, the simplest measure of consistency between inspectors. Unfortunately, an excellent observed agreement result can mask an unreliable measurement system and mislead attempts to understand the variation an attribute gage introduces.

Let’s consider a case study in which two appraisers evaluate an attribute gage pin by each measuring the same 35 samples and comparing their results. Each appraiser declares each of the 35 samples as either “G” or “NG” on the basis of the output of the attribute gage. A useful presentation of the study data follows:

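Table A. Appraiser 1 vs. Appraiser 2 results for the 35 samples

                    Appraiser 2: G    Appraiser 2: NG    Total
Appraiser 1: G            21                 6             27
Appraiser 1: NG            3                 5              8
Total                     24                11             35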

Observed agreement (Po) represents the proportion of parts on which two inspectors -- or an inspector and a master standard -- give the same classification. To calculate it, you count the number of times both evaluations match (e.g. both calling a part G or both calling it NG) and divide that by the total number of parts inspected. For a G/NG inspection,


Po = (nGG + nNGNG) / N


where nGG is the number of parts both evaluations call G, nNGNG is the number both call NG, and N is the total number of parts.

In our case study, the appraisers agreed 21 times that the sample was Good (nGG) and five times that the sample was Not Good (nNGNG) for a total of 26 points of agreement in 35 opportunities (N) yielding an observed agreement (Po) of,


Po = (21 + 5) / 35 = 74.3%
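
The same arithmetic can be written as a minimal Python sketch. The agreement counts (21 and 5) come from the case study; the disagreement counts (6 and 3) are the ones used in the expected-agreement calculation later in this article. The function name `observed_agreement` and the nested-list layout are illustrative assumptions, not a standard API.

```python
def observed_agreement(table):
    """Return Po for a square agreement table.

    Assumed layout: table[i][j] is the number of parts that
    Appraiser 1 put in category i and Appraiser 2 put in category j.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no inspected parts")
    # Parts on the diagonal are the ones both appraisers classified alike.
    matches = sum(table[i][i] for i in range(len(table)))
    return matches / n


# Case-study counts, categories ordered (G, NG):
# rows = Appraiser 1, columns = Appraiser 2
case_study = [[21, 6],
              [3, 5]]

po = observed_agreement(case_study)
print(f"Po = {po:.3f}")  # Po = 0.743
```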


While observed agreement is easy to compute and intuitive, it often overestimates the true reliability of a gage system, especially when one category (usually G) dominates the population.

Consider this: If 95% of all parts are “Good,” then an inspector who simply calls everything “Good” will achieve 95% agreement, even if they cannot properly detect defects. This false sense of capability leads to poor decisions and recurring quality escapes.

To avoid this trap, quality engineers can use Cohen’s Kappa (κ), a statistical measure of agreement beyond chance. Kappa tells you how much appraisers agree in a meaningful way, not merely due to guesswork.

In attribute gage studies, Kappa can be used to quantify:

  • Repeatability - Does the same inspector classify the same part consistently across trials?
  • Reproducibility - Do multiple inspectors agree with each other?
  • Accuracy - Does the inspector’s classification match a reference or master standard?

Observed agreement treats all agreement as equally meaningful. Kappa corrects this bias by asking:

Given how each inspector tends to classify items, how much agreement would we expect purely by chance?

In the Kappa calculations, chance agreement is the baseline, and the κ statistic measures agreement beyond that baseline. This makes Kappa far more reliable in evaluating human classification systems, especially those involving G/NG decisions.

The Core Formulas: Po, Pe, and Kappa

At this point in our discussion, we understand that Po is the observed agreement of our study, the ratio of the total number of times the two appraisers agreed about a sample’s inspection status to the total number of samples in the study. But another critical question is, what is the probability that our appraisers declared the same inspection status by pure chance? This is where Expected Agreement (Pe) enters the discussion.

Pe is how often the inspectors would agree just by chance, based on how frequently each uses each category (G or NG). This total expected agreement is the sum of the expected agreement on G and the expected agreement on NG.

For a 2×2 G/NG system:

Pe = PG + PNG

PG = (number of G calls by Appraiser 1 / N) * (number of G calls by Appraiser 2 / N)

PNG = (number of NG calls by Appraiser 1 / N) * (number of NG calls by Appraiser 2 / N)

Drawing from the data in Table A in our case study,

PG = ((21 + 6) / 35) * ((21 + 3) / 35) = 0.529

PNG = ((3 + 5) / 35) * ((6 + 5) / 35) = 0.072

Pe = 0.529 + 0.072 = 0.601
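
As a rough sketch, the expected-agreement calculation can be expressed in Python by multiplying each appraiser's category proportions and summing over the categories. The function name `expected_agreement` and the table layout (rows for Appraiser 1, columns for Appraiser 2) are assumptions made for illustration.

```python
def expected_agreement(table):
    """Return Pe, the agreement expected by chance, for a square table.

    Assumed layout: table[i][j] is the number of parts that
    Appraiser 1 put in category i and Appraiser 2 put in category j.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no inspected parts")
    k = len(table)
    # How often each appraiser used each category.
    appraiser1_totals = [sum(row) for row in table]
    appraiser2_totals = [sum(row[j] for row in table) for j in range(k)]
    # For each category, multiply the two proportions, then add them up.
    return sum(
        (appraiser1_totals[c] / n) * (appraiser2_totals[c] / n)
        for c in range(k)
    )


# Case-study counts, categories ordered (G, NG)
case_study = [[21, 6],
              [3, 5]]

pe = expected_agreement(case_study)
print(f"Pe = {pe:.3f}")  # Pe = 0.601
```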

Once you have Po and Pe, calculating Cohen’s Kappa (κ) is straightforward:


κ = (Po − Pe) / (1 − Pe)


Again, referring to our case study,


κ = (0.743 − 0.601) / (1 − 0.601) = 0.356
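
Putting the pieces together, here is one possible Python sketch that computes Kappa directly from the 2×2 counts of the case study. The function name `cohens_kappa` is hypothetical, and raising an error when chance agreement equals 100% is a design choice of this sketch (the formula divides by zero in that case).

```python
def cohens_kappa(table):
    """Return Cohen's kappa for a square agreement table.

    Assumed layout: table[i][j] is the number of parts that
    Appraiser 1 put in category i and Appraiser 2 put in category j.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no inspected parts")
    k = len(table)

    # Observed agreement: share of parts on the diagonal.
    po = sum(table[i][i] for i in range(k)) / n

    # Expected (chance) agreement from each appraiser's category usage.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    pe = sum((row_totals[c] / n) * (col_totals[c] / n) for c in range(k))

    if abs(1 - pe) < 1e-12:
        # Both appraisers used a single category for every part.
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (po - pe) / (1 - pe)


# Case-study counts, categories ordered (G, NG)
kappa = cohens_kappa([[21, 6],
                      [3, 5]])
print(f"kappa = {kappa:.3f}")  # kappa = 0.356
```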


A quality engineer’s interpretation of Kappa will vary with industry and application, but a good starting point for interpreting values of Kappa can be found in Table B.

Table B. Guidelines for interpreting Kappa values (table not reproduced here)

A Kappa of 1 implies perfect agreement. A Kappa of 0 implies no agreement beyond what chance produces. The Kappa calculation can also produce negative values (i.e., agreement worse than chance), implying systematic disagreement. This unusual case is typically caused by a structural problem with the inspection process, e.g., a misunderstanding about what counts as G and NG, or incorrectly recorded data.
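
These reference points can be checked with a short Python sketch that works from raw inspection calls rather than a count table. All data below are hypothetical, including a version of the earlier "call everything Good" scenario with 95 good parts out of 100; the function name `kappa_from_calls` is likewise an assumption.

```python
from collections import Counter


def kappa_from_calls(calls_a, calls_b):
    """Cohen's kappa from two equal-length lists of category labels."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Observed agreement: share of parts given the same label.
    po = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from how often each list uses each label.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    labels = set(freq_a) | set(freq_b)
    pe = sum((freq_a[c] / n) * (freq_b[c] / n) for c in labels)
    if abs(1 - pe) < 1e-12:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (po - pe) / (1 - pe)


# Hypothetical reference standard: 95 good parts, 5 defective parts.
reference = ["G"] * 95 + ["NG"] * 5
# Hypothetical inspector who calls every part Good.
always_good = ["G"] * 100

perfect = kappa_from_calls(reference, reference)      # identical calls
chance = kappa_from_calls(reference, always_good)     # 95% observed agreement
inverted = kappa_from_calls(["G", "NG"] * 5, ["NG", "G"] * 5)  # G/NG swapped

print(f"perfect:  {perfect:.2f}")   # perfect:  1.00
print(f"chance:   {chance:.2f}")    # chance:   0.00
print(f"inverted: {inverted:.2f}")  # inverted: -1.00
```

The inspector who calls everything Good matches the reference on 95% of parts yet scores a Kappa of 0, because that level of agreement is exactly what chance predicts. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score` computes the same statistic from two lists of labels.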

In our case, even though the inspectors agree 74.3% of the time, once chance agreement is removed, Kappa is only 0.356. The inspectors achieve just 35.6% of the agreement possible beyond chance, an unacceptable result in most environments.

This means the attribute inspection system needs improvement, possibly via:

  • Better defect acceptance standards
  • Boundary samples
  • Lighting, magnification, or fixture improvements
  • Inspector training

This example demonstrates why Kappa is essential for meaningful attribute gage analysis. Observed agreement alone masks underlying variability and can allow an unreliable inspection system into production.

Human classification systems are among the most variable measurement systems in manufacturing. But Cohen’s Kappa provides a rigorous, chance-corrected measure of inspection consistency and is one of the most important tools for evaluating:

  • Visual inspection
  • Go/no-go attribute checks
  • Defect classification
  • Assembly verification
  • NDT/NDE categorical decisions

For manufacturing quality engineers and process engineers, understanding and applying Kappa is essential for ensuring reliable attribute gage performance and preventing costly inspection errors.

Ray Harkins is the General Manager of Lexington Technologies in Lexington, North Carolina. He earned his Master of Science from Rochester Institute of Technology and his Master of Business Administration from Youngstown State University. He has also taught quality-related skills to over 100,000 students through courses such as Gage R&R Simplified: Essential Tools for Quality Engineers, Quality Engineering Statistics, and Root Cause Analysis and the 8D Corrective Action Process on the online learning platform Udemy. He can be reached via LinkedIn at linkedin.com/in/ray-harkins, by email at [email protected], or at www.TheManufacturingAcademy.com.
