Quality Begins with Good Data
Critical quality decisions are based on information from engineering tests, but many of the engineering tests that are well documented and objective provide exactly the wrong information to drive the quality process.
The QS-9000 standard also is dependent on this information. The standard was designed to develop fundamental quality systems that provide continuous improvement, emphasizing defect prevention and the reduction of variation and waste in the supply chain. QS-9000 provides the outline of the quality structure, but it is the engineering tests that provide the information needed to move the system forward.
Suppliers and purchasers make critical decisions based on QS-9000's Section 4.2.4, Product Approval Process, which requires suppliers to follow the Production Part Approval Process (PPAP). In theory, the product that meets the requirements set forth by the supplier through the PPAP is a viable product that will meet the purchaser's requirements and the consumer's needs. However, no matter how well the process is followed, PPAP test results provide the information to make decisions, and in almost all cases, the PPAP tests rely on a compromised statistical reliability test.
Reliability as a requirement when contracting products is a manufacturing mainstay. Often, contracts are based on required specifications. The specifications are based on a given number of sample parts tested under explicit conditions with zero failures allowed. The advantages and influences of contractual requirements on the reliability tests are subtle but significant. A fixed-length test with a clear pass-fail result fits well into the product development timeline. The need to bring the product to market quickly and in a prescribed period of time forces the reliability test to be fixed in length, thereby compromising the quality and quantity of the information generated.
The ideal reliability test includes testing a significant number of samples to failure and conducting a reliability analysis on the products' time-to-failure, providing an accurate measure of the true reliability.
However, this isn't normally feasible. In contracts in which components are purchased for assembly in a larger system, the time-to-market requirements and the development of a timeline dictate a fixed time for testing. This precludes using a test-to-failure approach for reliability tests. Instead, a set number of the product is tested to a fixed life and zero failures are allowed. This provides a demonstration of a minimum reliability while setting a fixed time to conduct testing.
The contractual reliability is, therefore, a compromised reliability. The true reliability of the product sample is never actually measured. As a result, the product may be overdesigned or marginally designed without the engineers and management knowing the true life span of the product. Many standard contracted reliability measures have an inherently uncertain result. For example, 12 samples are tested to simulate one equivalent life and no failures are allowed. Passing this test demonstrates 95% reliability with a fairly low confidence: 46%. Put another way, if the product has exactly 95% reliability, there is nearly a 46% chance that at least one of the 12 samples will fail and only a 54% chance that all 12 parts will pass.
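The 46% and 54% figures follow from simple arithmetic, which the sketch below reproduces. It assumes the 12 samples fail independently of one another; the function names are illustrative and do not come from any standard.

```python
def zero_failure_pass_probability(reliability: float, samples: int) -> float:
    """Chance that every sample survives one equivalent life.

    Assumes each sample survives independently with the same reliability.
    """
    return reliability ** samples


def demonstrated_confidence(reliability: float, samples: int) -> float:
    """Confidence that reliability is at least `reliability` after a zero-failure pass."""
    return 1.0 - zero_failure_pass_probability(reliability, samples)


p_pass = zero_failure_pass_probability(0.95, 12)
confidence = demonstrated_confidence(0.95, 12)
print(f"chance all 12 pass: {p_pass:.0%}")
print(f"demonstrated confidence: {confidence:.0%}")
```

Raising the sample count is the only way to raise the confidence in a zero-failure test, which is exactly what the fixed timeline and prototype costs discourage.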
Furthermore, the information generated from the test is of marginal value in actually improving the product. Continuous improvement and reliability growth are mandates of the QS-9000 process. Knowing that a product does or does not meet a contractual reliability provides no information on what can be done to improve the product. Demonstrating a 95% reliability, even if parts were tested to failure, would not provide a solid direction to improve the product.
Suppliers who have to meet this standard will often retest products that fail because the odds are good that they will pass the next time.
The PPAP process then becomes compromised, not by the integrity of the process, but by the appropriateness of the information used. Suppose 12 samples of a product are tested for product approval on a given attribute and all 12 parts pass. Try to answer these questions:
- How can the product be improved?
- Is the product robust?
- Is the process optimized?
Although the product meets a requirement, no other important information needed to make good quality decisions is available.
A statistical bias?
If section 220.127.116.11 of the QS-9000 standard is examined, the same business bias toward statistical methods can be seen: control charts, design of experiment (DOE) and parts per million (ppm). How long will it take to conduct a DOE on a process involving 3 variables requiring 60 days of testing, using a minimum of 27 prototypes that cost $2,000 each, testing 3 parts at a time? Answer: 0 days.
No manufacturer will conduct these tests because the 540-day time frame makes the results meaningless. The alternative is to build equipment to test 27 parts simultaneously, which would require a ninefold increase in capital investment. While this is an example, typical prototype costs can run from a few hundred dollars to more than $100,000, with a larger range for capital test setup costs. In the end, the results are still pass-fail with no additional information. Running the test to failure would require anywhere from two to 12 times the testing time. Again, this is impractical.
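The 540-day figure and the ninefold capacity increase come straight from the numbers in the example. A minimal sketch of that arithmetic follows; the function name is illustrative, and it assumes each batch must finish before the next one starts.

```python
import math


def sequential_test_days(prototypes: int, parts_per_run: int, days_per_run: int) -> int:
    """Calendar days needed when batches are tested one after another."""
    runs = math.ceil(prototypes / parts_per_run)
    return runs * days_per_run


prototypes = 27
parts_per_run = 3
days_per_run = 60
cost_per_prototype = 2_000  # dollars, from the example above

total_days = sequential_test_days(prototypes, parts_per_run, days_per_run)
prototype_cost = prototypes * cost_per_prototype
capacity_multiple = prototypes // parts_per_run  # rigs needed to test all parts at once

print(total_days, prototype_cost, capacity_multiple)
```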
Why is an entire industry using a core set of test methods to provide the information to their quality process when the resulting information is ill fitting? Statistical methods have served the industry for a long time, so they stick around even when they do not provide the information needed by the relatively new quality processes.
The information required to make decisions in the quality process should drive the test methods that are used. If a statistical test is appropriate, then it should be used. But if it is not, the statistical test should not be used. Longevity and historical momentum make for strong paradigms, but don't always equate to quality.
The key information for making a decision on what process to use to test a product can be broken into five groups. These are:
- Information Goals: Information is needed to validate a particular design feature and design level.
- Process: The processes available to produce information including computer modeling and engineering algorithms to produce the desired information.
- Failure Modes: The type of failure modes that are anticipated can affect both the process and the equipment needed to correctly test a product.
- Equipment: The type of equipment needed to implement a process.
- Performance Parameters: The performance ranges that differentiate competing pieces of equipment.
Several types of information may be needed depending on the development stage of a product. During the early stages of product development, the basic feasibility of a design must be established. Often, the feasibility of a design is based on past experience. However, a new design sometimes presents challenges that require the feasibility of the design to be evaluated.
As a design is developed, specific requirements and discrete information are needed either by the design engineers or to meet regulations. For example, a plastic structure of an automotive instrument panel made from a 40% glass-filled polymer will have unique material properties. The molding process used and the amount of orientation that is induced in the glass fibers during the molding process will result in material properties specific to the design. This information must be determined to correctly design the part.
During the design iteration stage, the key information needed to move forward is the root cause of inherent design failure modes. The design engineer can adjust the design to improve the inherent reliability of the design if the root cause of failure modes is known. This information fits in the QS-9000 requirements for continuous improvement in the design. Knowing the statistical reliability of a design does not provide the information; knowing the failure modes and root causes does provide the appropriate data.
In addition to the root cause of failure modes, the design's relative maturity is critical. If the design is mature, then continued design iterations will not improve the design's inherent reliability. An immature design will have opportunities for design improvement. Knowing whether a design is mature is critical to completing the design iteration stage with a fully developed design.
During design iteration, the uniformity of stresses for known-load cases, opportunities for reducing mass and weight, redistributing heat sources in electronics, optimizing flow paths and gating, and ergonomic impacts can be addressed through computer modeling. Although many of these issues are not as critical as physical failures that limit the life or operation of the product, they do affect the profitability and efficiency of the design.
During production and production ramp-up, manufacturing engineers need to verify that the production process is not introducing or exaggerating failure modes that were not present during development. Comparison of performance between development testing and production testing can provide critical process failure root cause information. Also, the predicted behavior of a large number of the product in the field is necessary for warranty anticipation, service personnel training, replacement, repair and maintenance schedules.
For many products, the actual advertised operating limits and storage limits are targeted during development and then quantified during final production. This information helps establish user guides and warranty limitations.
The information needed for feasibility and design iteration stages of product development is dependent in part on the potential failure modes. The choice and success of different processes and equipment depend on failure mode verification. A static-load test will not provide information concerning the fatigue, abrasion, chemical, coatings, finishes, lubricants or change of state of the design. Vibration alone will help determine fatigue, some abrasion, interference and binding, but no other potential failure modes. In selecting a process and test setup, the range of failure modes that need to be verified must be considered.
Many different processes can be discussed. The processes are grouped by the type of information they produce with a special emphasis on newer accelerated methods.
- Failure Mode Verification Testing (FMVT) is a failure mode identification testing method that is able to use as few as one prototype. The method is used to establish multiple design inherent failure modes, rank the failure modes and estimate the design's potential for improvement.
- Highly Accelerated Life Testing (HALT) is a failure mode identification testing method that is able to use few samples to establish multiple design inherent failure modes, and establish operating and destruct limits of a product.
- Full System Life Testing (FSLT) is an accelerated reliability test method that subjects the samples to one equivalent life with all known stress sources present. The test produces a reliability prediction.
- Static-load tests measure the proof load or destruct load of samples subjected to axial, torsional or pressure loading in a noncyclic manner. The tests, which are available in a variety of types, provide upper destruct limits and safety verification on critical load-bearing items. For example, a brake pedal will have a proof load at which the pedal is not to deform and a destruct load to verify the structure's ability to withstand certain maximum loads.
- Chemical reaction tests are environmental tests that establish a sample's ability to withstand attack from chemicals and catalysts such as salt spray, UV light, heat, cold and humidity. These tests are often paired with static load tests to track the degradation of a sample's strength as the chemical reactions take place over time.
- Coordinate measuring machines can do dimensional testing of a design to verify that the sample meets the design tolerances.
The QS-9000 standard sets some key information goals. At the design approval stage, the two key questions are: Is this design feasible? What must be done to improve and optimize the design? It should be noted that neither of these goals needs a statistical reliability test. Testing for feasibility can be answered by demonstrating that one part meets the expected use conditions for a reasonable period of time. The key points for improving and optimizing the product can be determined by addressing the failure modes. At the PPAP stage, the testing must indicate whether the product is fully optimized, whether the production process is capable of producing a viable part and what needs to be done to improve the production of the product.
FMVT provides much of this information. For example, a new hinge has to go through the design validation and PPAP stages. FMVT on the hinge would first determine the stress sources that can damage the hinge.
A test would then be conducted, not to establish the statistical reliability of the product, but to determine the feasibility of the product. Can it perform under expected service conditions? What failure modes exist in the design? The tests are conducted by taking each stress source at the expected service level.
For example, the door weight may have a nominal value of 5 pounds. The test is then run for a set period of time, typically 1 hour, with all of the stress sources present and randomly applied. The random application of stress sources ensures that no non-intuitive configurations of stress on the product will cause failure. After the product is tested at the expected service conditions, the stress levels are raised in 10 even increments from the expected service conditions to destruct levels of stress, with each increment held for an additional one-hour period. The result is an accelerated test, usually taking one or two days, that will identify the design-inherent failure modes in the approximate order of significance.
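The stepped stress profile can be sketched as a simple schedule for one stress source. The 5-pound service level comes from the example above; the 25-pound destruct level is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
def stress_schedule(service_level: float, destruct_level: float, increments: int = 10) -> list[float]:
    """Stress level for each one-hour period.

    The first entry is the expected service level; the remaining entries
    rise in even steps until the destruct level is reached.
    """
    step = (destruct_level - service_level) / increments
    return [service_level + i * step for i in range(increments + 1)]


# Door weight: 5 lb nominal (from the text), 25 lb destruct (hypothetical).
levels = stress_schedule(service_level=5.0, destruct_level=25.0)
for hour, load in enumerate(levels, start=1):
    print(f"hour {hour:2d}: {load:.1f} lb")
```

In a full test, every stress source would get its own schedule like this one, with all sources stepped together and applied randomly within each period.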
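In the example of the hinge, the results may show four failure modes, occurring at 25, 300, 475 and 500 minutes, with the first being a stripped screw.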
This progression of failure modes can be analyzed to determine the feasibility of the design, the maturity of the design and what must be done to improve the design. Because the first 60 minutes of this example were at the expected service conditions, it is expected that a feasible product should be able to handle the stress applied for that period of time. However, with one failure mode at 25 minutes, a question about feasibility is raised. If the failure can be addressed, then the product may still be viable. Failure analysis will determine whether the root cause of the stripping screw can be corrected.
Determining the relative maturity of the design and identifying the failure modes that should be addressed to improve the product can be accomplished by examining the times-to-failure. For a robust design, expect the product to last for a long period of time and then exhibit failures throughout the design in a fairly short timeframe.
For a product with many issues to be improved, expect to see a distribution similar to the one for the hinge. Several failure modes spread out across time, so that fixing the first failure mode would have a significant impact on the time-to-failure. This potential can be quantified by determining the average potential for improvement in the product. This is calculated by taking the average time between failures after the first failure divided by the time to first failure.
The formula is: ((500-25)/3)/25 = 6.3 or 630% average potential for improvement.
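As a rough sketch, the same calculation can be written as a small function. The failure times of 25, 300, 475 and 500 minutes are inferred from the worked figures in this article, and the function name is illustrative.

```python
def design_maturity(failure_times: list[float]) -> float:
    """Average time between failures after the first, divided by time to first failure."""
    times = sorted(failure_times)
    if len(times) < 2:
        raise ValueError("at least two failure modes are needed")
    first, last = times[0], times[-1]
    average_gap = (last - first) / (len(times) - 1)
    return average_gap / first


hinge_times = [25, 300, 475, 500]  # minutes, inferred from the worked example
dm = design_maturity(hinge_times)
print(f"average potential for improvement: {dm:.1f}")
```

The unrounded result is slightly above the 6.3 quoted in the text, which rounds down.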
Historically, when the maturity of products is measured using this technique, products that are fully optimized have a design maturity (DM) of less than 10%.
Using this historical perspective helps determine how many failure modes to address. By addressing the first failure mode, the DM becomes ((500-300)/2)/300 = 33%. By addressing the second failure mode, the DM becomes ((500-475)/1)/475 = 5% which would be acceptable. For this reason, the first two failure modes would be addressed. Assuming the product was re-designed, the results may now look like this:
Now the DM = ((560-475)/3)/475 = 6% and that is acceptable. At this point, the product is feasible because it functions well beyond the expected service conditions and it has been optimized. This is information that would not have been available from a traditional compromised statistical test.
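The choice of how many failure modes to address can be sketched as a loop that drops the earliest failure mode until the DM falls below the 10% guideline. This follows the simplification used in the worked example, namely that fixing a failure mode removes it without shifting the others; the failure times are inferred from the article's figures and the function names are illustrative.

```python
def design_maturity(failure_times: list[float]) -> float:
    """Average time between failures after the first, divided by time to first failure."""
    times = sorted(failure_times)
    first, last = times[0], times[-1]
    return ((last - first) / (len(times) - 1)) / first


def modes_to_address(failure_times: list[float], threshold: float = 0.10) -> int:
    """Count the earliest failure modes to fix before DM drops below the threshold."""
    times = sorted(failure_times)
    addressed = 0
    # Stop when fewer than two modes remain, since DM is undefined there.
    while len(times) - addressed >= 2 and design_maturity(times[addressed:]) >= threshold:
        addressed += 1
    return addressed


hinge_times = [25, 300, 475, 500]  # minutes, inferred from the worked example
print(modes_to_address(hinge_times))
```

A real redesign will usually shift the remaining failure times as well, so the count is a planning estimate to be confirmed by rerunning the test.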
In the PPAP stage, the baseline failure mode progression can be used in a similar manner to screen the production process, to verify that the product is produced to match the design's performance and to identify whether any means of improving the performance are possible.
An FMVT like the one described would typically take one or two days to conduct per design iteration and would provide much more information for the design and PPAP process than a reliability test would. In particular, the engineers and managers would know how to improve the product, how feasible the product was, how process changes had affected the maturity of the design and the robustness of the product.
Remember the questions asked at the beginning of this article: How can the product be improved? Is the product robust? Is the process optimized?
With a statistical test there are no good answers, but with this type of test that ignores statistical reliability and focuses on failure modes, the answers are easy.