Organizations are under constant pressure. When product failure rates are significantly reduced, customer expectations rise. More complex supply chains reduce costs but introduce new risks. Technology reduces dependence on people, yet for approximately 60% of problems, corrective actions still identify human error as the cause and retraining as the solution.
What’s wrong with this picture? While the first two issues are typical of what happens in a complex system, the last is totally illogical in all but a small number of instances.
Think about it. If 60% of the people who were trained did not get it, either the organization is hiring people who cannot learn or the training program is totally ineffective. While this may be the case in some organizations (hopefully not yours), the reality is that the response of blaming human error and prescribing retraining usually indicates that a solid investigation was avoided.
It is not that people do not want to solve the problem. Often they simply do not know how, or the problem is low on their priority list. Only management can solve the latter issue, but with a little education we can significantly help with the former. After all, most people never receive training in root cause analysis, and those who are experts often have learned it through decades of experience diagnosing a wide range of problem situations.
[Figure: Use of flowchart for diagnosis]
Types of Problems
Not all problems require root cause analysis. Creative problems are those for which any viable solution is about as useful as another.
Think about a machine that is producing no defective parts but does not have the capacity to meet increased volume demands. Many solutions could work, such as purchasing a new machine, operating the machine more hours or days per week, or modifying the machine fixtures, layout or program to optimize load/unload time and tool path. Which is best (and it could be a combination) simply comes down to an economic decision based on the volume increase desired and the cost of each solution.
Then there are analytical problems. These are the ones where everything was working all right for a while, and suddenly it is not. It may be a temporary upset or a permanent stop, but the system is no longer operating normally. In such a case it is important to find the specific reasons for the change in performance and resolve them.
If the machine above suddenly stopped working or began making bad parts, it could be due to an electrical, mechanical or hydraulic problem. Finding out which of these it is and the specific reason for it is critical to being able to fix it.
So for creative problems, solutions are all that is necessary, while for analytical problems the right solutions will not be known until a proper diagnosis is done. It is the diagnostic process, known as root cause analysis, that finds the causes.
One common issue is that problems are not well defined before trying to diagnose them. Nearly all experts on root cause analysis state that the problem statement should include: what (occurred or did not occur), where (did it occur), when (did it occur) and how much (magnitude of the problem). In developing the problem definition it also is important to determine whether the problem has occurred before, which will help ensure a more systemic diagnosis.
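The four elements of a problem statement can be captured in a small record so that gaps are visible before diagnosis begins. The following is a minimal Python sketch; the class name, field names and the example values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    what: str       # what occurred or did not occur
    where: str      # where it occurred
    when: str       # when it occurred
    how_much: str   # magnitude of the problem
    occurred_before: bool = False  # a repeat points toward a more systemic diagnosis

    def missing_elements(self):
        """Return the names of the required elements that are still blank."""
        required = {
            "what": self.what,
            "where": self.where,
            "when": self.when,
            "how_much": self.how_much,
        }
        return [name for name, value in required.items() if not value.strip()]


# Hypothetical example: the "when" element has not been established yet.
statement = ProblemStatement(
    what="Bore diameter out of tolerance",
    where="Machine 4, second operation",
    when="",
    how_much="12 of 400 parts",
)
print(statement.missing_elements())
```

A statement with any missing element is a signal to gather more facts before anyone starts proposing causes.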
A related issue is that the problem is often not a single problem but many different problems. In such situations the diagnosis will be difficult because there are likely to be multiple causes, each affecting a different subset of the overall problem. It is better to slice the overall problem into multiple subproblems using a Pareto diagram, then work on them one at a time (an exception is when there are interconnections between the subproblems).
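The slicing step behind a Pareto diagram amounts to counting occurrences per category, sorting from largest to smallest and tracking the cumulative share. A minimal Python sketch follows; the function name and the defect categories and counts are made up for illustration.

```python
from collections import Counter


def pareto_table(defect_records):
    """Return (category, count, cumulative percent) rows, largest category first."""
    counts = Counter(defect_records)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, count in counts.most_common():
        cumulative += count
        rows.append((category, count, round(100 * cumulative / total, 1)))
    return rows


# Hypothetical defect log: one entry per defective part.
records = ["scratch"] * 50 + ["burr"] * 30 + ["dent"] * 15 + ["stain"] * 5

for row in pareto_table(records):
    print(row)
```

With these invented numbers the first two categories account for 80% of the defects, so each would be treated as its own subproblem and diagnosed separately.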
An important step in diagnosis is developing testable hypotheses for what could have caused the problem. Brainstorming is a technique often used for coming up with possible causes, aided by the seven Ms (manpower, material, machines, method, measurement, Mother Earth and management) and sometimes organized into a cause and effect diagram. However, more structured techniques such as a process flowchart or a logic tree provide a more systemic analysis.
Consider the situation where a test lab learns that its proficiency test results are abnormal. In order to diagnose the problem, a flowchart, logic tree or combination of the two can be used to drill down to find the cause of the problem. The diagnosis then becomes an iterative process that mimics the five Whys concept.
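The drill-down through a logic tree can be sketched as a small data structure in which every node holds a hypothesis and a test against the available evidence. In the Python sketch below, the branches of the tree, the evidence keys and the helper names are all hypothetical; a real tree would be built from the lab's own process.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    hypothesis: str
    is_supported: Callable[[dict], bool]  # test of the hypothesis against evidence
    children: List["Node"] = field(default_factory=list)


def drill_down(node, evidence, path=None):
    """Follow the first supported hypothesis at each level and return the path taken."""
    path = (path or []) + [node.hypothesis]
    for child in node.children:
        if child.is_supported(evidence):
            return drill_down(child, evidence, path)
    return path  # no deeper hypothesis is supported by the evidence


# Hypothetical logic tree for the test lab example.
tree = Node(
    "Proficiency test results are abnormal",
    lambda e: True,
    [
        Node("Sample was mishandled", lambda e: e.get("sample_damaged", False)),
        Node(
            "Measurement equipment is at fault",
            lambda e: e.get("reference_reading_off", False),
            [
                Node("Calibration is overdue",
                     lambda e: e.get("calibration_overdue", False)),
                Node("Instrument was damaged",
                     lambda e: e.get("instrument_damaged", False)),
            ],
        ),
        Node("Test method was not followed",
             lambda e: e.get("method_deviation", False)),
    ],
)

# Hypothetical evidence gathered during the investigation.
evidence = {"reference_reading_off": True, "calibration_overdue": True}
for step in drill_down(tree, evidence):
    print(step)
```

Each step down the returned path answers one more "why," which is how the tree mimics the five Whys while keeping every answer tied to a check against data rather than opinion.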
Perhaps the most egregious error during diagnosis is using a voting process to identify the actual cause. While voting might be of value in deciding which of many causes will be investigated first, it should almost never be used to decide the cause for which a solution will be implemented.
Such a decision should instead be based on solid logic of cause and effect relationships, backed up by data indicating that a specific cause is the correct one. In cases where voting is useful or necessary, it should be based on probability estimates backed by historical data or simulations.
[Figure: Use of logic tree for diagnosis]
Of course, not all problems with problem solving are related to poor diagnosis. Sometimes the solution phase also fails to work effectively. Here are some common issues:
The process for identifying possible solutions is too limited. For example, there are many creative thinking techniques beyond brainstorming that can be used to generate breakthrough ideas, and benchmarking can reveal how other organizations have solved the same type of problem. Even today, many organizations fall back on adding an inspection step rather than actually addressing the cause of the problem.
Unintended consequences of the solution are not considered. While a good solution may solve the original problem, it may create other problems if a systems review of the process change is not done.
Follow-up for effectiveness is not sufficient. Such follow-up needs to take into account that if the failure rate is low, monitoring of the results may be required for a considerable length of time. Also, monitoring of the process change itself (through evaluation by the process owner or as part of internal audits) should continue long enough to account for the fact that people often revert to old behaviors when attention to a problem is removed.
Successful solutions are not spread and sustained. If a solution is successful, the organization should look at other areas where perhaps problems have not yet been encountered, but where the same risks exist. Implementing the solution proactively could save a lot of headaches later. And sustaining change often requires finding ways to get people to internalize the new methods, which can be done through both feed-forward (have them tell others why the solution is better) and feedback (include adaptability to change in performance evaluations).
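On the follow-up point above, a rough calculation shows why low failure rates demand long monitoring periods. The Python sketch below is one common way to reason about it and is not taken from the article: it assumes units fail independently at the old failure rate, and asks how many consecutive failure-free units must be observed before a clean run would be unlikely (probability at most `alpha`) if nothing had actually changed. The function name and the default `alpha` of 0.05 are assumptions.

```python
import math


def units_to_monitor(old_failure_rate, alpha=0.05):
    """Smallest n such that (1 - old_failure_rate) ** n <= alpha.

    Assumes independent units; if the fix did nothing, a run of n units
    with zero failures would then occur with probability at most alpha.
    """
    if not 0 < old_failure_rate < 1:
        raise ValueError("old_failure_rate must be between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1 - old_failure_rate))


# Hypothetical failure rates before the fix: 1 in 100 and 1 in 1,000.
for rate in (0.01, 0.001):
    print(rate, units_to_monitor(rate))
```

Under these assumptions, a problem that used to occur once per 100 units needs roughly 300 failure-free units before the fix can be judged effective, and a tenfold rarer problem needs roughly ten times as many.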
While the human mind is a powerful mechanism, it also is prone to several cognitive biases that can cause poor decisions. Examples include:
Assumptions. Every day assumptions are made that work well for us (for example, various laws of physics will continue to work), but when one is faced with a system that is not performing as expected it may be useful to document and verify some underlying assumptions.
Availability error. Given a choice, the human mind often will take the easier route rather than the one that would be more useful. This can be especially harmful when deciding what data to obtain in order to test causal hypotheses.
Confirmation bias. People also like to be right and often will spend an inordinate amount of time trying to prove it. However, looking for data that might prove a hypothesis wrong can often be more productive.
Organizational culture also can impact how well problem solving works. If problems are seen as negative, then people do not want to be involved for fear they will be blamed when things do not go well. A culture that sees problems as inherent to life and the opportunity to learn and self-correct will be much more open to exploring the multitude of issues that may cause them.
Another vital issue is who performs the diagnosis and identifies and implements solutions. Obviously people familiar with the system need to be involved, and should include the process owner or a representative. An independent facilitator or other personnel not directly involved with daily operation of the process also can help provide an objective view.
And finally, consider corrective action density. This is basically the number of problems an organization tries to solve per year divided by the number of employees. While there is no magical number, if the ratio is too high, the organization will be overloaded and poor results are likely. If the ratio is too low, the organization may be missing opportunities to learn and improve, such as by focusing on near misses when there are no significant failures.
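The ratio itself is simple to compute and to track from year to year. A minimal Python sketch, with a hypothetical function name and invented figures:

```python
def corrective_action_density(problems_per_year, employees):
    """Problems the organization tries to solve per year, per employee."""
    if employees <= 0:
        raise ValueError("employees must be positive")
    return problems_per_year / employees


# Hypothetical organization: 45 corrective actions opened, 300 employees.
density = corrective_action_density(45, 300)
print(density)
```

Because there is no magical number, the sketch deliberately sets no threshold; the value is most useful when compared against the organization's own history and its results.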
Regardless of whether problems are related to failures of products, processes or equipment, analytical problem solving requires effective diagnosis, creative solutions and an organizational awareness of traps that keep these from being carried out well. While more complex problems may require multivariate statistics or even simulation or modeling, if the approach taken is not logical and rational, problems not only will not be solved, but people will learn the wrong things.