For quality control workers, nothing could be more frustrating than knowing that the key to unlock a particular problem is available, but lost somewhere in a mountain of data. That voluminous data has been collected by the touch of a coordinate measuring machine, the click of an optical camera, the beep of a bar code scanner or any of the multitude of other test, measurement and inspection routines that are daily occurrences in modern factories.

For many companies, the data generated by these quality control efforts and filtered through statistical analysis software represents too much information, stored in too many places, for any one person to grasp. For them, data mining could be the answer. Data mining is the process of finding patterns and correlations in large amounts of data. Software from a handful of companies combines pattern recognition with statistical and mathematical techniques to root through millions of data values looking for patterns.

This is a step beyond statistical process control because it looks for more than what happened in the past; it tries to forecast the future through a process called predictive modeling. These software programs extract data from a variety of "islands of information," as DataNet Quality Systems (Southfield, MI) calls them: plant and enterprise systems such as ERP, MES, SCADA, HMI and SPC software; databases including Oracle, MS-SQL and Sybase; flat files such as Excel, Lotus 1-2-3, Quattro Pro, ASCII text, CSV and equipment data files; and Web sites, Web pages and corporate intranets.
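
What this looks like in practice can be sketched in a few lines of code. The example below, which assumes invented file, table and column names rather than any particular vendor's product, simply pulls measurements from a flat file and a relational database into one table so that patterns can be mined across both.

```python
# A minimal sketch: gather data from two "islands of information" (a CSV
# flat file and a SQL database) into a single table for analysis.
# File, table and column names are illustrative assumptions only.
import sqlite3

import pandas as pd

# Flat-file source, such as an export from a gage or SPC package
spc_data = pd.read_csv("line1_measurements.csv")      # hypothetical file

# Relational source, such as an MES or ERP extract
conn = sqlite3.connect("plant.db")                    # hypothetical database
mes_data = pd.read_sql_query(
    "SELECT part_id, machine, shift, measurement FROM inspections", conn
)
conn.close()

# Combine the sources so patterns can be found across both at once
combined = pd.concat([spc_data, mes_data], ignore_index=True)
print(combined.describe())
```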

What once would have required reams of reports, some of which may have taken weeks to compile, can now be accessed with the click of a mouse. The only impediments might be the quality of the data and the imagination of the person deciding what types of data to include in the analysis. From there, the Internet and corporate intranets allow this information to be disseminated across the country or around the world.

Digging up savings
More than a decade ago, LTV Steel (Cleveland) was a data mining pioneer, winning awards from the Data Warehousing Institute for customizing off-the-shelf software. The company was looking to put to use some of the massive amounts of data it collected. For instance, LTV found that producing a single coil generated 70,000 bytes of information in 60 files, and the company made thousands of coils every week.

In one application, the company saved $10 million by reducing defects such as roll marks in sheet metal rolls. By analyzing which furnaces were producing defects and comparing that data with life cycles of the furnaces, maintenance schedules and other disparate factors, the company was able to predict when furnace bases would deteriorate. By understanding this cycle, it was able to adjust maintenance and replacement schedules.

More recently, Robert Bosch Corp., an international automotive parts manufacturer, integrated data mining at its Anderson, SC, facility, where it makes electronic control units for anti-lock brakes. As each control unit is produced, approximately 450 quality control tests are performed, generating millions of bytes of data each day, according to a case study from the SAS Institute (Cary, NC).

The company uses a Data Collection, Analysis and Reporting System (DCAR) based on off-the-shelf software that allows it to create a data warehouse and put the data into recognizable order. The system lets users make subsets of the data with pull-down menus and query screens. Reports such as Pareto charts and histograms are available by drilling down into the database through these interactive menus.
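
The kind of drill-down report described here can be approximated with general-purpose tools as well. The sketch below, which uses invented column and category names rather than Bosch's actual DCAR schema, filters a subset of test results and builds a Pareto chart of the failure codes in that subset.

```python
# A minimal sketch of a drill-down Pareto report on a subset of test data.
# The file, column names and codes are assumptions for illustration.
import pandas as pd
import matplotlib.pyplot as plt

tests = pd.read_csv("control_unit_tests.csv")     # hypothetical export

# "Drill down" to one production line and to failed tests only
subset = tests[(tests["line"] == "A") & (tests["result"] == "FAIL")]

# Pareto: failure counts sorted from most to least frequent, plus cumulative %
counts = subset["defect_code"].value_counts()
cum_pct = counts.cumsum() / counts.sum() * 100

fig, ax1 = plt.subplots()
counts.plot.bar(ax=ax1)
ax1.set_ylabel("Failure count")
ax2 = ax1.twinx()
ax2.plot(range(len(cum_pct)), cum_pct.values, color="red", marker="o")
ax2.set_ylabel("Cumulative percent")
ax1.set_title("Pareto chart of failed tests, line A")
plt.show()
```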

"It goes pretty fast," says Bosch engineer K.C. Podd. "Users can look at any value in the system in no more than seven or eight seconds."

For instance, Bosch found a problem with a particular pallet of products. The company knew the approximate time the problem material was produced, but nothing more. Before DCAR, Bosch would have scrapped the entire pallet. "But using our data, we were able to identify the affected parts and we were able to save about 80 percent of the pallet," Podd relates.

While these are examples of companies using production data to find problems, many companies use the same kind of data for marketing, finding customer patterns so they know better what products their customers need and when they need them. John Deere (Waterloo, IA) can better forecast tractor sales with a pattern-based approach to data analysis. The data mining technique produces more accurate forecasts than traditional approaches because it incorporates more parameters into the patterns it analyzes.

Going global
Manufacturing in the 21st century is a global enterprise. Outsourced parts can be produced anywhere in the world, and these components must meet the same high quality standards as parts built in a manufacturer's own plants.

Dana Corp. (Toledo, OH), an automotive manufacturer, is rolling out a worldwide supply chain quality monitoring system in 2001. The company has invested in a Web-based portal system that lets it track internal quality as well as the quality of products from its suppliers. Parts-per-million (PPM) defect rates are monitored in near real-time across the supply chain, regardless of whether the parts were made in-house or outsourced.
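
The PPM metric itself is straightforward to compute once receiving and defect records sit in one place. The sketch below, which uses invented supplier names and counts rather than Dana's data, totals defective parts per million parts received for each supplier.

```python
# A minimal sketch of a supplier PPM (parts-per-million defective) report.
# Supplier names and counts are invented for illustration.
import pandas as pd

receipts = pd.DataFrame({
    "supplier":  ["Acme", "Acme", "Baxter", "Baxter"],
    "received":  [120_000, 95_000, 60_000, 80_000],
    "defective": [18, 11, 31, 26],
})

# Total parts and defects per supplier, then scale to defects per million
ppm = (receipts.groupby("supplier")[["received", "defective"]].sum()
               .assign(ppm=lambda d: d["defective"] / d["received"] * 1_000_000))
print(ppm)
```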

Using this information, Dana is seeing reductions in time, costs and rejects. In addition to increased resource utilization and a return on its software investment in less than six months, Dana has decreased its inbound supplier PPM defect metrics by almost 50%. Reduced supplier defects translate into additional bottom-line savings during production, increased product yields and higher customer satisfaction.

"Dana wanted a real-time solution to monitor supplier metrics on a global basis, utilizing existing technology and systems to reduce overall costs," says John Black, a software consultant who implemented Dana's data mining activities. "Selection criteria also included low cost Web-based technology, extensive data mining, Six Sigma analytics, custom reporting, per-sonalized GUI [graphical user interface] and central administration, enterprise-wide."

A key to data mining success is the quality of the data. As the old saying goes, "garbage in, garbage out." Considering that, by some estimates, the amount of data in the world doubles every 20 months, and that a Fortune 500 manufacturer such as Ford or Boeing can generate up to 500 million documents annually, it is vital that the data collected be accurate.

UT Automotive (Dearborn, MI), a United Technologies company that supplies automotive components, wanted to improve quality while cutting costs across its 90 plants, according to the Data Warehousing Institute (Seattle, WA). The problem was that there was no consistency in the data being collected; because different parts of the company were collecting different sets of data, the company was having difficulty finding patterns that could be relied upon with any certainty. By standardizing how and what data was collected, and organizing it in a data warehouse with data analysis software, UT Automotive was able to produce in days reports that used to take months to assemble. Quality metrics such as repairs per 1,000 units, test failure rates, delivery performance and other information that couldn't be tracked before are now tracked and shared companywide.

It's all here
A data warehouse is a collection of data that has been extracted from operational databases and then massaged to remove redundant data and bring in any additional data needed. Manufacturers can use these data to look at a function in total. For any given part, for example, a full-scale data warehouse would include data on the material the part was made of, the machine it was made on, how long that machine had been running between tool changes or services, the employee running the machine and a host of other parameters. These data warehouses can be set up as a central repository where information from legacy systems and other sources is stored. Or, individual data warehouses, sometimes called data marts, can be established for distinct purposes, each loaded directly from legacy systems and other data sources.
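
At its simplest, building such a warehouse or data mart means extracting records from the operational systems, removing redundant entries and joining them into one part-centric view. The sketch below assumes two invented legacy extracts and illustrative column names.

```python
# A minimal sketch of consolidating legacy extracts into a small "data mart."
# File layouts and column names are assumptions for illustration.
import pandas as pd

machine_log = pd.read_csv("machine_log.csv")   # hypothetical: part_id, machine, tool_hours
inspection = pd.read_csv("inspection.csv")     # hypothetical: part_id, material, operator, result

# "Massage" the data: drop duplicate records pulled from overlapping extracts
machine_log = machine_log.drop_duplicates(subset="part_id")
inspection = inspection.drop_duplicates(subset="part_id")

# One row per part, combining process history and inspection results
warehouse = machine_log.merge(inspection, on="part_id", how="inner")
warehouse.to_csv("part_mart.csv", index=False)  # the consolidated view
```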

Techniques for finding information buried within data warehouses generally fall into two categories. The first is drill-down analysis such as online analytical processing (OLAP), which allows users to pose general questions and then drill down into the underlying data for more detail. This tends to produce answers about past results and doesn't necessarily forecast future ones.
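
A drill-down session can be as simple as asking a broad question and then narrowing the same query step by step. The sketch below, with invented plant, machine and column names, moves from defect rates by plant, to machines within one plant, to shifts on one machine.

```python
# A minimal sketch of drill-down analysis over historical defect records.
# The file, plant, machine and column names are illustrative assumptions.
import pandas as pd

defects = pd.read_csv("defect_history.csv")    # hypothetical extract

# General question: what is the defect rate at each plant?
print(defects.groupby("plant")["defect_flag"].mean())

# Drill down: within one plant, which machines are responsible?
plant2 = defects[defects["plant"] == "Plant 2"]
print(plant2.groupby("machine")["defect_flag"].mean())

# Drill down again: within one machine, which shifts?
m7 = plant2[plant2["machine"] == "M-07"]
print(m7.groupby("shift")["defect_flag"].mean())
```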

The second involves data mining tools, which have been called the next step beyond OLAP. Among the data mining techniques are neural networks, which mimic the brain's ability to learn from its mistakes; time-series analyses, which make year-to-year comparisons; and tree-based models, branching systems that show relationships in the form of a hierarchy, such as an organizational chart, according to Steve Alexander, writing in InfoWorld.
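
The neural network approach, for example, amounts to training a model on historical process data so it can flag conditions that are likely to produce a defect. The sketch below uses a small, generic network and synthetic data standing in for process variables; none of it comes from the tools named above.

```python
# A minimal sketch of a neural network learning to flag likely defects from
# historical process data. The features and data are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        # stand-ins for temperature, pressure, tool wear
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

# Hold out part of the history to check how well the network generalizes
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print("holdout accuracy:", net.score(X_test, y_test))
```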

Keep it separate
Separating data, also known as data stratification, groups data into categories of concern. This adds meaning to a seemingly random group of data and allows a trial-and-error approach in which categories are identified and totaled, according to Peter Mears, Ph.D., in his book Quality Improvement Tools.

"Stratification involves the separation of data into categories," says Mears. "It breaks down a whole category into smaller, related subgroups to identify possible causes of a problem. Stratification can be used to identify which categories contribute to the problem under analysis."

One of the strengths of data mining is its ability to come up with the right combination of factors that will lead to design of experiments (DOE). With the ability to find patterns and correlations, new experiments with processes can be undertaken.

The data that companies create and store in data warehouses, databases, spreadsheets, electronic text files, paper files, on file cards and in notebooks stored in file drawers represent valuable input that can be used during the "knowledge and discovery" phase of a designed experiment, according to Tom Pyzdek, a quality consultant whose book, The Complete Guide to Six Sigma (Quality Publishing, 1999), deals with DOE in the new manufacturing environment. According to Pyzdek, instead of laboriously slogging through all that data piece by piece, companies should harness software, frequently neural net software, to do the number crunching.

Using large amounts of data from a toy manufacturing client, for example, Pyzdek used a neural net program to sort through product and parts data by vendor, material and inspection result, and arrive at likely starting points for designed experiments. Using patterns gleaned from this historical product data, the quality professional can run experiments analyzing real-time data for the same process variables, ideally conducting a much faster, more targeted search for the source of the variation.

A neural net tree
Another type of neural net software program enabled Pyzdek to produce a "classification tree" model for his toy manufacturing data. This experiment used data on the failure strength of small parts which, if pulled off the toy, could present a choking hazard for small children. The data entered into the software included failure modes, types of parts, vendors, materials used, assembly methods and test methods. The classification tree that the program produces resembles a Pareto analysis and offers another way to present and analyze data patterns.
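
A classification tree of this kind can also be built with generic tools. The sketch below, which uses invented part data rather than Pyzdek's toy-manufacturer records, fits a small tree that predicts failures from vendor, material and assembly method, then prints the resulting hierarchy.

```python
# A minimal sketch of a classification tree over invented part data.
# The columns and values are illustrative, not the toy manufacturer's data.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

parts = pd.DataFrame({
    "vendor":   ["A", "A", "B", "B", "C", "C", "A", "B"],
    "material": ["ABS", "PVC", "ABS", "ABS", "PVC", "ABS", "PVC", "PVC"],
    "method":   ["snap", "glue", "snap", "glue", "snap", "glue", "snap", "glue"],
    "failed":   [1, 0, 1, 0, 0, 0, 1, 0],
})

# One-hot encode the categorical factors, then fit a shallow tree
X = pd.get_dummies(parts[["vendor", "material", "method"]])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, parts["failed"])

# The printed tree reads much like a Pareto-style breakdown of failure drivers
print(export_text(tree, feature_names=list(X.columns)))
```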

By finding these patterns, quality control engineers can dig for answers to their problems without having to spend countless hours referring back from one report to the next. What's more, they can feel more confident that the solutions they devise will lead to processes that are reliable and repeatable, while saving time and money through reduced defect and scrap rates.

WHAT IS OLAP?
The term Online Analytic Processing (OLAP) refers to technology that allows users of multidimensional databases to generate online descriptive or comparative summaries of data and other analytic queries.

Despite its name, analyses referred to as OLAP do not need to be performed online or in real-time. The term applies to analyses of multidimensional databases through efficient "multidimensional" queries that reference various types of data. OLAP facilities can be integrated into corporate database systems and allow analysts and managers to monitor the performance of business functions such as manufacturing processes.

The final result of OLAP techniques can be as simple as frequency tables, descriptive statistics and simple cross-tabulations or more complex, such as seasonal adjustments, removal of outliers and other forms of cleaning the data.
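
The simplest of these outputs take only a few lines to produce. The sketch below, with invented plant and result labels, builds a frequency table and a cross-tabulation of the kind an OLAP front end would return.

```python
# A minimal sketch of a frequency table and a cross-tabulation.
# Plant and result labels are invented for illustration.
import pandas as pd

results = pd.DataFrame({
    "plant":  ["P1", "P1", "P2", "P2", "P2", "P1"],
    "result": ["pass", "fail", "pass", "pass", "fail", "pass"],
})

print(results["result"].value_counts())                  # frequency table
print(pd.crosstab(results["plant"], results["result"]))  # cross-tabulation
```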

Although data mining techniques can operate on any kind of unprocessed or even unstructured information, they can also be applied to the data views and summaries generated by OLAP to provide more in-depth and often more multidimensional knowledge. In this sense, data mining techniques could be considered to represent either a different analytic approach that serves different purposes than OLAP, or as an analytic extension of OLAP.

Source: StatSoft Inc.