A few years ago, while we were visiting a successful United States manufacturer of networking and communications equipment, our host gave us a tour of the facility, including a walk through the programming center, which was filled with equipment from many vendors. A colleague asked, “Do you ever find circuit boards on the factory floor that contain devices mis-programmed with the wrong software?” Our host’s response was, “It happens all the time.” It is surprising that a gap in Quality methods of this apparent frequency was accepted so matter-of-factly. In an era when Six Sigma and statistical methods have been widely deployed to drive out process variation throughout the factory, here was a leading manufacturer that seemingly had no demonstrable control of the programming process for loading firmware into its products. The circuit boards containing the mis-programmed devices had to be identified during test, which reduced first-pass yield, and they required rework before delivery to final assembly.
While one might suspect that this firm’s experience is an anomaly, a deviation from good manufacturing practice isolated to a few firms, these gaps in Quality methods are far more common than one would expect. The sophisticated Quality methods that industry has deployed have focused primarily on hardware (things that are visible) built in the factory. The software, however, is generally built outside of the factory and then released to the factory electronically as a data file to be loaded into the device or product. In this way, it circumvents the normal Quality control processes used in hardware production, so mistakes in loading the data file are not found until circuit boards fail to operate correctly. In most cases, the result is scrap or rework in the factory, but in some cases the defect is not discovered until the product is in the field.
A leading wireless handset manufacturer recently delivered 200,000 phones to a well-known Fortune 50 service provider in the US. Half of the phones operated correctly; the other 100,000 did not and had to be returned to the manufacturer. The subsequent investigation found that the phones were built on two production lines. One line was using the correct programming algorithm; the other wasn’t. The expense to identify which phone was built on which production line and then to return the incorrectly programmed phones to the factory for rework ran into the millions of dollars. This error could have been identified earlier or eliminated with the use of a simple data collection plan and a statistical test such as Chi-Square to show the difference in production output from the different lines.
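As a rough illustration of the kind of check meant here, the following Python sketch runs a Pearson Chi-Square test on pass/fail counts from two production lines. The function name and the sample counts are hypothetical and are not taken from the handset case; the sketch assumes a simple two-by-two table with no continuity correction and no empty row or column.

```python
import math

def chi_square_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-square test of independence for a 2x2 pass/fail table.

    Returns (chi2, p_value). No continuity correction is applied, and
    every row and column total is assumed to be greater than zero.
    """
    observed = [[pass_a, fail_a], [pass_b, fail_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical sample: 500 phones tested from each line on the first day.
chi2, p = chi_square_2x2(pass_a=495, fail_a=5, pass_b=380, fail_b=120)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
if p < 0.05:
    print("The two lines differ; investigate before shipping.")
```

A very small p-value says the difference in failure rates between the lines is unlikely to be chance, which is the signal to stop and compare the setups of the two lines.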
A typical firmware supply chain is shown in the following flow chart.
Many firms assume that because their software, or source code, is under revision control during development and release, they have a Quality process. Unfortunately, their process may fail to account for the multitude of mistakes that are possible when loading the software into a semiconductor device, the step shown as Production Programming in the flow chart. Production Programming involves selecting the appropriate data file to be loaded into the semiconductor device, selecting the appropriate algorithm for the device, setting up the programming equipment, physically handling the devices during programming, and operating the programming equipment. Mistakes in any of these may lead to defective devices on the circuit board or to devices that contain the wrong firmware.
The following table identifies a number of the common mistakes that are possible at this step that lead to scrap and rework, and if not detected during test, may lead to field failures.
While these 15 are the most common mistakes, more than 40 potential mistakes have been identified. Although many of the common mistakes are easily identifiable and have minor consequences, some of them are not easily detected and have severe financial consequences in terms of scrap and rework. In cases where the mistakes are not detected until products are deployed at the customer’s location, the damage to a firm’s reputation may be more significant than the cost of the corrective action. This is an opportunity to use a Cause and Effect Matrix or a Failure Mode and Effects Analysis (FMEA) to plan for what could go wrong and to put prevention in place based on the severity, occurrence and ease of detection of each mistake.
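In an FMEA, the three factors are commonly scored from 1 to 10 and multiplied into a Risk Priority Number (RPN) that is used to rank failure modes. The Python sketch below shows that arithmetic. The failure modes are drawn from the Production Programming steps described earlier, but the scores are invented for illustration and are not the data behind the table or chart.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int    # 1 = negligible effect, 10 = severe
    occurrence: int  # 1 = rare, 10 = almost certain
    detection: int   # 1 = always caught, 10 = very hard to detect

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

# Hypothetical scores, for illustration only.
modes = [
    FailureMode("Programmer set up incorrectly", 6, 3, 4),
    FailureMode("Wrong data file selected", 9, 6, 7),
    FailureMode("Device damaged during handling", 5, 3, 3),
    FailureMode("Wrong programming algorithm selected", 8, 4, 6),
]

# Highest RPN first: these are the mistakes to prevent first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.rpn:4d}  {m.name}")
```

Ranking by RPN is only a starting point; a mistake with a very high severity score usually deserves attention even when its RPN is moderate.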
The following chart graphs the frequency and impact of the fifteen common mistakes from the table above.
While these data do not have statistical significance since they are drawn by one programming equipment manufacturer from the experience of its customers in a variety of industries and countries, we have no reason to believe that they are not representative of many other firms’ experiences as well. As you can see from this chart, some of the frequently recurring mistakes have substantial impact (1, 7 and 10). With respect to mistakes in handling data files (item 1 in the chart above), an electronics manufacturing firm in China recently told us that they used more than 100 types of data files and “sometimes the operator just programs the wrong data.”
These errors tend to occur most often when the firm is using a manual process to load the firmware into the semiconductor device or circuit board. Automated methods for loading firmware can eliminate many of these mistakes, particularly the ones that have the greatest financial impact. Automation also provides a means of closing the loop around the process: it generates log files that tie revision levels of the firmware to manufacturing dates or serial numbers, which makes effective process control possible. This provides a high-integrity means of ensuring configuration control and traceability of the finished product that extends beyond hardware to include the firmware. Yield and trend information can also be extracted from the log files. One automotive electronics manufacturer located in the southeastern United States uses automated firmware-loading equipment that also connects to bar-code scanning equipment. This Poka-Yoke (mistake proofing) enables the firm to ensure that the algorithm and data file are the correct ones, both for the semiconductor device and for the circuit board.
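The following Python sketch suggests how such a mistake-proofing check and its log record might fit together. It is not the method of any particular equipment vendor: the job table, part numbers, field names and the `check_and_log` function are all hypothetical, and a SHA-256 digest stands in for whatever data file identification a real system uses.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical job table: board part number (read from the bar code)
# mapped to the device, algorithm and firmware that must be used.
JOBS = {
    "PCB-1001": {
        "device": "MCU-A",
        "algorithm": "mcu_a_v2",
        "firmware_rev": "1.4.0",
        "sha256": hashlib.sha256(b"firmware image 1.4.0").hexdigest(),
    },
}

def check_and_log(board_pn, serial, device, algorithm, image, log):
    """Refuse to program unless device, algorithm and data file match the job.

    Appends one JSON record per attempt to `log` so that firmware revision
    can later be traced to a serial number and a timestamp.
    """
    job = JOBS.get(board_pn)
    if job is None:
        raise ValueError(f"No job defined for board {board_pn!r}")

    problems = []
    if device != job["device"]:
        problems.append("wrong device")
    if algorithm != job["algorithm"]:
        problems.append("wrong algorithm")
    if hashlib.sha256(image).hexdigest() != job["sha256"]:
        problems.append("wrong data file")

    log.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "board": board_pn,
        "serial": serial,
        "firmware_rev": job["firmware_rev"],
        "result": "fail" if problems else "pass",
        "problems": problems,
    }))
    return not problems

log = []
ok = check_and_log("PCB-1001", "SN0001", "MCU-A", "mcu_a_v2",
                   b"firmware image 1.4.0", log)
bad = check_and_log("PCB-1001", "SN0002", "MCU-A", "mcu_a_v2",
                    b"firmware image 1.3.9", log)
print(ok, bad)
```

The point of the design is that the operator never chooses the data file or algorithm by hand; the scanned bar code selects the job, and any mismatch blocks programming and leaves a record.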
Automated methods for loading firmware also provide a means of remotely monitoring the programming process, even when it is conducted in factories halfway around the world. Statistical information, such as the number of devices programmed each day, enables a firm to track production levels, yield, and other vital manufacturing data, including downtime, and so maintain effective control regardless of how fragmented the supply chain is.
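As a simple illustration, daily yield per site could be computed from programming log records along the following lines. This Python sketch assumes hypothetical records with `site`, `date` and `result` fields; real log formats vary by equipment.

```python
from collections import defaultdict

def daily_yield(records):
    """Return {(site, date): fraction of devices programmed successfully}."""
    counts = defaultdict(lambda: [0, 0])  # (site, date) -> [passed, attempted]
    for rec in records:
        key = (rec["site"], rec["date"])
        counts[key][1] += 1
        if rec["result"] == "pass":
            counts[key][0] += 1
    return {key: passed / attempted
            for key, (passed, attempted) in counts.items()}

# Hypothetical log records collected from two remote sites.
records = [
    {"site": "Site A", "date": "2009-06-01", "result": "pass"},
    {"site": "Site A", "date": "2009-06-01", "result": "pass"},
    {"site": "Site A", "date": "2009-06-01", "result": "fail"},
    {"site": "Site A", "date": "2009-06-01", "result": "pass"},
    {"site": "Site B", "date": "2009-06-01", "result": "pass"},
    {"site": "Site B", "date": "2009-06-01", "result": "fail"},
]

yields = daily_yield(records)
for (site, date), value in sorted(yields.items()):
    print(f"{site} {date}: {value:.0%}")
```

Tracked over time, the same figures show trends, and a sudden drop at one site is an early warning of a setup or data file problem.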
A recent IBM survey of 1,130 CEOs of electronics manufacturing companies found that 43% of them were worried about counterfeiting and piracy. There appears, however, to be a disconnect between Mahogany Row and the factory floor, because the programming process presents one of the greatest vulnerabilities with respect to the potential loss of a firm’s intellectual property. With manual programming, multiple copies of the firmware reside on multiple sets of equipment in an unprotected state, where they are subject to theft. With automated programming solutions, however, it is possible to encrypt data files and transmit them to remote manufacturing sites securely. Once there, the files remain stored in an encrypted state on a secure server. Access to the data files can be restricted to authorized personnel. Files remain encrypted until they are decrypted inside the automated programming equipment, where they are not easily accessed by operators.
System-level software incorporated in the automated equipment can also program encrypted serial numbers that contain vital information about the manufacturing process, such as when, where, and by whom a device was programmed. Even if a manufacturer is unable to prevent the theft of its intellectual property, it can later read back the encrypted serial number to ascertain when and where the loss occurred. This is particularly useful if a firm subcontracts production to facilities outside of its direct control.
The point of production programming (deployment) is also the easiest place in the software life cycle (requirements, design, coding, testing, deployment and operations) to add malware without the risk of detection. Malware has become so commonplace that a recent study found that each U.S. adult had a 66% chance of experiencing at least one data exposure in 2008. We are now beginning to see the risk increase for wireless handsets as they take over many of the functions normally performed by personal computers and store sensitive data. Rich Cannings, the security leader for the Android operating system (OS), recently said, “The smartphone OS will become a major security target. Attackers can already hit millions of victims with a smartphone attack, and soon that number will be even larger. I think this will become an epiphany to malware authors.” The point of production programming must be protected, since malware injected at this point circumvents the extensive filtering for spam and malicious content provided by the wireless carriers. Fortunately, the same methods that protect the software from theft at this point also provide protection from the injection of spurious content.
Automotive electronics systems share the same vulnerabilities to the theft of software IP and the injection of malware as wireless handsets. As the electronic control units (ECUs) within the vehicle proliferate and the data communication between the vehicle and the external world increases, the exposure is magnified. There is no reason to assume that automotive electronics systems will not become future targets for criminal activity.
The production programming process of loading firmware into the semiconductor device must be controlled if the Quality and integrity of the software content in the outgoing product is to be assured. Automation of this process, while not a panacea, can substantially reduce the potential for mistakes that result in scrap and/or rework. Automation also provides for the real-time monitoring of yield and job statistics to ensure effective process control. Security software can also be added to program encrypted serial numbers to further ensure traceability and configuration control over the firmware supply chain throughout the software life cycle.