Integration Corner: New Vision Apps
Most of us are familiar with the incredible continuing advances in computing power on the desktop. Faster and faster clock speeds give us more and more raw processing capability; in most cases, more raw processing capability than we know what to do with.
In 1998, I developed a new set of machine vision libraries that made use of the multimedia-processing extension (MMX) instruction sets on Intel central processing units (CPUs). These instructions allowed me to perform image processing operations in real time that I never would have even attempted before.
Now, 12 years down the road, clock speeds are 10 times faster, and most PC components have kept pace, allowing us to perform operations that used to take 10 milliseconds (ms) in a single millisecond.
Furthermore, the processors we are using have four or eight cores, theoretically allowing us to perform many operations 40 or 80 times faster.
Back in reality, there are two problems. First, using multiple cores is not always easy. Multiple cores in a CPU are almost like independent computers. You cannot go four times faster on a four-core processor unless you can break your application into four equally "hard" chunks that can keep all of those cores busy. Our applications have to be specifically designed to use, and keep busy, all of those cores.
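The chunking problem can be made concrete with a small sketch. The `split_rows` helper below is hypothetical (not from any particular vision library); it shows only the easy part, carving an image's rows into near-equal ranges, one per core. Making the per-chunk work truly independent is the hard part the application design must solve.

```python
def split_rows(n_rows, n_cores):
    """Divide n_rows of an image into n_cores near-equal row ranges.

    Hypothetical helper: each (start, stop) range would be handed to
    one core so no core sits idle while another finishes a bigger chunk.
    """
    base, extra = divmod(n_rows, n_cores)
    ranges, start = [], 0
    for i in range(n_cores):
        # The first `extra` chunks absorb the leftover rows.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

With a 10-row frame and four cores, this yields ranges of 3, 3, 2, and 2 rows, so the imbalance between the busiest and idlest core is at most one row.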
The second problem is the fact that most applications do not really need to go faster. The speeds we hit back in 1998 were adequate for 80% of the machine vision jobs in the commercial space. So from a business perspective, adding all of this processing capability might not really be worth even worrying about. For example, if your car could go 200 mph, would that really be much of a benefit driving across town?
But it turns out that there are common problems in machine vision that can be made to use all of that capability. Instead of the 200 mph car, we will use four 50 mph buses and move a lot more people around. Here are two of the most common problems that my company works on regularly and which benefit from this new processing world.
Difficult Pattern Recognition Jobs
Factory automation and automated commercial imaging jobs usually have tight processing time budgets. The gizmo goes by, you take one or more images of it, and within milliseconds, or a few seconds at most, all of the analysis must be complete. Results logged, decisions made, bad parts funneled off into the rework bucket.
One of the most cumbersome and computationally intensive operations is pattern matching, and regardless of which technique is used, processors work hard to perform these calculations.
Things get very expensive when the patterns may be rotated and scaled. The situation gets much worse when several different objects need to be searched for, or when the orientation of the target can vary; the appearance of those patterns from different angles basically turns them into different recognition problems. How do we deal with this?
One approach is to develop a 3-D geometrical model of the target and use advanced mathematical analysis to attempt to pull the patterns out of the images. That is interesting, but probably not how humans do it. And we can still put any computer-based pattern matcher to shame in these situations.
So how about a more bio-inspired approach? Since we have oodles of processing power and multiple cores, why not just pre-store the patterns in many different orientations and look for all of them?
We have processing power to burn, and this approach puts it to use. It also solves the problem of dividing our application among the available cores, since each searching operation is basically an independent task anyway. We simply search for the patterns in many different forms and choose the solution that reports the highest confidence in finding its match.
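The "pre-store many orientations and search them all" idea can be sketched as follows. The toy `match_score` function and the template data are illustrative assumptions, not a real matcher; a production system would run a vision library's pattern matcher on each core (in separate processes, or in native code that releases Python's GIL), but the structure of launching one independent search per stored orientation and keeping the most confident result is the same.

```python
from concurrent.futures import ThreadPoolExecutor

def match_score(image, template):
    """Toy stand-in for a real matcher: fraction of positions where
    the image and template pixels agree, reported as a confidence."""
    hits = sum(1 for a, b in zip(image, template) if a == b)
    return hits / len(template)

def best_match(image, rotated_templates):
    """Run one independent search per pre-rotated template and keep
    the (orientation_label, confidence) pair with the highest score."""
    labels = list(rotated_templates)
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda t: match_score(image, t),
                               rotated_templates.values()))
    return max(zip(labels, scores), key=lambda pair: pair[1])
```

Because the winner carries its orientation label, the search reports not just "found it" but which stored pose matched, for free.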
OCR
Optical character recognition (OCR) is another chore that can benefit from multiple cores and fast processors. Most industrial OCR applications make use of specially designed fonts that are carefully printed to keep OCR read rates high. But what about OCR in the real world?
For example, license plate recognition is a popular application now. The plates are seen in variable lighting, from varying angles, and in the United States come from 50 different states. Looking back over the past 20 years, there are at least 80 fonts in use. Furthermore, the number of fonts and symbols is increasing as more and more states allow odd vanity symbols and plate designs.
Even in industrial applications, there have traditionally been many situations where the font or illumination of the label was not well-controlled, and these applications often have been solved either poorly or not at all.
The solution? Again, by attempting to read in many orientations and with many fonts simultaneously on multiple cores, a high-accuracy recognition job can be done on a standard PC. Since each font and orientation reads as best it can and reports a confidence level, choosing the solution with the highest confidence not only reads the label, but also tells us a great deal about orientation and scale.
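The same confidence-voting structure works for the OCR case. In the sketch below, each "reader" is a hypothetical stand-in for a real OCR engine configured for one font and orientation; each returns a (decoded_text, confidence) pair, and the dispatcher simply keeps the most confident read.

```python
from concurrent.futures import ThreadPoolExecutor

def best_read(image, readers):
    """Try every font/orientation reader on the same image in parallel
    and return the (text, confidence) pair with the highest confidence.

    `readers` is a list of callables, each a stand-in for one OCR
    engine configuration; real engines would replace these."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda read: read(image), readers)
    return max(results, key=lambda result: result[1])
```

Each reader fails gracefully by reporting low confidence, so a wrong-font engine misreading "B" as "8" simply loses the vote to the engine that matched the plate's actual font.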
After you get the hang of the concept, you will see how many machine vision applications can be recast for a parallel processing, multicore architecture. Add some standard-interface GigE cameras to a traditional Windows or Linux server and your hardware image acquisition and processing setup is complete.