Intelligent automation systems for the plant floor increasingly rely on robotics and machine vision, generally known as Vision Guided Robotics (VGR), to provide the flexibility and reliability demanded by manufacturing environments—now and in the immediate future. This discussion provides some practical information on VGR concepts, as well as some basic implementation techniques that can help ensure a successful VGR application.

In short, VGR refers to a robot that “sees” using location and other information gained from machine vision technology. When VGR is applied in plant floor automation, the robot commonly is used to pick an object from a previously unknown position, and/or place the object at a location that might not be consistent, enabling a wide range of applications that a “blind” robot could not do.

VGR for flexible factory automation has evolved even within the past several years, and to some, the capabilities might even be “unrecognizable” compared to the options available as little as ten years ago. The value of VGR in flexible automation is significant, and while the following few implementation techniques don’t cover all of the issues involved in machine vision and robotic integration, they can provide a good starting place in the successful design and execution of your next VGR application.

General Architecture

One aspect of the general physical component architecture for VGR systems is camera placement. The system could feature an imaging device that is carried by the robot, or one that is fixed-mounted. It might be intuitive to visualize the robot carrying an imaging device, or a camera mounted somewhere on a robot arm, and those are common configurations. However, market statistics from various sources suggest that there are more VGR systems where the camera or sensor is fixed rather than robot-mounted.

For a given application, there sometimes is only one feasible choice for camera placement. When the design allows more than one option, though, here are a couple of things to consider.

An implementation reason for fixed imaging is efficiency: when the robot carries the imaging device, the arm must move over the field of view and take a picture before making a subsequent guided move.  If the camera can be fixed, the imaging and position analysis can be done while the robot is making other moves, essentially executing machine vision in the background.

Conversely, a robot-mounted camera might be more flexible, and could provide more imaging options, often over a smaller field of view that yields better image resolution. This implementation technique can benefit many applications. For example, if the robot can carry the camera to different areas on a larger object, multiple feature locations can be combined to gain greater overall location precision. Or, for smaller parts, it might be possible to use more than one field of view (within the lens depth of focus) to initially target a single object in a group of many, then move the camera over that object and get a more precise location using a smaller field of view.
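To illustrate why combining widely separated features can help, here is a minimal Python sketch. It assumes a simple worst-case model that is not from the original discussion: each of two features is located with the same position error, and the part angle is computed from the line between them. The function name and all numbers are hypothetical.

```python
import math

def worst_case_angle_error_deg(location_error_mm, baseline_mm):
    """Worst-case angle error of a line through two located features.

    Assumes each feature can be off by location_error_mm in opposite
    directions, perpendicular to the line joining the features.
    """
    return math.degrees(math.atan2(2 * location_error_mm, baseline_mm))

# Hypothetical case: 0.1 mm location error per feature.
for baseline in (50.0, 800.0):
    err = worst_case_angle_error_deg(0.1, baseline)
    print(f"baseline {baseline:.0f} mm -> angle error up to {err:.4f} deg")
```

Under these assumptions, the angle error shrinks roughly in proportion to the distance between the features, which is why imaging two distant areas of a large part can pay off.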

Considerations for Lighting and Imaging

When machine vision is used for inspection, the object or feature often is positioned with reasonable repeatability in front of the cameras, and the system will have very specific, dedicated illumination to ensure the best feature contrast for the details or defects being inspected. However, when machine vision is used for robotic guidance, the objects or features might be anywhere within a reasonably constrained but possibly large area. Or, the system must locate many objects within a single field of view—not common in general vision inspection. And, often the reason for using a robot at all is to handle larger parts that would be difficult or dangerous for humans to pick up and move. These common characteristics of a VGR application drive special implementation considerations for lighting design and camera optics and placement.

Lighting design for VGR can be a compromise between best practice techniques widely discussed for general purpose machine vision, and the realities of having a large, moving piece of automation directly in or around the part or field of view that has to be uniformly illuminated.

When a field of view is smaller, one might prefer to use illumination near the object to get the best feature contrast. But, leaving room for the robot and/or tooling can be a challenge. Unless a robot-mounted camera architecture, including the illumination source, is feasible, it might not be possible to use a small dedicated light near the object. The resulting implementation compromise usually involves having light sources farther away, sometimes in less desirable orientations.

When a field of view is very large due to part size or because many parts must be imaged (for example, a wide conveyor with multiple objects or a large tray that holds multiple nested parts), the challenge is in providing enough illumination to cover the area consistently. In many situations, lighting design for VGR requires significantly more powerful illumination because at larger light standoffs, not as much light reflected from the part is returned to the camera.

Also important is the area of lighting coverage. Consider this principle of lighting: when the light source is near the camera, it technically takes a light that is twice the length and width of the part to achieve uniform illumination over the full part area. Informed design and careful testing are needed to ensure proper and reliable lighting for VGR applications involving larger areas or parts.

Likewise, camera and lens selection are driven by the physical realities of the robotic application. Fields of view often need to be quite large in VGR applications, yet there can still be constraints on the camera distance from the object(s) to be imaged. As such, it’s common to see wider-angle lenses used in VGR applications. The result is the possibility of added error at the outer portion of the field of view due to perspective distortion, something that usually cannot be resolved purely by software. Wherever feasible, it’s best to use the longest practical focal length lens with a longer camera standoff.
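The relationship between field of view, standoff, and focal length can be estimated with the simple pinhole approximation (field of view ≈ sensor size × standoff ÷ focal length). The sketch below is only a rough planning aid under that approximation; the function names, the 7.2 mm sensor width, and the 600 mm field of view are hypothetical values chosen for illustration.

```python
import math

def focal_length_mm(sensor_width_mm, fov_width_mm, standoff_mm):
    """Pinhole estimate of the focal length needed to see fov_width_mm
    at the given standoff (ignores lens thickness and distortion)."""
    return sensor_width_mm * standoff_mm / fov_width_mm

def edge_ray_angle_deg(fov_width_mm, standoff_mm):
    """Angle between the optical axis and a ray to the edge of the
    field of view; larger angles mean stronger perspective effects."""
    return math.degrees(math.atan((fov_width_mm / 2) / standoff_mm))

# Hypothetical setup: 7.2 mm wide sensor, 600 mm wide field of view.
for standoff in (500.0, 1500.0):
    f = focal_length_mm(7.2, 600.0, standoff)
    angle = edge_ray_angle_deg(600.0, standoff)
    print(f"standoff {standoff:.0f} mm: focal length {f:.1f} mm, "
          f"edge ray angle {angle:.1f} deg")
```

In this example, tripling the standoff triples the required focal length and cuts the viewing angle at the edge of the field of view from about 31 degrees to about 11 degrees, which is the effect the longer-lens recommendation is aiming for.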

A related camera implementation concept, and one that causes a lot of discussion or perhaps argument, is that of camera resolution and accuracy for VGR. The key point to consider is that robots are very precise (repeatable), but not as accurate, and the difference is important in implementing VGR. A robot will move to a taught point along a taught path very repeatably. Moving to an arbitrary designated point along a varying path, though, requires accuracy. VGR is a technology that provides those arbitrary move points to a robot, but the resulting movement might not be as accurate as the points determined by the machine vision system.

In implementation, then, the location accuracy delivered by, say, a 20-megapixel camera might not result in better overall composite accuracy in guidance for many VGR applications. Furthermore, algorithmically, VGR often is a location task, not a measurement or fine feature differentiation task. The types of “overdetermined” algorithms that do location based upon many edges or points (e.g., pattern matching or blob centroid analysis) don’t provide that much more accuracy when one doubles or even triples the edge count. When doing location on larger objects and/or features (for example, a few hundred pixels or so in the camera field of view), usually a lower resolution camera, and therefore a cheaper and faster camera, could be sufficient.
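A quick back-of-the-envelope calculation can make the resolution argument concrete. The sketch below assumes, purely for illustration, that vision error and robot positioning error are independent and combine as a root sum of squares. The quarter-pixel location error, the 0.5 mm robot accuracy, and the sensor widths are hypothetical numbers, not figures from any specific product.

```python
import math

def mm_per_pixel(fov_width_mm, sensor_width_px):
    """Size of one pixel projected onto the part plane."""
    return fov_width_mm / sensor_width_px

def composite_error_mm(vision_error_mm, robot_error_mm):
    """Root-sum-square combination; assumes independent error sources."""
    return math.hypot(vision_error_mm, robot_error_mm)

FOV_MM = 500.0           # hypothetical field of view width
LOCATE_ERROR_PX = 0.25   # assumed location error of the vision tool
ROBOT_ERROR_MM = 0.5     # assumed robot accuracy (not repeatability)

for name, width_px in (("~2 MP camera", 1600), ("~20 MP camera", 5472)):
    vision_err = mm_per_pixel(FOV_MM, width_px) * LOCATE_ERROR_PX
    total = composite_error_mm(vision_err, ROBOT_ERROR_MM)
    print(f"{name}: vision {vision_err:.3f} mm, composite {total:.3f} mm")
```

With these assumed numbers the composite error is dominated by the robot in both cases (roughly 0.51 mm versus 0.50 mm), so the much higher resolution camera buys very little.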

2-D or 3-D Guidance

Although it’s difficult to find actual market numbers, anecdotally 2-D guidance makes up a strong majority of VGR tasks in industrial automation. Two-dimensional vision guidance technology is mature and well proven. Properly integrated, 2-D VGR is a highly reliable, robust, and accurate tool for robotic automation.

Machine vision for 2-D VGR provides the location of an object in planar coordinates; in a practical sense, a plane that is parallel to the camera sensor or field of view. Relative to the camera view, this produces a point with X, Y, and Rotational components only.

Two-dimensional guidance is suitable in a VGR application almost any time the object to be located can be presented without random variation in yaw and pitch (variations in the angles of the part surface relative to the surface it’s resting on), and when the distance from the object to the camera (more accurately, from the object to the application frame, discussed later) is consistent. In 2-D location, when these things vary from part to part, error can be introduced to the returned location for robot guidance. At that point, it might be an application for 3-D guidance.
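The size of the error caused by an inconsistent object distance can be estimated with similar triangles. The sketch below assumes an idealized pinhole camera looking straight down at the calibrated plane; the function name and the numbers are hypothetical and only meant to show the trend.

```python
def apparent_shift_mm(radial_offset_mm, height_change_mm, standoff_mm):
    """Apparent in-plane shift of a feature that sits height_change_mm
    closer to the camera than the calibrated plane.

    radial_offset_mm: true distance of the feature from the optical axis.
    standoff_mm: camera distance to the calibrated plane.
    Idealized pinhole model; real lenses add distortion on top of this.
    """
    return radial_offset_mm * height_change_mm / (standoff_mm - height_change_mm)

# Hypothetical case: part surface 5 mm higher than calibrated, 1000 mm standoff.
for r in (0.0, 100.0, 200.0):
    shift = apparent_shift_mm(r, 5.0, 1000.0)
    print(f"{r:.0f} mm from image center -> about {shift:.2f} mm location error")
```

The error is zero on the optical axis and grows toward the edge of the field of view, and it shrinks as the standoff increases, which ties this back to the lens and standoff considerations discussed earlier.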

Machine vision for 3-D VGR delivers the location and full “pose” of an object. By definition, a 3-D point is simply an X, Y, and Z coordinate in space. However, in most cases the location information provided for robotic guidance in three dimensions must cover all axes of movement of the robot: the X, Y, Z location as well as the angles around those axes (commonly known as yaw, pitch, and roll, or W, P, R, though the order of these could be different for your robot). It might be readily apparent, then, that while 3-D machine vision imaging produces an image of points with X, Y, Z locations (known as a “point cloud”), further image analysis and processing is needed to get the associated W, P, R angles for that point. That information must be built on knowledge of neighboring, related points in the point cloud. (Technically, a minimum of two other 3-D points must be used.)
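As a minimal sketch of how three 3-D points can yield a full pose, the Python below builds a right-handed frame from a pick point and two neighboring points, then extracts angles. It assumes one particular convention (W about X, P about Y, R about Z, composed as Rz·Ry·Rx); this is an assumption for illustration, so check the convention your robot actually uses. All function names are hypothetical.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(v):
    n = math.sqrt(sum(c * c for c in v))  # zero if points coincide or are collinear
    return [c / n for c in v]

def pose_from_three_points(p0, p1, p2):
    """Return (x, y, z, w, p, r) in degrees for a frame with origin p0,
    X axis toward p1, and Z axis normal to the plane through p0, p1, p2.
    Assumed angle convention: rotation = Rz(r) * Ry(p) * Rx(w)."""
    x_axis = _unit(_sub(p1, p0))
    z_axis = _unit(_cross(x_axis, _sub(p2, p0)))
    y_axis = _cross(z_axis, x_axis)
    w = math.degrees(math.atan2(y_axis[2], z_axis[2]))
    p = math.degrees(math.atan2(-x_axis[2], math.hypot(x_axis[0], x_axis[1])))
    r = math.degrees(math.atan2(x_axis[1], x_axis[0]))
    return (p0[0], p0[1], p0[2], w, p, r)

# Hypothetical flat part rotated 30 degrees about Z.
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
pose = pose_from_three_points((100, 50, 10),
                              (100 + c30, 50 + s30, 10),
                              (100 - s30, 50 + c30, 10))
print([round(v, 3) for v in pose])
```

In practice the neighboring points would come from features or a fitted surface in the point cloud, and more than three points would normally be used to average out noise.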

The techniques and components for creating a 3-D image vary widely. Practically, though, it ultimately is the capability of the image analysis and feature location tools that drives the success of the 3-D VGR application. Sometimes, too much design consideration is spent on the imaging component, and not enough on software algorithmic capability. The implementation key here is that one specific tool does not fit all 3-D application conditions. Some recent “buzzwords” in 3-D location of random parts are “deep learning” and 3-D model matching. These are excellent technologies in some cases. For some parts, though, all that is required is an oriented pick point with angles based simply on neighboring features. Evaluate the needs of the application relative to available guidance tools.

Three-dimensional guidance for VGR usually is suitable in situations where parts are presented to or grasped by a robot in random orientations, and an accurate position of the part in three dimensions needs to be known. An extreme example of this is stacked or binned parts. Applications in random bin picking are widespread, and 3-D machine vision is a key enabling technology for these tasks.


Gripping

Robot arms in industrial applications generally do one thing, and do that very well. They move while carrying something. They are able to go to a point and lift or place an object (for assembly, packaging, palletizing/de-palletizing, etc.), or otherwise carry and move an object or tool in a (usually) pre-trained path (for example, part machining or finishing, welding, applying materials such as paint or sealers, carrying and using tools, etc.). One of the biggest robotic challenges in a VGR application is gripping.

A discussion of all grip techniques is well beyond the scope of this article. But there are a few points that should be considered for successful implementation. First, gripping can and often does introduce error in the VGR process. For those applications where extreme accuracy is required, it is important to evaluate the way the part is grasped by the robot to make sure the action of the grip does not slightly change the otherwise accurate location point provided by the machine vision system. Even a vacuum cup can and often does end up holding a part at a point or orientation that slightly differs from the target coordinates. One very practical implementation solution is to roughly grasp a part, with or without guidance, then present the part to a secondary camera to “fine-tune” the location of the part once it is solidly held.

Calibration, Offsets, and Coordinate Systems

To conclude our discussion, let’s briefly look at some of the most commonly misunderstood and perhaps confusing issues in robot guidance: calibration, offsets and coordinate systems. We’ll start with the last of these first.

Robots must function within known coordinate systems (or “frames”) which define the relative points to which the robot must move. The part of the robot that moves to any point is a “tool center point” (or “tool frame”). While there always are defaults for both, it is possible and even likely that the robot will operate in a user-defined coordinate system, that is, an arbitrary subset of the default coordinate system that has its own origin and vector angles. The robot also might use multiple tool frames.

It would be very limiting for a robot programmer if a vision guided robot could only function within a specific frame. A flexible machine vision system for VGR should be able to manipulate and transform points relative to multiple user frames and/or switch between frames somewhat seamlessly.
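As a simplified illustration of moving a located point between frames, the sketch below works in 2-D only and describes each user frame by a hypothetical (x, y, rotation in degrees) tuple relative to a common world frame. The function names are illustrative and do not correspond to any particular robot or vision vendor’s API.

```python
import math

def frame_to_world(frame, point):
    """Convert a point expressed in a user frame to world coordinates.
    frame = (origin_x, origin_y, rotation_deg) relative to the world frame."""
    fx, fy, theta = frame
    c, s = math.cos(math.radians(theta)), math.sin(math.radians(theta))
    px, py = point
    return (fx + c * px - s * py, fy + s * px + c * py)

def world_to_frame(frame, point):
    """Convert a world point into the coordinates of a user frame."""
    fx, fy, theta = frame
    c, s = math.cos(math.radians(theta)), math.sin(math.radians(theta))
    dx, dy = point[0] - fx, point[1] - fy
    return (c * dx + s * dy, -s * dx + c * dy)

def change_frame(src_frame, dst_frame, point):
    """Re-express a point from one user frame in another user frame."""
    return world_to_frame(dst_frame, frame_to_world(src_frame, point))

# Hypothetical frames: a conveyor frame and a tray frame.
conveyor = (100.0, 200.0, 90.0)
tray = (500.0, 0.0, 0.0)
print(change_frame(conveyor, tray, (10.0, 0.0)))
```

A full 3-D implementation would use the same idea with rotation matrices or homogeneous transforms in place of the single rotation angle.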

It would be similarly limiting if the robot were required to grip the part at exactly the point located by the machine vision system. This is where the concept of offsets and reference positions comes into play. When a location point is configured for a specific object or feature in the machine vision system, for best flexibility, that point can be just a “target” or “reference” point. The robot is taught to go to a more convenient real-life point on the feature or object, and the difference between the reference and the real point is the “offset.” In actual implementation, working with offsets is more convenient because the real-life point can be re-taught at any time, and the applied offset will still ensure an accurate location relative to the machine vision reference point.
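The offset idea can be sketched with simple 2-D poses of the form (x, y, rotation in degrees). This is one possible way to model it, not a description of any specific product: the offset is stored as the taught pick pose expressed relative to the vision reference pose, and at run time it is re-applied to wherever the reference is found. All names and numbers are hypothetical.

```python
import math

def compose(a, b):
    """Pose b, given relative to pose a, expressed in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(math.radians(at)), math.sin(math.radians(at))
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def inverse(a):
    """Inverse of a 2-D pose, so that compose(inverse(a), a) is the identity."""
    ax, ay, at = a
    c, s = math.cos(math.radians(at)), math.sin(math.radians(at))
    return (-(c * ax + s * ay), s * ax - c * ay, -at)

def teach_offset(reference_pose, taught_pick_pose):
    """Offset = taught pick pose expressed relative to the vision reference."""
    return compose(inverse(reference_pose), taught_pick_pose)

def guided_pick(found_pose, offset):
    """Apply the stored offset to the pose found by vision at run time."""
    return compose(found_pose, offset)

# Teach time: vision reports the reference; the robot is jogged to the pick point.
offset = teach_offset((100.0, 100.0, 0.0), (120.0, 110.0, 0.0))
# Run time: the part is found shifted and rotated 90 degrees.
print(guided_pick((200.0, 50.0, 90.0), offset))
```

Because the offset is stored relative to the reference, re-teaching the pick point only changes the stored offset; the vision setup itself does not need to be touched.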

Finally, calibration is the most fundamental step in implementing VGR. A machine vision system delivers locations in the pixel space of the camera. Therefore, a suitable calibration has to be applied that maps the camera pixels to the desired robot coordinate system. The technique for doing this varies widely, and ease of use in guidance applications depends in part on the level of complexity involved in arriving at a precise mapping. Some machine vision solutions require a manual operation, some offer templates to guide the process, and others have a seamless integration between robot and camera that provides full automation of the calibration. Consider what will be best for your implementation and compare calibration techniques of target VGR components.
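To make the pixel-to-robot mapping concrete, here is a minimal sketch of one simple approach: a least-squares affine fit from a few pixel locations to the matching robot coordinates. This is an assumed, simplified model for a 2-D setup; it does not correct lens distortion or perspective, and commercial systems may use quite different methods. The function names and calibration points are hypothetical.

```python
def _det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def _solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(m)  # zero if the calibration points are collinear
    return [_det3([[rhs[i] if j == k else m[i][j] for j in range(3)]
                   for i in range(3)]) / d for k in range(3)]

def fit_affine(pixels, robot_points):
    """Least-squares fit of X = a*u + b*v + c and Y = d*u + e*v + f
    from three or more (u, v) pixel / (X, Y) robot point pairs."""
    rows = [(u, v, 1.0) for u, v in pixels]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (0, 1):
        atb = [sum(r[i] * p[axis] for r, p in zip(rows, robot_points))
               for i in range(3)]
        coeffs.append(_solve3(ata, atb))
    return coeffs

def pixel_to_robot(coeffs, pixel):
    """Map a pixel location to robot coordinates with the fitted model."""
    u, v = pixel
    return tuple(a * u + b * v + c for a, b, c in coeffs)

# Hypothetical calibration: robot touches four targets seen by the camera.
pixels = [(0, 0), (1000, 0), (0, 800), (1000, 800)]
robot = [(100.0, 400.0), (600.0, 400.0), (100.0, 0.0), (600.0, 0.0)]
cal = fit_affine(pixels, robot)
print(pixel_to_robot(cal, (500, 400)))
```

Using more than the minimum three points lets the fit average out small errors in the taught robot positions, and checking the residuals at the calibration points is a quick sanity test of the mapping.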

Here’s hoping that these and other implementation hints can make VGR work for you in your flexible automation applications. V&S