
Since the beginning of modern industrial robots in the early 1980s, robots have been guided by machine vision. Originally only a few robots used vision, but today the number is over 5,000 robots annually in the North American market and significantly more globally. Vision guided robotics (VGR) refers to robotic applications in which cameras locate features on a workpiece and provide the workpiece's location to the robot so that it can interact with it. Robots interact with the workpiece in many ways; a very common one is for a VGR robot to find and pick a part. Using vision to locate the workpiece allows flexibility in how parts are presented to the robot, often reducing the need for expensive fixtures that would otherwise hold the workpiece in a repeatable location. VGR is one of the fastest-growing sectors in both the robotics and machine vision markets.

To properly guide a robot based on an image from a camera, two things must happen. First, the desired target must be found in the image; this is called vision processing. Second, the location of the found target in the image must be provided to the robot in a form it can use; this is done through calibration. Camera calibration defines the location of the camera with respect to the robot as well as the millimeter-to-pixel scale of the image. The vision processing tools locate the part in pixel space, and the calibration converts that location into millimeters relative to a coordinate frame defined in the robot.
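
The conversion from pixel space to millimeter space can be illustrated with the simplest planar case. The sketch below assumes the camera looks straight down at a flat work plane, pixels are square, and lens distortion is negligible, so calibration reduces to a millimeter-per-pixel scale, a rotation between the image and robot axes, and the robot position of pixel (0, 0). The function name and its parameters are hypothetical; real calibration methods also model perspective, lens distortion, and possible axis flips.

```python
import math

def pixel_to_robot(u, v, mm_per_pixel, angle_deg, origin_x_mm, origin_y_mm):
    """Map an image point (u, v) in pixels to robot (X, Y) in millimeters.

    Hypothetical planar model: camera perpendicular to the work plane,
    square pixels, no lens distortion.
    """
    # 1. Scale pixels to millimeters along the image's own axes
    x_img = u * mm_per_pixel
    y_img = v * mm_per_pixel
    # 2. Rotate the image axes into the robot axes
    a = math.radians(angle_deg)
    x_rot = x_img * math.cos(a) - y_img * math.sin(a)
    y_rot = x_img * math.sin(a) + y_img * math.cos(a)
    # 3. Translate by where pixel (0, 0) sits in the robot frame
    return origin_x_mm + x_rot, origin_y_mm + y_rot

# Hypothetical calibration: 0.5 mm/pixel, image axes rotated 90 degrees,
# pixel (0, 0) located at robot (400, -100) mm
x, y = pixel_to_robot(100, 40, 0.5, 90.0, 400.0, -100.0)
print(f"part at X={x:.1f} mm, Y={y:.1f} mm")  # prints: part at X=380.0 mm, Y=-50.0 mm
```

The vision tool supplies (u, v); everything else comes from calibration, which is why a calibration error shifts every reported part location.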

The First Vision Systems

The first vision systems were completely custom: software engineers worked directly with the pixels in the image to locate the desired features. Likewise, in the early days of VGR, computer programmers wrote code in their language of choice to locate clusters (blobs) of pixels or predefined shapes like circles or slots. The first step after image acquisition was typically to binarize the image, converting every grayscale pixel to either black or white. Once the target was found, engineers wrote a program to provide its location to the robot. These tasks often required a computer programmer with a healthy knowledge of vision algorithms and robot frame math.
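
The binarize-then-find-blobs approach can be sketched in a few lines. This is a minimal Python illustration, not a reconstruction of any historical system: the threshold of 128, the use of 4-connectivity (only left, right, up, and down neighbors count), and the function names `binarize` and `find_blobs` are assumptions. Images are plain lists of rows of grayscale values from 0 to 255.

```python
from collections import deque

def binarize(image, threshold):
    """Convert a grayscale image (list of rows) to 1 (bright) or 0 (dark)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

def find_blobs(binary):
    """Return (area, centroid_x, centroid_y) for each 4-connected blob of 1s."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Breadth-first flood fill collects every pixel in this blob
                queue = deque([(x, y)])
                seen[y][x] = True
                pixels = []
                while queue:
                    cx, cy = queue.popleft()
                    pixels.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                area = len(pixels)
                blobs.append((area,
                              sum(p[0] for p in pixels) / area,
                              sum(p[1] for p in pixels) / area))
    return blobs

# A tiny hypothetical 6 x 4 grayscale image with two bright regions
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10, 10],
    [10, 205, 220, 10, 10, 10],
    [10, 10, 10, 10, 180, 10],
]
for area, cx, cy in find_blobs(binarize(image, 128)):
    print(f"blob of {area} pixels centered at ({cx:.1f}, {cy:.1f})")
```

In practice, blobs are usually filtered by area or shape before one is accepted as the target, so that specks of noise are not reported to the robot.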

The vision guided robotics industry started out with a few applications written to solve specific manufacturing problems as they came up. There wasn’t any general-purpose vision system to guide robots until the mid-1980s.

Originally, predefined shapes could only be found if they were not rotated. This limitation was not an issue for circular targets, but nonsymmetrical targets could not be located if they were rotated in the field of view. Because only the X,Y position of the workpiece was known, some fixturing was still needed to crowd the part into a consistent orientation.

Images were often grayscale, with very few pixels compared to modern images. Resolutions of 256 × 256 pixels, or 65,536 pixels per image, were common. Compared to modern machine vision images of many megapixels, the original number of pixels to work with was extremely limited. A megapixel image has about one million pixels. Some images used in machine vision for robot guidance can be up to 64 megapixels, almost one thousand times as many pixels as some of the original images of the early 1980s. Sixty-four megapixels is still large by today’s VGR standards, and many applications are very successful with a few megapixels. Higher resolution may provide only marginal improvements in location accuracy or finding reliability, but when looking for fine details within the image, increased resolution is recommended.
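
The pixel arithmetic above can be checked directly, and it also shows what resolution means on the part itself. For illustration only, the sketch assumes a square 8,000 × 8,000 sensor (64,000,000 pixels) and a 500 mm wide field of view; neither figure comes from a specific product, and `mm_per_pixel` is a hypothetical helper.

```python
def mm_per_pixel(field_of_view_mm, pixels_across):
    """Width of one pixel on the part, assuming the view spans the full sensor width."""
    return field_of_view_mm / pixels_across

early_pixels = 256 * 256        # 65,536 pixels
modern_pixels = 8000 * 8000     # 64,000,000 pixels (hypothetical square 64 MP sensor)
ratio = modern_pixels / early_pixels

fov_mm = 500.0                  # hypothetical 500 mm wide field of view
early_res = mm_per_pixel(fov_mm, 256)
modern_res = mm_per_pixel(fov_mm, 8000)
print(f"{ratio:.0f}x more pixels; {early_res:.2f} vs {modern_res:.4f} mm/pixel")
# prints: 977x more pixels; 1.95 vs 0.0625 mm/pixel
```

A smaller millimeter-per-pixel value does not translate one-for-one into better location accuracy, since lighting, optics, and calibration also limit the result, which fits the diminishing returns noted above.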

From the mid-1980s to the mid-1990s, advancements in VGR came mainly through the user interface. Creating a user interface that took the development of VGR applications out of the computer programmer’s hands and put it into the robot programmer’s hands was the path to expanding vision beyond a few robots per year. Packaged software algorithms were introduced and refined to automate the implementation of 2D and 3D vision, making it easier to set up and more accurate. Allowing robot programmers to set up an application without an in-depth understanding of robot math and frames, or of how to properly use the results from the vision system, led to an increased number of successful applications.

VS 0523 Vision M-10iD 12 Ultra-Thin Part Bin-Picking 01. Image Source: FANUC

Calibrating The Camera

There are many ways to calibrate the camera. The two most common techniques were developed by R. Tsai in 1987 and Z. Zhang in 1998. Creating custom calibration routines based on these or similar methods takes advanced skills. Vision and robot companies developed packaged software that used the Tsai method, the Zhang method, or a similar method to calibrate the cameras. Robot programmers could use the packaged software to calibrate the camera without having to understand the math or theory behind the calibration methods. The tools created to calibrate image space to the robot’s workspace using the published calibration methods were paired with robot motion to create automatic calibration routines. Automated routines that iteratively move a robot-mounted target under the camera, or a robot-mounted camera over a target, were developed and refined over the years. Through these automatic calibration routines, the robot can “learn” where the camera is.
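
Tsai's and Zhang's methods solve for a full camera model, including lens distortion, and are normally used through packaged software. As a much simpler stand-in that shows the idea behind an automatic routine, the sketch below assumes a robot-held target has been moved to several known robot positions, with its pixel location recorded at each one; a least-squares affine fit then maps pixels to robot millimeters. This is a planar affine model, not Tsai's or Zhang's method, and the function names and data are hypothetical.

```python
def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(m, r):
    """Solve the 3x3 system m @ p = r with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-9:
        raise ValueError("target positions are collinear; use more varied positions")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        sol.append(_det3(mc) / d)
    return sol

def fit_affine(pixel_pts, robot_pts):
    """Least-squares affine map from pixels (u, v) to robot (x, y) in mm.

    Returns (a, b, c, d, e, f) with x = a*u + b*v + c and y = d*u + e*v + f.
    Needs at least three non-collinear target positions.
    """
    # Accumulate the normal equations (A^T A) p = A^T t, where each row of A is [u, v, 1]
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (u, v), (x, y) in zip(pixel_pts, robot_pts):
        row = (u, v, 1.0)
        for i in range(3):
            atx[i] += row[i] * x
            aty[i] += row[i] * y
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return tuple(_solve3(ata, atx) + _solve3(ata, aty))

# Hypothetical data: pixel positions of a robot-held target at five robot positions (mm)
pixels = [(100, 100), (500, 100), (500, 400), (100, 400), (300, 250)]
robot = [(350, 50), (550, 50), (550, -100), (350, -100), (450, -25)]
a, b, c, d, e, f = fit_affine(pixels, robot)
```

Using more positions than the minimum of three lets the least-squares fit average out small detection errors at each position, which is one reason automatic routines move the robot to many points.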

The vision-guided robotics world changed with the introduction of geometric pattern-based object location technology for machine vision. This allowed a trained object to be found in grayscale images despite changes in angle, size, or shading. Pattern-matching technologies greatly improved vision-guided robotics: easily locating targets with complex shapes at varying angles and sizes allowed many more applications to be implemented with relatively little effort.

Lighting is, and always has been, an integral part of a successful vision application. The availability of lower-cost, high-power LEDs in the early 2010s made many more practical lighting solutions possible. Before high-power LEDs were mainstream, lighting for applications was often expensive, cumbersome, and fragile, and it could flicker and dim drastically over time. LEDs provided a cost-effective and robust way to get different wavelengths, or colors, of light into the camera’s field of view. The right lighting color depends on the application.

Object location for robot guidance was originally done only in 2D, providing X, Y, and roll. Even today, thanks to gravity, many vision-guided robotic applications are 2D, because in many applications the part sits flat on a conveyor, fixture, pallet, or in dunnage.
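
Geometric pattern matching is too involved for a short example, but a simpler classic technique shows how a 2D system can report X, Y, and roll for a binarized part: the centroid gives X and Y, and second-order image moments give the angle of the part's long axis. This is image-moment analysis, swapped in for illustration rather than the pattern-matching method described above, and the function name is hypothetical. Angles are measured in image axes and would still need calibration to be expressed in the robot frame.

```python
import math

def blob_pose(pixels):
    """Return (x, y, roll_deg) for a blob given its (x, y) pixel coordinates.

    x, y is the centroid; roll is the angle of the blob's major axis from
    second-order central moments. The angle is ambiguous by 180 degrees and
    undefined for rotationally symmetric shapes such as circles.
    """
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n
    cy = sum(p[1] for p in pixels) / n
    # Central moments describe how the pixels spread around the centroid
    mu20 = sum((p[0] - cx) ** 2 for p in pixels)
    mu02 = sum((p[1] - cy) ** 2 for p in pixels)
    mu11 = sum((p[0] - cx) * (p[1] - cy) for p in pixels)
    roll = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, math.degrees(roll)

# A hypothetical thin diagonal part, one pixel wide
print(blob_pose([(i, i) for i in range(10)]))
```

The caveat in the docstring mirrors the early limitation noted above: a circle has no measurable roll, while a nonsymmetrical part does.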

VS 0523 Vision M-10iD 12 Ultra-Thin Part Bin-Picking 02. Image Source: FANUC

Enter Successful 3D Processes

In the mid-1980s, a 3D process was developed to locate partially assembled automotive bodies on an assembly line for robotic dispensing. The 3D location was done with three or four 2D cameras pointing up at the underside of the auto body. The cameras’ optical axes roughly converged on a point above the vehicle, like the four edges of a pyramid pointing at the apex. Through triangulation, a single 3D solution could be determined from the results of the three or four 2D cameras. This 3D multiview application has remained unchanged for over 35 years. In this example, a solution was developed for a very specific application: while many things have changed about how robots dispense sealer on vehicles, the process of finding the vehicles in 3D and guiding the robots to them remains the same. This 3D multiview approach is not the only way to locate the vehicle in 3D, but after all these years it remains a viable solution. This approach was specific to one application; general 3D VGR did not become mainstream until many years later.
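
The article does not detail the math of that system. As a general illustration of triangulation, and not necessarily how that application computes its pose, the sketch below finds the 3D point that best agrees with several camera rays aimed at the same feature. It assumes calibration has already given each camera's optical center and line of sight in a common robot frame; the function names and the camera layout are hypothetical.

```python
import math

def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(m, r):
    """Solve the 3x3 system m @ p = r with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-9:
        raise ValueError("rays are parallel; the point is not determined")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        sol.append(_det3(mc) / d)
    return sol

def triangulate(origins, directions):
    """Least-squares 3D point closest to several camera rays.

    Minimizes the sum of squared perpendicular distances to all rays by
    solving sum(I - u u^T) p = sum(I - u u^T) o, with u the unit ray direction.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        n = math.sqrt(sum(c * c for c in d))
        u = [c / n for c in d]
        for i in range(3):
            for j in range(3):
                # Entry of I - u u^T, which projects onto the plane perpendicular to the ray
                p_ij = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += p_ij
                b[i] += p_ij * o[j]
    return tuple(_solve3(a, b))

# Hypothetical layout: four cameras below the body, all sighting one feature (mm)
feature = (100.0, 50.0, 800.0)
cams = [(500.0, 500.0, 0.0), (-500.0, 500.0, 0.0), (-500.0, -500.0, 0.0), (500.0, -500.0, 0.0)]
rays = [tuple(f - c for f, c in zip(feature, cam)) for cam in cams]
print(triangulate(cams, rays))
```

With real, slightly noisy rays, the lines no longer meet at one point, and the least-squares solution picks the point closest to all of them.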

Except for specific applications like the 3D multiview dispensing application, using 3D vision to guide robots in all six degrees of freedom was not common until the early 2000s. 3D sensor technology continues to evolve, and there are many different 3D camera and sensor technologies. The most common 3D VGR technologies rely on a 2D camera combined with structured light, which can be laser stripes or patterns projected with a projector. Time-of-flight (ToF) and LiDAR cameras are another way to get 3D data. These cameras build 3D image data by measuring the round-trip time of infrared light reflected off the object.
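
The round-trip relationship behind time-of-flight sensing is simple: light travels to the object and back, so the distance is half the speed of light multiplied by the measured time. The sketch only illustrates that arithmetic; many commercial ToF cameras infer the delay indirectly, for example from the phase shift of modulated light, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_distance_m(round_trip_s):
    """Distance to a surface from the round-trip time of a light signal.

    The light covers the distance twice (out and back), so halve the path.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0

# A surface 1.5 m away returns the light after about 10 nanoseconds
t = 2 * 1.5 / SPEED_OF_LIGHT_M_PER_S
print(f"{t * 1e9:.2f} ns -> {tof_distance_m(t):.3f} m")  # prints: 10.01 ns -> 1.500 m
```

The numbers show why ToF electronics must be so precise: a 1 mm change in distance changes the round trip by only about 6.7 picoseconds.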

There are many applications that can benefit from 3D VGR, including bin picking. Picking parts randomly placed in a bin or tote is a common need in manufacturing, and since the early 2000s, as 3D vision has matured, implementing 3D bin picking has been the goal of many parties. Locating complex parts randomly piled in a bin or tote can be a challenge. Once a part is located, grasping it and getting it out of the bin without the robot arm or gripper contacting the bin walls or other parts is a challenge that many parties continue to pursue and perfect. Using the part’s 3D CAD model, vision systems can locate any surface of the part and then guide the robot to acceptable grasp locations based on both the part and gripper CAD models. With modern 3D vision systems for bin picking, locating the part is often not the limiting factor; the biggest challenge is designing a robot gripper that can grasp the part and remove it from the bin. Properly implemented vision-guided robots for bin picking can be very successful, depending on the constraints. Understanding the constraints of a specific project and engineering a solution to mitigate the issues is the key to success. Simple things, like how the parts nest together, are key considerations that add to the potential risk of the application. Some applications, like picking a coat hanger out of a jumbled mess of coat hangers, may prove to be risky endeavors.
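
Choosing among candidate grasps is where the gripper and bin-wall constraints come in. The following deliberately simplified sketch assumes the vision system has already returned candidate grasp points in a bin coordinate frame, models the gripper as a vertical cylinder, and checks only clearance from the bin walls. Real bin-picking software checks the full gripper and arm geometry against the bin and neighboring parts in 3D. The `Grasp` class, the 5 mm margin, and the highest-part-first rule are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Grasp:
    x: float      # mm, hypothetical bin frame with its origin at an inside bin corner
    y: float      # mm
    z: float      # mm above the bin floor
    score: float  # match quality reported by the vision system (hypothetical 0..1 scale)

def clears_walls(g, bin_length, bin_width, gripper_radius, margin):
    """True if a cylindrical gripper centered on the grasp stays clear of the bin walls."""
    r = gripper_radius + margin
    return r <= g.x <= bin_length - r and r <= g.y <= bin_width - r

def choose_grasp(candidates, bin_length, bin_width, gripper_radius, margin=5.0):
    """Pick the highest reachable part first, breaking ties by match score."""
    ok = [g for g in candidates if clears_walls(g, bin_length, bin_width, gripper_radius, margin)]
    if not ok:
        return None  # nothing reachable: e.g. re-image, shake the bin, or alert an operator
    return max(ok, key=lambda g: (g.z, g.score))

candidates = [
    Grasp(x=20.0, y=150.0, z=120.0, score=0.95),   # highest, but too close to a wall
    Grasp(x=200.0, y=150.0, z=90.0, score=0.80),
    Grasp(x=300.0, y=100.0, z=90.0, score=0.88),
]
best = choose_grasp(candidates, bin_length=600.0, bin_width=400.0, gripper_radius=30.0)
print(best)
```

Note that the best-scoring match is rejected here: a part the vision system finds with confidence is still useless if the gripper cannot reach it, which is the point made above about the gripper being the harder problem.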

The reputation of using vision to guide robots was less than stellar in the early years. Improper application of the technology and poorly trained users were big issues. The expectation that simply adding a camera would let vision magically solve expensive fixturing issues led many vision-guided robotic applications to fail. As vision tools became more robust, lighting improved, and training and knowledge grew, the reputation of vision-guided robots improved. The pool of robot programmers with VGR experience continues to grow, which contributes directly to the number of successful VGR applications.

VS 0523 Vision M-10iD 12 Ultra-Thin Part Bin-Picking 03. Image Source: FANUC


Developing tools that use AI for VGR is still in its infancy but growing quickly. One goal is for the user to put anything in front of the camera and have the robot properly interact with it. For example, the robot could pick anything that it “sees,” even an item it has never seen before, allowing new part styles to be added to a system seamlessly. Without AI, when the user adds a new part style to a robotic application, they may have to train the traditional vision system to find it. The hope is that in the future the vision system can simply figure it out without requiring a human to train it.


Vision guided robotics is one of the fastest-growing sectors in the robotics and machine vision market. It allows robots to perform tasks that would otherwise be impossible or cost prohibitive. Advances in the ease of use of both 2D and 3D user interfaces continue to make VGR applications more efficient to develop and deploy. New technologies like AI also play an important role in developing new applications and help drive the growth of VGR.