Three-dimensional imaging is a rapidly expanding segment of the machine vision market. New methods, and improvements to old ones, are quickly entering the market to meet this need. Proven methods such as structured light, stereo vision, time of flight, and laser profiling are all effective in their own right. Another, more recent approach adds a random pattern, projected onto the surface of the object, to traditional stereo vision.

As many companies expand into the 3D imaging space for the first time, we must help define and refine important questions when assisting a client in setting up a 3D imaging system.


The first factor to define is: What area is the client looking at? Consider the size of the object, and how large an area you want to capture in 3D space (field of view). The field of view of a stereo vision system is determined by the distance between the two cameras (baseline), the angle at which the cameras are turned towards each other (vergence angle), how far the cameras are from the subject (working distance), and an optical property of the camera lenses called focal length. Many pre-packaged stereo vision cameras share the same baseline and are offered in different models depending on the vergence angle and the focal length.
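As a rough illustration of how these parameters interact, the sketch below estimates the field of view with a simple pinhole-camera approximation. It assumes parallel cameras (zero vergence angle), and the sensor width and baseline used in the example are hypothetical values, not figures from any particular product. Function names are illustrative only.

```python
def single_camera_fov_mm(sensor_width_mm, focal_length_mm, working_distance_mm):
    """Scene width seen by one camera (pinhole approximation)."""
    return sensor_width_mm * working_distance_mm / focal_length_mm


def stereo_overlap_mm(sensor_width_mm, focal_length_mm, working_distance_mm, baseline_mm):
    """Scene width seen by BOTH cameras, assuming parallel optical axes.

    With zero vergence, each camera's view is offset by the baseline,
    so the shared region is one field of view minus the baseline.
    """
    fov = single_camera_fov_mm(sensor_width_mm, focal_length_mm, working_distance_mm)
    return max(0.0, fov - baseline_mm)


# Hypothetical values: 6.4 mm wide sensor, 8 mm lens, 800 mm working distance,
# 100 mm baseline.
fov = single_camera_fov_mm(6.4, 8.0, 800.0)
overlap = stereo_overlap_mm(6.4, 8.0, 800.0, 100.0)
print(f"single camera: {fov:.0f} mm, shared by both cameras: {overlap:.0f} mm")
```

A non-zero vergence angle increases the overlap at the working distance, which is one reason manufacturers offer several vergence options on the same baseline.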

These factors also define what level of Z-accuracy the system will achieve. Is this enough for the client? Three-dimensional camera manufacturers will provide these numbers for the parameters given above.
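When manufacturer figures are not at hand, a first-order estimate of Z-resolution can be sketched from the standard rectified-stereo relation, in which depth error grows with the square of the working distance. The focal length in pixels, the baseline, and the sub-pixel matching step below are assumed example values; a real system's datasheet should take precedence.

```python
def depth_resolution_mm(working_distance_mm, focal_length_px, baseline_mm,
                        disparity_step_px=1.0):
    """Approximate depth change caused by one disparity step.

    Derived from Z = f * B / d for a rectified stereo pair:
    dZ ~= Z**2 * dd / (f * B)
    """
    return (working_distance_mm ** 2) * disparity_step_px / (focal_length_px * baseline_mm)


# Assumed example: 1600 px focal length, 100 mm baseline, quarter-pixel matching.
for z in (600.0, 800.0, 1000.0):
    dz = depth_resolution_mm(z, 1600.0, 100.0, disparity_step_px=0.25)
    print(f"working distance {z:.0f} mm -> about {dz:.2f} mm per disparity step")
```

The quadratic dependence on distance is why decreasing the working distance is often the cheapest way to gain Z-accuracy.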

Consider also the XY resolution of the cameras in the system. A system will need to have multiple pixels on a surface to accurately measure its distance from the camera. A safe number to use is six to nine pixels on a surface of the object. If the user is looking at very small parts and needs high Z-accuracy, consider higher XY resolution cameras or decreasing the working distance.
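The pixels-on-surface rule can be checked with a few lines of arithmetic. The sketch below assumes square pixels and interprets the rule as a count of pixels covering the surface area; the field of view, image width, and feature sizes are hypothetical.

```python
def pixels_on_feature(feature_w_mm, feature_h_mm, fov_width_mm, image_width_px):
    """Approximate number of pixels covering a flat feature facing the camera."""
    mm_per_px = fov_width_mm / image_width_px
    return (feature_w_mm / mm_per_px) * (feature_h_mm / mm_per_px)


def meets_pixel_rule(feature_w_mm, feature_h_mm, fov_width_mm, image_width_px,
                     min_pixels=6):
    """True if the feature is covered by at least `min_pixels` pixels."""
    return pixels_on_feature(feature_w_mm, feature_h_mm,
                             fov_width_mm, image_width_px) >= min_pixels


# Hypothetical setup: 640 mm field of view imaged onto 640 or 1280 pixels.
print(meets_pixel_rule(2.0, 2.0, 640.0, 640))   # 1 mm per pixel
print(meets_pixel_rule(2.0, 2.0, 640.0, 1280))  # 0.5 mm per pixel
```

In this example, doubling the XY resolution moves a 2mm x 2mm feature from four pixels to sixteen, which is the kind of margin the rule is meant to provide.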


Understanding the nature of the material that the client is looking at is key to determining the feasibility of the application.

Specular reflection provides a challenge when using stereo vision. As any machine vision professional could tell you, reflective surfaces are difficult to image if you have little control over the environment. 

In stereo vision with projected light, this can be an issue since the primary light source is at the same angle as the two cameras. A simple solution is to place the camera at an angle that is not parallel to the target surface. A second solution is to use multiple stereo vision cameras. Where one camera will catch the glare of the light source, another at a different angle will not. This second camera will also provide more complete 3D data than a single camera. 

If the projector or primary light source is separate, set up a dark field light source, which has an angle of incidence less than 45 degrees. Adding a polarizing filter to the cameras will help with reflective surfaces, though it also decreases the amount of light reaching the sensors.

Some plastics or other materials absorb and reflect different wavelengths of light. For example, near-infrared light passes through cellophane or clear wrapping plastics. If the surface of cellophane is the target of the measurement, consider changing the wavelength of the light source to blue visible light or another wavelength that reflects off of the subject material.

Stereo vision systems look for similar features between two camera images. If the subject surface lacks distinctive features (e.g., a smooth, single-color wall), no unique features will be found, and there will be no reliable matches. Projecting a random pattern, which the system does not need to know in advance, gives these types of surfaces artificial differences in pixel values for the algorithm to match.
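The effect of texture on matching can be illustrated with a toy one-dimensional example. The sketch below uses a plain sum-of-absolute-differences (SAD) window match, which is a simplification and not the matching cost of any specific camera. A uniform row produces many equally good candidates, while a row carrying a random pattern produces a single best match. All names and values are illustrative.

```python
import random


def shift_row(row, disparity):
    """Simulate the right image: each left pixel x appears at x - disparity."""
    return row[disparity:] + [0] * disparity


def sad_costs(left, right, x, half_window, max_disparity):
    """SAD cost of the window centred at left[x] for each candidate disparity."""
    offsets = range(-half_window, half_window + 1)
    return [sum(abs(left[x + k] - right[x - d + k]) for k in offsets)
            for d in range(max_disparity)]


def count_best_matches(costs):
    """How many candidate disparities tie for the lowest cost."""
    best = min(costs)
    return sum(1 for c in costs if c == best)


TRUE_DISPARITY = 5

# Featureless surface: every pixel has the same grey value.
flat_left = [128] * 64
flat_costs = sad_costs(flat_left, shift_row(flat_left, TRUE_DISPARITY), 40, 3, 16)

# Same surface with a projected random pattern (fixed seed for repeatability).
rng = random.Random(0)
textured_left = [rng.randrange(256) for _ in range(64)]
textured_costs = sad_costs(textured_left, shift_row(textured_left, TRUE_DISPARITY), 40, 3, 16)

print("flat surface, tied best matches:", count_best_matches(flat_costs))
print("textured surface, tied best matches:", count_best_matches(textured_costs))
print("textured surface, best disparity:", textured_costs.index(min(textured_costs)))
```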

Scenes that contain both very dark and very light objects (high dynamic range) tend to be challenging for any vision system. If the subject parts are dark on a white background, consider darkening the background if you need it as a height reference. Removing this bright area will help the camera's automatic exposure adjust to the subject of the image.


Stereo vision systems are well suited for capturing moving objects, as the technology does not rely on known movement of an object. If the object is moving quickly through the system’s field of view, stick with a single image capture and shorten the exposure time to limit motion blur.
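A quick way to choose a starting exposure time is to limit how far the object travels, in pixels, while the shutter is open. The sketch below assumes motion parallel to the image plane; the pixel footprint, object speed, and one-pixel blur budget are example assumptions rather than recommendations from a camera vendor.

```python
def max_exposure_s(mm_per_px, speed_mm_per_s, allowed_blur_px=1.0):
    """Longest exposure that keeps motion blur within `allowed_blur_px` pixels."""
    return allowed_blur_px * mm_per_px / speed_mm_per_s


# Assumed example: 0.5 mm per pixel, part moving at 500 mm/s, one pixel of blur.
exposure = max_exposure_s(0.5, 500.0)
print(f"keep exposure at or below {exposure * 1000:.1f} ms")
```

A shorter exposure usually has to be balanced with more light or higher gain, so this figure is a starting point for experimentation.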

If your object and camera can be stationary, consider a system that can shift its projected pattern and aggregate multiple images of the same object. This will result in better depth accuracy, at the cost of increased computation time.


Once you’ve considered the subject of the 3D vision system, consider how the camera system can be adjusted in software, and the hardware that will perform the 3D calculations. Most pre-packaged stereo vision systems will come with an Application Programming Interface (API). The API can be used to change parameters that relate to how the two cameras perceive the 3D world.

In a stereo vision system using semi-global matching, the two cameras are mounted a known baseline distance apart and angled towards each other (the vergence angle) so that their fields of view overlap. Lens distortion is removed from the resulting images, and the images are rectified so that both lie in the same coordinate system.
The algorithm then searches for matching features between these images and records the offset (in pixels) between the positions at which a matching feature appears in each image. This offset is referred to as a disparity. The system uses its calibration data to convert each disparity into the distance from the camera to that feature.
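For a rectified pair, the conversion from disparity to distance follows the standard relation Z = f * B / d. The sketch below shows that relation in isolation; the focal length in pixels and the baseline are assumed example values that a real system would take from its calibration data.

```python
def disparity_to_depth_mm(disparity_px, focal_length_px, baseline_mm):
    """Convert a disparity in pixels to a distance from the camera.

    Returns None for zero or negative disparities, which carry no
    valid depth in this simple model.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_mm / disparity_px


# Assumed calibration: 1600 px focal length, 100 mm baseline.
for d in (160, 200, 250):
    print(f"disparity {d} px -> {disparity_to_depth_mm(d, 1600.0, 100.0):.0f} mm")
```

Larger disparities correspond to closer objects, which is why the disparity search range maps directly onto a depth range.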

Adjusting where the algorithm searches will greatly affect the throughput of the system. To do this, change the number of disparities and the minimum disparity, which together fix the maximum disparity. These settings translate into the depth range in which the algorithm searches.
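One way to pick these settings is to work backwards from the nearest and farthest distances of interest. The sketch below is a minimal example under assumed calibration values (focal length in pixels and baseline). The rounding to a multiple of 16 reflects a constraint found in some implementations, such as OpenCV's semi-global block matcher; check the API of the camera in use, since parameter names and constraints vary.

```python
import math


def disparity_search_range(near_mm, far_mm, focal_length_px, baseline_mm, multiple=16):
    """Return (min_disparity, num_disparities) covering the depth range.

    The farthest distance gives the smallest disparity and the nearest
    distance gives the largest. num_disparities is rounded up to a
    multiple of `multiple`.
    """
    d_small = focal_length_px * baseline_mm / far_mm
    d_large = focal_length_px * baseline_mm / near_mm
    min_disparity = math.floor(d_small)
    needed = math.ceil(d_large) - min_disparity + 1
    num_disparities = ((needed + multiple - 1) // multiple) * multiple
    return min_disparity, num_disparities


# Assumed example: scene between 600 mm and 1000 mm, 1600 px focal length,
# 100 mm baseline.
print(disparity_search_range(600.0, 1000.0, 1600.0, 100.0))  # (160, 112)
```

Searching 112 disparities instead of, say, 512 reduces the matching work roughly in proportion, which is where the throughput gain comes from.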

Most APIs also give you control over coarse and fine disparity change penalties, and the Left-Right Consistency Check, which help improve the quality of data output from the system. Consider changing these values when surfaces are not accurately portrayed in the disparity map. 
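To show what a left-right consistency check does, the sketch below applies it to a single row of integer disparities. It assumes two disparity rows are available, one computed from each camera's point of view, and marks a pixel invalid (None) when the two disagree by more than a tolerance. This is a simplified illustration, not the implementation used by any particular vendor.

```python
def left_right_check(disp_left, disp_right, max_diff=1):
    """Keep a left disparity only if the right image agrees with it.

    A left pixel x with disparity d corresponds to right pixel x - d.
    The disparity stored there should be close to d.
    """
    checked = []
    for x, d in enumerate(disp_left):
        if d is None:
            checked.append(None)
            continue
        xr = x - d
        if xr < 0 or xr >= len(disp_right) or disp_right[xr] is None:
            checked.append(None)
        elif abs(disp_right[xr] - d) > max_diff:
            checked.append(None)
        else:
            checked.append(d)
    return checked


# The value 3 in the left row is an outlier that the right row does not confirm.
left_row = [1, 1, 1, 1, 3, 1]
right_row = [1, 1, 1, 1, 1, 1]
print(left_right_check(left_row, right_row))
# [None, 1, 1, 1, None, 1]
```

The first pixel is rejected because its match would fall outside the right image, which is also why disparity maps typically have an invalid band along one edge.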

Post-processing options may include uniqueness adjustments to remove unreliable data, smoothing filters, speckle filtering to remove invalid small surface regions, or filling to correct for missing data in an otherwise planar surface.
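Speckle filtering can be sketched as a connected-region search: neighbouring pixels with similar disparities are grouped, and groups below a minimum size are marked invalid. The version below is a plain-Python illustration on a small grid, using None for invalid pixels and 4-connectivity. The threshold values are arbitrary examples, and production libraries use faster implementations.

```python
from collections import deque


def remove_speckles(disparity, min_region_size, max_diff=1):
    """Invalidate connected regions smaller than `min_region_size` pixels."""
    rows, cols = len(disparity), len(disparity[0])
    result = [row[:] for row in disparity]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or disparity[r][c] is None:
                continue
            # Grow a region of similar-disparity neighbours from this seed.
            seen[r][c] = True
            region = [(r, c)]
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if seen[nr][nc] or disparity[nr][nc] is None:
                        continue
                    if abs(disparity[nr][nc] - disparity[cr][cc]) <= max_diff:
                        seen[nr][nc] = True
                        region.append((nr, nc))
                        queue.append((nr, nc))
            if len(region) < min_region_size:
                for pr, pc in region:
                    result[pr][pc] = None
    return result


disparity_map = [
    [10, 10, 10, 10],
    [10, 10, 40, 10],
    [10, 10, 10, 10],
]
for row in remove_speckles(disparity_map, min_region_size=3):
    print(row)
```

Here the isolated value 40 forms a one-pixel region and is removed, while the surrounding surface is left untouched.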

The disparity map can be computed on the camera itself, or on a PC. Experimentation is key in determining the power of the PC that the application needs, but start by following the recommendations from the camera’s designers. Tend towards choosing the latest generation processors, as many broaden their instruction sets to include better optimization for the 3D calculations.

A large factor in saving computation time is limiting the disparity range that the algorithm searches. As noted above, changing the number of disparities and the minimum disparity will greatly affect the throughput of the system.

Once the data has been collected and output as an STL or PLY file or a simple depth map, it’s time to put it to work. Many machine vision imaging libraries have some 3D algorithms to get real-world results from this data. 


Putting these steps into a practical example, a client is looking to set up a stereo vision system for a bin picking application. They are looking at semi-reflective metal parts averaging around 30mm x 30mm x 10mm placed in a bin measuring 550mm x 550mm x 250mm. The system has about five seconds to pick a part and place it on the belt. The client can place the system from 600mm to 1000mm away from the bottom of the bin. These parts have unique shapes at sizes as small as 4mm x 4mm x 4mm.

To find the X, Y, and Z-accuracy, use the six to nine pixels per feature rule. This yields a resolution of 1.78mm in the XY plane with the same 1.78mm target Z-accuracy.

Working backwards from the field of view and working distances given, a camera with 8mm focal length lenses and a 4° vergence angle, placed at a working distance of 800mm, will just provide the field of view needed.

Mounting the stereo vision system on the robotic gripper arm will provide you with multiple angles, and reduce the potential for specular reflection. Since the parts are not moving, the system can grab multiple images with a shifted pattern in each position that the robotic arm is in as it surveys the scene. This is required for the Z-accuracy, and will give better results in the presence of specular reflection.

The timing for this application is not too demanding at five seconds, but it will limit the number of image pairs that can be captured. Adjust the disparity settings to limit the calculations to the 200mm Z-depth of the bin, but be sure to account for the depth at different angles. The calculated disparity map can then be transferred to a machine vision library for pattern matching. Experimentation is the best way to confirm the speed of the system.


Three-dimensional imaging is an important enabling technology in the machine vision arena. Understanding the strengths and limitations of this technology will increase your probability of successful integration. Properly setting up the system by taking into account the subject area, the subject material, the movement, and the software will enable you to fully harness the power of stereo vision to solve a variety of 3D measurement and robotic guidance applications. V&S