Vision in 3-D allows manufacturers to approach applications that could not be solved with 2-D.

This bin picking robot is one type of 3-D vision application. Source: MVTec


Three-dimensional vision is the use of 3-D information with the aid of machine vision, allowing operators to approach applications that so far could not be solved with classical 2-D technologies. It covers two main objectives, each of which can be addressed by several different technologies:
  • 3-D alignment: Finding the 3-D pose (position and orientation) of an object
  • 3-D reconstruction: Determining the 3-D shape of arbitrary objects


    From the Task to the Solution

    For any project, the customer must first identify the specific vision task. Before deciding which 3-D vision method is right for the application, the customer has to answer the following questions:
  • What do I want to find out?
  • How accurately do I want to measure?
  • What are the characteristics of my object?
  • What are the general conditions?

    This analysis is important because the next step is to decide which technology to use: the different technologies have characteristics that must match the specific needs of the customer’s application.



    From left to right:
    a: The calibration plate enables the software to perform the camera calibration.
    b: The software rectifies perspective distortions.
    c: Distances on the caliper can be measured in the presence of perspective distortions. Source: MVTec

    3-D Calibration

    With 3-D calibration, operators establish the relationship between a camera and its environment. For robotics applications, the relationship between the robot and the camera is also determined. With this technology, operators get an explicit and accurate description of the area scan or line scan camera: a set of so-called internal and external camera parameters maps image coordinates to real-world coordinates and vice versa.

    Three-dimensional calibration also includes the correction of lens and perspective distortions. This can be realized either by rectifying the image or by measuring in the distorted image and correcting the measurement result. Furthermore, measurement results can be determined in world coordinates, and the geometric relationship between the camera and the object can be derived from the calibration, which is crucial for robotics applications. All 3-D applications require this technology. Typical examples include bin picking as well as stereo applications.

    Three-dimensional calibration must support both area and line scan cameras. Usually, a specific target is used for calibration, such as a calibration plate.

    Alternatively, self-calibration, which does not require a calibration target, can be used. Three-dimensional calibration should permit, for example, subpixel-accurate measurements down to 1 micron in a field of view of 10 millimeters.
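
    The following sketch illustrates the idea of target-based calibration with a chessboard-style plate. It uses OpenCV purely as an illustration, not any particular vendor's software, and the pattern size, square size and file names are assumptions made for the example.

import glob

import cv2
import numpy as np

pattern = (9, 6)         # inner corner count of an assumed chessboard-style plate
square = 0.005           # assumed square size: 5 mm

# World coordinates of the plate corners (planar target, so Z = 0).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):            # placeholder file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Internal parameters (camera matrix K, lens distortion) and external
# parameters (one plate pose per calibration image).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Rectify a measurement image, i.e. remove the lens distortion before measuring.
undistorted = cv2.undistort(cv2.imread("caliper.png"), K, dist)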

    3-D Matching

    With 3-D matching, it is possible to recognize and determine the 3-D pose of arbitrary 3-D objects with only one camera. This technology is relatively new and was introduced to machine vision about three years ago.

    Three-dimensional matching can be used for 3-D alignment, or finding the 3-D pose of an object, such as within automotive and robotics applications, pick-and-place applications and bin picking. Measuring geometric features on complex 3-D objects after 3-D alignment is another possibility.

    Three-dimensional matching determines the 3-D position and orientation of 3-D objects represented by their computer-aided design (CAD) model. This is done with shape-based matching extended to 3-D through multiple 2-D views of the object. For the highest accuracy, the pose is then refined with full 3-D matching.
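
    As a hedged illustration of the final pose computation only, not of the view-based matching itself, the following Python/OpenCV sketch recovers the 3-D pose from a handful of 2-D/3-D point correspondences between a hypothetical CAD model and the image. The intrinsics and point values are invented, and the image points are synthesized so the sketch runs stand-alone.

import cv2
import numpy as np

K = np.array([[1200.0, 0, 640], [0, 1200.0, 480], [0, 0, 1]])   # assumed intrinsics
dist = np.zeros(5)

# A few 3-D points taken from a hypothetical CAD model, in the object frame (meters).
model_pts = np.array([[0, 0, 0], [0.08, 0, 0], [0.08, 0.05, 0],
                      [0, 0.05, 0], [0.04, 0.025, 0.03]], np.float32)

# For this self-contained sketch the image points are synthesized from a known
# pose; in a real system they come from the matching step.
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.02, -0.01, 0.4])
image_pts, _ = cv2.projectPoints(model_pts, rvec_true, tvec_true, K, dist)

# Recover the 3-D pose (rotation and translation) from the 2-D/3-D correspondences.
ok, rvec, tvec = cv2.solvePnP(model_pts, image_pts, K, dist)
print("rotation vector:", rvec.ravel(), "translation [m]:", tvec.ravel())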

    Localization of a known object is shown here. Source: MVTec

    Perspective Matching, Circle Pose & Rectangle Pose

    Instead of using the full 3-D shape of an object, for many applications it is possible to restrict the model area to a planar part of the object.

    For arbitrarily shaped object parts, perspective matching allows operators to determine the 3-D pose with only one camera. The model is generated by training on a sample image of the object, typically restricted to a specified region of interest.

    If the object has significant circles or rectangles, the 3-D pose can easily be determined with only one camera. This is done by using the known size of the circle or rectangle to calculate the distance and tilt angle of the object with respect to the calibrated camera.
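
    A minimal sketch of the rectangle case, again with OpenCV standing in for the vision library: the four corners found in the image, together with the known rectangle dimensions and the calibrated camera, determine distance and tilt. The intrinsics and corner coordinates below are dummy values; in practice the corners come from subpixel contour extraction.

import cv2
import numpy as np

K = np.array([[1500.0, 0, 960], [0, 1500.0, 600], [0, 0, 1]])   # assumed intrinsics
dist = np.zeros(5)

w, h = 0.060, 0.040      # known rectangle size: 60 mm x 40 mm
rect_3d = np.array([[0, 0, 0], [w, 0, 0], [w, h, 0], [0, h, 0]], np.float32)

# Corner positions as they might be extracted from the image (dummy values).
rect_2d = np.array([[812.4, 511.2], [1108.9, 530.7],
                    [1095.3, 731.8], [801.0, 710.5]], np.float32)

ok, rvec, tvec = cv2.solvePnP(rect_3d, rect_2d, K, dist, flags=cv2.SOLVEPNP_IPPE)

print("distance to rectangle origin [m]:", float(np.linalg.norm(tvec)))
R, _ = cv2.Rodrigues(rvec)
tilt = np.degrees(np.arccos(np.clip(R[2, 2], -1.0, 1.0)))
print("tilt of the object plane [deg]:", tilt)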

    Perspective matching is used for 3-D alignment, that is, for applications in which the 3-D pose of an object must be found. Examples are automotive and robotics applications, pick-and-place applications and bin picking. A further possibility is measuring geometric features on complex 3-D objects after 3-D alignment.

    The software should offer algorithms for both perspective matching and circle and rectangle pose. These methods help find objects easily with only one camera. For perspective matching, the software can provide two different methods suited to two different classes of objects: depending on the object’s shape and appearance, it should offer both deformable matching, which is based on shape-based matching technology (object edges), and descriptor-based matching, which uses so-called interest points. For circle and rectangle pose, the software should offer highly accurate methods for the extraction of subpixel contours, either edges or lines; these contours are the input for robust fitting of the circle or rectangle.
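
    The following sketch shows the interest-point idea with OpenCV's ORB features and a robust homography fit. It illustrates descriptor-based matching in general, not any specific product's implementation, and the image file names are placeholders.

import cv2
import numpy as np

model = cv2.imread("model_region.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Interest points and descriptors in the trained model region and in the scene.
orb = cv2.ORB_create(nfeatures=2000)
kp_m, des_m = orb.detectAndCompute(model, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_m, des_s), key=lambda m: m.distance)

src = np.float32([kp_m[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_s[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Robustly fit the projective mapping between the planar model part and the
# scene; with a calibrated camera this homography yields the 3-D pose.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
print("inlier matches:", int(inliers.sum()))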



    Binocular Stereo

    The 3-D coordinates of the visible points on the object surface can be determined from two images acquired from different points of view. This is done by calculating the disparity map of the calibrated two-camera setup.

    Binocular stereo is typically used for 3-D reconstruction, or determining the 3-D shape of arbitrary objects, which is particularly useful for mid-size and large textured objects. It can be used for quality inspection of 3-D objects. A further possibility is position recognition of 3-D objects, which is useful as a pre-processing step for 3-D matching.

    Software must support stereo vision by calculating the 3-D coordinates on the object surface with the aid of a two-camera setup. This can be realized either by calculating a dense height map or by determining the height for specific points or edges, which is particularly suitable for highly accurate height measurement. Furthermore, if the software supports multi-grid stereo, an advanced method for interpolating the 3-D data in homogeneous image regions is available; this method yields higher accuracy for small objects.
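
    A minimal dense-stereo sketch in Python/OpenCV, assuming an already rectified image pair; the reprojection matrix Q would normally come from the stereo calibration, and the dummy values below only keep the example self-contained.

import cv2
import numpy as np

left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)    # assumed rectified pair
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching computes the disparity map mentioned in the text.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5,
                             P1=8 * 5 * 5, P2=32 * 5 * 5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0   # fixed-point output

# Q would come from cv2.stereoRectify of the calibrated two-camera setup; a
# dummy matrix stands in here so the sketch stays self-contained.
Q = np.float32([[1, 0, 0, -640], [0, 1, 0, -480],
                [0, 0, 0, 1200], [0, 0, 1.0 / 0.1, 0]])

# Dense height map: every valid disparity becomes an (X, Y, Z) point.
points_3d = cv2.reprojectImageTo3D(disparity, Q)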


    Monocular 2½-D

    There are various techniques for extracting height information with a single camera. The resulting 3-D information is very similar to that of binocular stereo. The most frequently used methods include:

    Depth from focus (DFF) extracts distance information by evaluating the focus of every pixel across a focus series. A small depth of field ensures that only surface points close to the focal plane appear sharp, which is used to calculate the distance of the object’s surface to the camera.
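
    A simple depth-from-focus sketch: for each pixel, pick the slice of the focus series in which a local sharpness measure peaks. The file names, stack size and travel range are assumptions for illustration.

import cv2
import numpy as np

# Focus series: one image per known position of the focus stage (placeholder names).
stack = [cv2.imread(f"focus_{i:02d}.png", cv2.IMREAD_GRAYSCALE) for i in range(20)]
z_positions = np.linspace(0.0, 2.0, len(stack))        # assumed travel of 0..2 mm

# Local sharpness per pixel and per slice (Laplacian energy, slightly smoothed).
sharpness = np.stack([
    cv2.GaussianBlur(cv2.Laplacian(img, cv2.CV_32F) ** 2, (9, 9), 0)
    for img in stack
])

# The slice in which a pixel is sharpest gives the distance of that surface point.
best = np.argmax(sharpness, axis=0)
height_map = z_positions[best]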

    Sheet of light measures an elevation profile of an object by reconstructing the line of light projected onto it.
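
    The core of a sheet-of-light measurement can be sketched as follows: locate the laser line in every image column with subpixel accuracy and convert the displacement into height. The scale factor below is a placeholder for the calibrated camera/laser geometry.

import cv2
import numpy as np

img = cv2.imread("laser_profile.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Coarse laser-line position: brightest row in every column.
rows = np.argmax(img, axis=0)
r = np.clip(rows, 1, img.shape[0] - 2)
cols = np.arange(img.shape[1])

# Parabolic interpolation around the peak for subpixel accuracy.
num = img[r - 1, cols] - img[r + 1, cols]
den = img[r - 1, cols] - 2 * img[r, cols] + img[r + 1, cols]
rows_sub = r + 0.5 * np.divide(num, den, out=np.zeros_like(num), where=den != 0)

# Converting the line displacement into metric height requires the calibrated
# geometry of camera and laser plane; a single scale factor stands in for it here.
mm_per_pixel = 0.05
profile = (rows_sub - rows_sub.min()) * mm_per_pixel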

    Three-dimensional reconstruction of small objects is primarily done with DFF; for objects without texture, the sheet of light method should be preferred. Typical application examples include quality inspection of 3-D objects as well as position recognition of 3-D objects.

    For dense height maps, the software should offer various methods for processing 2½-D images, such as determining object edges or angles between 3-D planes. In the case of sheet of light, highly accurate line- or point-oriented 3-D measurements are available.