Apple’s Depth Pro Model Creates 3D Maps from 2D Images in Seconds

Apple's Machine Learning Research team has created a foundation model for "zero-shot metric monocular depth estimation." Depth Pro rapidly generates detailed 3D depth maps from a single 2D image.

“Depth Pro synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details,” the researchers write in their paper.

Our brains combine visual information from two sources: our eyes. Each eye captures a slightly different perspective of the world, and the brain fuses these views into a single stereoscopic image, using the differences between them to judge how far away objects are.

Many cameras and smartphones capture images through a single lens, but developers can still build 3D depth maps by drawing on metadata from 2D photos (such as focal length and sensor details) or by analyzing multiple images.
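As a rough illustration of the multi-image approach, the sketch below applies the standard pinhole stereo relation (depth = focal length × baseline / disparity) to a point seen in two side-by-side views. The function name and all of the numbers are hypothetical and chosen only for illustration; none of this is part of Depth Pro, which works from a single image.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Distance in metres to a point seen in two rectified, side-by-side views.

    focal_length_px: focal length expressed in pixels
    baseline_m:      horizontal distance between the two viewpoints, in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical setup: 1000 px focal length, viewpoints 6 cm apart.
near = depth_from_disparity(1000.0, 0.06, 30.0)  # large shift: nearby point
far = depth_from_disparity(1000.0, 0.06, 6.0)    # small shift: distant point
print(f"near point: {near:.2f} m, far point: {far:.2f} m")
```

The larger the shift between the two views, the closer the object, which is the same cue the two-eye description above relies on.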

Depth Pro bypasses both methods: it can generate a detailed 2.25-megapixel 3D depth map from just one image in 0.3 seconds on a standard graphics processing unit.

AI Model Architecture and Depth Estimation

The model's architecture features a multi-scale vision transformer that processes both the overall context of an image and fine details like hair, fur, and other intricate structures. It can estimate both relative depth (which objects are nearer or farther) and absolute depth (actual distances), allowing applications such as augmented reality to accurately position virtual objects in physical spaces.
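To see why absolute depth matters for AR, consider the minimal sketch below, which back-projects a pixel into 3D camera coordinates using a simple pinhole camera model. The function name, focal length, and image centre are assumptions made for illustration and are not taken from Depth Pro's code.

```python
def backproject(u, v, depth_m, focal_length_px, cx, cy):
    """Map pixel (u, v) with a depth in metres to (x, y, z) in camera space."""
    x = (u - cx) * depth_m / focal_length_px
    y = (v - cy) * depth_m / focal_length_px
    return (x, y, depth_m)


# Hypothetical camera: 1000 px focal length, image centre at (768, 768).
FOCAL_PX = 1000.0
CX, CY = 768.0, 768.0

# With metric depth (2 m), the pixel maps to one definite 3D position.
anchor = backproject(1268.0, 768.0, 2.0, FOCAL_PX, CX, CY)

# With depth known only up to an unknown scale (here an arbitrary factor
# of 3), the same pixel could sit anywhere along the viewing ray.
ambiguous = backproject(1268.0, 768.0, 2.0 * 3.0, FOCAL_PX, CX, CY)

print("metric depth:", anchor)
print("scaled depth:", ambiguous)
```

With relative depth alone, both positions are equally plausible; an absolute distance fixes the scale, so a virtual object can be placed at a definite point in the room.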

Depth Pro achieves these results without needing additional training on data from each new domain, an approach known as zero-shot learning. IBM defines zero-shot learning as a machine learning approach in which an AI can recognize and categorize unseen classes without labeled examples. This makes the model highly adaptable.

Potential Applications of Depth Pro

Beyond the AR capabilities already mentioned, Depth Pro could speed up photo editing, enable real-time 3D imagery from a single-lens camera, and help autonomous vehicles and robots perceive their surroundings more effectively in real time.

The project is currently in the research phase, but, rather unusually for Apple, the code and supporting documentation are being released as open source on GitHub. This enables developers, scientists, and coders to advance the technology further.

The researchers have published a paper detailing the project on the arXiv server, and a live demo is available for anyone who wants to try the current version firsthand.

Read the original article on: New Atlas
