Last Updated | Changes |
---|---|
10/5/2023 | First version published |
What are depth maps?
This guide will walk you through a number of simple techniques designed to bring your AI-generated images to life!
A depth map is a single-channel image that represents the distance of each pixel in a scene from the viewer. It’s often used to create 3D images or models from 2D images, and to provide information about a scene’s depth from an otherwise “flat” 2D image.
They’re usually shades of grey, with white representing “higher” areas (closer to the camera) and darker shades representing areas farther away, although the scheme can be inverted for certain applications.
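To make the grey-level convention concrete, here’s a minimal Python sketch (not code from the extension) that maps a small grid of raw depth values to 8-bit grey levels, white for near and black for far, with an optional inversion. It assumes the raw values are distances, where larger means farther away; some depth models output the opposite convention, in which case the mapping would flip. The function name depth_to_gray is hypothetical.

```python
def depth_to_gray(depth, invert=False):
    """Map raw depth values to 8-bit grey levels (hypothetical helper).

    Assumes `depth` is a list of rows of distances, where larger = farther.
    By default the nearest pixel becomes 255 (white) and the farthest 0 (black);
    invert=True flips this to black = near, white = far.
    """
    flat = [d for row in depth for d in row]
    near, far = min(flat), max(flat)
    span = (far - near) or 1.0  # avoid dividing by zero for a perfectly flat scene
    gray = []
    for row in depth:
        out_row = []
        for d in row:
            closeness = (far - d) / span  # 1.0 at the nearest pixel, 0.0 at the farthest
            if invert:
                closeness = 1.0 - closeness
            out_row.append(round(closeness * 255))
        gray.append(out_row)
    return gray


distances = [[1.0, 2.0], [3.0, 5.0]]
print(depth_to_gray(distances))               # [[255, 191], [128, 0]]
print(depth_to_gray(distances, invert=True))  # [[0, 64], [128, 255]]
```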
Some of the features showcased in this guide aren’t strictly related to depth maps, but they’re bundled into the same tools we’ll be exploring, and they’re all techniques for manipulating 2D images to create a sense of depth.
How can a sense of depth enhance our work?
We can leverage depth maps in a number of ways to produce exciting effects:
- Create animations that give our 2D images the impression of a third dimension.
- Create basic 3D models, for import into Blender, or other modeling applications.
- Create stereo side-by-side images for viewing on VR headsets, such as the Oculus Quest.
- Create Anaglyph images (red/cyan) for viewing with “old fashioned” 3D glasses.
Prerequisites
There are many ways to create depth maps from images: websites, standalone image and 3D modeling apps, and extensions to the popular Stable Diffusion interfaces. Some methods allow you to paint your own depth maps manually onto existing images (see below), but for this guide we’ll be generating our depth maps programmatically, using the stable-diffusion-webui-depthmap-script extension for Automatic1111.
We’ll look at generating depth maps in ComfyUI at a later date, but for this guide you’ll need an up-to-date Automatic1111 WebUI installation and the aforementioned depthmap-script extension, available from the Automatic1111 Extensions tab. If you can’t find it in the list of available extensions, it can also be installed from the URL: https://github.com/thygate/stable-diffusion-webui-depthmap-script
If you don’t use Automatic1111 but would like to experiment, you can clone (download) the repository from https://github.com/thygate/stable-diffusion-webui-depthmap-script, install the dependencies listed in requirements.txt, then run main.py to launch a standalone Gradio interface.
Extension Options Walkthrough
There are two ways to interact with the Depth extension in Automatic1111. If we would like to compute depth maps from existing images, we can navigate to the Depth tab.
If we would like to generate depth maps at the same time as generating images, we can invoke the extension from the Scripts dropdown.
The Depth Tab
The Depth tab can appear intimidating at first glance! There are a lot of options, but we’ll break them down below.
At the top of the Depth tab there’s a space to load an image. We can use any image: it doesn’t have to be AI-generated, and a photograph works just as well!
Options
The following table explains the function of each option on the Depth tab.
Option | Explanation |
---|---|
Compute On – GPU/CPU | If you receive out-of-memory (OOM) VRAM errors while using the Depth extension, you can fall back on CPU processing, but it’s very slow! |
Model | There are currently ten models which can be leveraged to calculate and produce depth images, each with its own advantages and drawbacks. The default model, res101, is based upon AdelaiDepth/LeReS. The others are variations of the MiDaS and ZoeDepth implementations. The most recently added, dpt_beit_large_512 (MiDaS 3.1), has exceptional fidelity, with a correspondingly high VRAM cost. |
Net Width/Height | Sets the desired size of the depth map output. Ignored when Boost is enabled, or when Match net size to input size is enabled. |
Match net size to input size | Matches the depth map size to the dimensions of the loaded image. |
Boost (multi-resolution merging) | An implementation based upon BoostingMonocularDepth (GitHub link), which greatly improves results when using the default res101 model, at the cost of a much longer compute time! |
Invert (black=near, white=far) | By default, the depth map output shows white as “nearer” to the viewer. Checking this box flips this, which is useful for certain applications that require black to be the “nearer” color (see Depthy, below). |
Clip and renormalize DepthMap | Lets us define near (Near Clip) and far (Far Clip) threshold values: values beyond either threshold are clipped to it, and everything in between is renormalized (spread out) across the full range. Useful for adjusting the depth range of the map. |
Combine input and depthmap into one image | When enabled, the depth map output will be stitched onto the original image, based on the Combine Axis selection (see below). When saved as a combination of original image and depth map, the file will be a three-channel (RGB), 8-bit-per-channel PNG image. |
Combine Axis – Vertical/Horizontal | Sets whether the original image and depth map are stacked vertically or placed side by side horizontally when combined. |
Save Outputs | This will save the depth map output in the assigned Automatic1111 txt2image directory. |
Output DepthMap | Allows the generated depth map to be shown in the Automatic1111 Gradio interface. |
Generate NormalMap | Generates a normal map image. Each pixel of a normal map encodes information about the direction a surface is facing, and can be used to calculate lighting, and enhance the quality of 3D models. |
Generate stereoscopic image(s) | When checked, enables options for creating side-by-side (or above-below) stereo images, suitable for use on a VR headset, or anaglyph images, for use with red/cyan glasses. Note that all stereo image generation runs on the CPU only. |
Generate simple 3D mesh | Generates a 3D model in the .obj format |
Generate 3D inpainted mesh | Generates a 3D model in the .ply format. This is an extremely slow process! The 3D inpainted mesh can be used to create videos from the Generate Video subtab. |
Generate 4 demo videos with 3D inpainted mesh | Uses the .ply export to create four example videos showcasing simple camera movements. |
Remove background | Enables subjects to be identified and backgrounds to be removed from images. |
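The Clip and renormalize option can be pictured as a clamp followed by a linear stretch. Below is a minimal Python sketch of that idea, assuming the depth map is stored as 8-bit grey levels (white = near) and that both thresholds are given on that same 0 to 255 scale; the extension’s actual Near Clip/Far Clip scale and implementation may differ, and the function name is hypothetical.

```python
def clip_and_renormalize(gray, far_clip, near_clip):
    """Clamp 8-bit depth values to [far_clip, near_clip], then stretch to 0-255.

    Hypothetical helper. Assumes white (255) = near, so near_clip > far_clip
    on this scale.
    """
    if not 0 <= far_clip < near_clip <= 255:
        raise ValueError("expected 0 <= far_clip < near_clip <= 255")
    span = near_clip - far_clip
    result = []
    for row in gray:
        new_row = []
        for v in row:
            v = min(max(v, far_clip), near_clip)  # clip to the chosen range
            new_row.append(round((v - far_clip) * 255 / span))  # spread it back over 0-255
        result.append(new_row)
    return result


print(clip_and_renormalize([[0, 64, 128], [192, 255, 100]], far_clip=64, near_clip=192))
# [[0, 0, 128], [255, 255, 72]]
```

Everything darker than far_clip collapses to black and everything brighter than near_clip to white, so the remaining mid-range gets the full 0 to 255 scale to itself.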
Configuration – Output Examples
Below are some examples of the results of various configuration options in practice:
Combine input and depthmap into one image
Generate stereoscopic image(s) – Side by Side
Generate stereoscopic image(s) – Anaglyph
Generate NormalMap
Remove Background
Generate 3D inpainted mesh (for Video generation)
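For a rough idea of how a normal map can be derived from a depth map, here’s a minimal Python sketch. It estimates surface slope with finite differences and packs each unit normal into RGB, a common textbook approach that isn’t necessarily what the extension does internally. It assumes a height-like grid where larger values are nearer, and the function name normal_map and the strength parameter are hypothetical. Its y axis follows image rows (downwards), so the green channel may need flipping to match a particular 3D application’s convention.

```python
import math


def normal_map(height, strength=1.0):
    """Estimate an RGB normal map from a height-like depth grid (hypothetical helper).

    Assumes `height` is a list of rows where larger values are nearer the viewer.
    Slopes come from central differences (one-sided at the borders); each unit
    normal (nx, ny, nz) is packed into RGB as round((n + 1) / 2 * 255).
    """
    h, w = len(height), len(height[0])
    rgb = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (height[y][x1] - height[y][x0]) / max(x1 - x0, 1)
            dzdy = (height[y1][x] - height[y0][x]) / max(y1 - y0, 1)
            # the normal tilts away from the uphill direction; y follows image rows
            nx, ny, nz = -dzdx * strength, -dzdy * strength, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append(tuple(round((c / length + 1) / 2 * 255) for c in (nx, ny, nz)))
        rgb.append(row)
    return rgb


print(normal_map([[0.0, 0.0], [0.0, 0.0]])[0][0])  # (128, 128, 255): flat, facing the camera
print(normal_map([[0.0, 1.0, 2.0]] * 3)[1][1])      # (37, 128, 218): slope rising to the right
```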
Using Depth Maps in Practice
So what can we actually do with the depth maps, normal maps, side-by-side stereoscopic images, and inpainted 3D meshes generated by the Depth extension? Here are some example workflows:
Generate Video
Once we’ve generated a 3D inpainted .ply mesh, we can generate a video with custom camera parameters and movement from the Generate Video subtab:
Option | Explanation |
---|---|
Input Mesh | Pre-filled with the path of the most recently generated 3D inpainted mesh in the output folder. |
Number of frames | The total number of video frames to output. |
Framerate | The desired output framerate. |
Format | Two output formats are available: mp4 and webm. |
SSAA | Supersampling anti-aliasing, which can be used to remove jagged edges and flickering in output videos. The render size is scaled up by this factor, then downsampled. |
Trajectory | Trajectory controls the behavior of the camera’s movement. |
Translate: x, y, z | The x, y, z values control the magnitude of camera travel, and should be adjusted in very small increments. The first number pertains to the X axis, the second to the Y axis, and the third to the Z axis (depth/zoom). |
Crop: top, left, bottom, right | Sometimes, due to the movement of the camera, the outer edges of the images can become distorted. Specify the number of pixels to crop from each edge, as required. |
Dolly | Implements a “dolly-zoom” effect by adjusting the camera FOV as the camera moves along its trajectory. |
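The Translate, Number of frames, Framerate and Dolly settings boil down to some simple arithmetic. The Python sketch below assumes a straight-line trajectory (one possible camera path; the extension’s Trajectory options may define others) and uses the standard dolly-zoom relationship, in which the FOV widens as the camera approaches so that a subject keeps the same on-screen width, 2 × d × tan(FOV / 2). The function names and the subject_distance parameter are hypothetical, and the extension’s own camera code may differ.

```python
import math


def straight_line_path(translate, num_frames):
    """Per-frame camera offsets moving linearly from (0, 0, 0) to `translate`.

    Hypothetical helper; `translate` is an (x, y, z) tuple mirroring the Translate field.
    """
    if num_frames < 2:
        return [(0.0, 0.0, 0.0)] * num_frames
    step = num_frames - 1
    return [tuple(t * i / step for t in translate) for i in range(num_frames)]


def dolly_zoom_fov(fov0_deg, subject_distance, z_offset):
    """FOV (degrees) that keeps a subject at `subject_distance` the same size on
    screen after the camera moves `z_offset` toward it (positive = closer)."""
    d = subject_distance - z_offset
    if d <= 0:
        raise ValueError("camera has moved past the subject")
    half = math.atan(math.tan(math.radians(fov0_deg) / 2) * subject_distance / d)
    return math.degrees(2 * half)


num_frames, fps = 240, 40
print(num_frames / fps)  # 6.0 (seconds of video)
for offset in straight_line_path((0.0, 0.0, 0.05), num_frames=5):
    print(offset)
print(dolly_zoom_fov(60.0, subject_distance=10.0, z_offset=5.0))  # wider than 60 degrees
```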
Create 3D Models
The website Depth Player (external link) is a tool that takes an image and its associated depth map as input and produces a Wavefront OBJ file as output (much like the Generate simple 3D mesh option in the Depth extension, but with a little more interactivity).
It’s not a “true” 3D model that can be rotated freely: we’re generating depth from a 2D image by displacing a plane mesh. .obj files are ubiquitous and can be imported into many 3D applications.
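Displacing a plane mesh is simple enough to sketch. The Python example below (not Depth Player’s or the extension’s code) builds Wavefront OBJ text with one vertex per depth-map pixel, raised along z by its depth value, and two triangles per grid cell. It assumes larger depth values mean nearer; the function name depth_to_obj and the scale parameter are hypothetical. There’s no geometry behind the displaced surface, which is why the result can’t be rotated freely.

```python
def depth_to_obj(depth, scale=1.0):
    """Build Wavefront OBJ text for a plane grid displaced along z by depth.

    Hypothetical helper. Assumes `depth` is a list of rows where larger = nearer.
    One vertex per pixel; each grid cell becomes two counter-clockwise triangles.
    """
    h, w = len(depth), len(depth[0])
    lines = ["# plane mesh displaced by a depth map"]
    for y in range(h):
        for x in range(w):
            # flip y so the image's top row ends up at the top of the model
            lines.append(f"v {x} {h - 1 - y} {depth[y][x] * scale}")
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x + 1  # OBJ vertex indices are 1-based
            lines.append(f"f {i} {i + w} {i + 1}")
            lines.append(f"f {i + 1} {i + w} {i + w + 1}")
    return "\n".join(lines) + "\n"


print(depth_to_obj([[0.0, 0.5], [0.5, 1.0]], scale=2.0))
```

Saving the returned string to a file with a .obj extension gives a mesh that OBJ importers can open; a fuller exporter would typically also write texture coordinates (vt lines) so the original image can be mapped onto the surface.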
Visualize on a 2D Display
The website Depthy (external link) existed long before Stable Diffusion and generative AI art, but now it’s really useful! First, drag a color image into the Depthy window. We’ll then be prompted to upload a depth map (or manually paint one!).
The image is then instantly viewable in the interactive viewer, which displays a subtle, customizable movement. GIFs and videos can be exported.
View on a 3D/VR Headset Device
Side-by-side stereo images (SBS images) can be viewed on many VR devices, including cell phones running apps like Google Cardboard (external link).
For a much more immersive experience, SBS images can be loaded onto devices such as the Oculus Quest. The example below was generated with the Depth extension, loaded onto an Oculus Quest, and viewed with the Pigasus VR Media Player (external link).
Note that you won’t experience the depth effect yourself when watching the example video!
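Under the hood, an SBS image is just two slightly different views of the scene, one per eye, placed next to each other. Below is a minimal Python sketch of one common approach, assuming a depth map normalized so that 0.0 is far and 1.0 is near: each view shifts pixels horizontally in proportion to nearness (in opposite directions for the two eyes), nearer pixels win when two land on the same spot, and gaps are filled naively from the left. The function names and the max_shift parameter are hypothetical, and the extension’s stereo generator may use different shifting and gap-filling methods.

```python
def shift_view(image, depth, max_shift, direction):
    """Forward-warp one eye's view (hypothetical helper).

    Pixels move horizontally by round(depth * max_shift), where depth runs from
    0.0 (far) to 1.0 (near); direction=+1 moves them right, -1 moves them left.
    Nearer pixels win when two land on the same spot, and gaps left behind are
    filled naively from the nearest pixel on their left.
    """
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        zbuf = [-1.0] * w  # depth of the pixel currently occupying each column
        for x in range(w):
            nx = x + direction * round(depth[y][x] * max_shift)
            if 0 <= nx < w and depth[y][x] > zbuf[nx]:
                out[y][nx] = image[y][x]
                zbuf[nx] = depth[y][x]
        for x in range(w):  # naive hole filling
            if out[y][x] is None:
                out[y][x] = out[y][x - 1] if x > 0 else image[y][x]
    return out


def side_by_side(image, depth, max_shift=2):
    """Left-eye view and right-eye view placed next to each other (hypothetical helper)."""
    left = shift_view(image, depth, max_shift, +1)   # near pixels move right for the left eye
    right = shift_view(image, depth, max_shift, -1)  # and left for the right eye
    return [l_row + r_row for l_row, r_row in zip(left, right)]


row = list("abcde")  # any pixel values work; letters make the shifts easy to see
print("".join(side_by_side([row], [[0, 0, 1, 0, 0]], max_shift=1)[0]))  # abbceaccde
```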
Import into 3D Modeling Applications
The generated .obj and .ply files can be imported into 3D applications, such as Blender (external link), for additional manipulation.
Visualize with 3D Glasses
Anaglyph (red/cyan) outputs can be viewed with cheap 3D glasses (external Amazon link), and visualized somewhat on-screen with this rudimentary anaglyph viewer (external link).
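The simplest red/cyan anaglyph is plain channel mixing: take the red channel from the left-eye view and the green and blue channels from the right-eye view, so each lens of the glasses passes one view. The Python sketch below shows that basic method, assuming two same-sized RGB views (for example, the two halves of an SBS image); the function name is hypothetical, and the extension may use a different, color-optimized mixing.

```python
def red_cyan_anaglyph(left, right):
    """Mix two same-sized RGB views into a red/cyan anaglyph (hypothetical helper).

    Views are lists of rows of (r, g, b) tuples. Red comes from the left-eye view
    and green + blue (cyan) from the right-eye view, matching glasses with the
    red lens over the left eye.
    """
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("left and right views must be the same size")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]


left = [[(200, 10, 20), (0, 0, 0)]]
right = [[(50, 60, 70), (255, 255, 255)]]
print(red_cyan_anaglyph(left, right))  # [[(200, 60, 70), (0, 255, 255)]]
```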
Anaglyph outputs from the Depth extension were used as training images in the creation of the experimental LoRA-3D, txt2img Anaglyph Generator for SD 1.5.