Stereo Photo from Depth Photo

To create a stereo image, you basically need two images: one for the left eye and one for the right. Usually, we get them by taking two photos from two different positions, about as far apart as human eyes. But the iPhone camera gives no direct access to the pictures from both of its lenses, and the lenses are too close together to use those pictures directly for a stereo photo anyway.

However, I later realized that you can access raw depth data for every photo taken in portrait mode. At first, I didn't take the time to experiment with it, because that would mean processing every pixel in the image, something I had no experience with. But the idea stayed on my mind until I finally decided to start playing with depth data and the pixels of the picture.

THEORY OF STEREOSCOPY

To create a 3D effect for your eyes and brain, you need to work with parallax: the offset between the images of the same object in two pictures taken from different positions. You can observe it in real life. When you move from left to right while focusing on a specific object, objects closer to you appear to move from right to left relative to the focused object. This is called negative parallax. All objects farther away appear to move from left to right, which is an example of positive parallax.

So when we have a static image and its depth map (a grayscale image in which black marks distant parts of the picture and white marks close ones), we can give every pixel its own parallax and generate one picture for the left eye and one for the right. The inputs are therefore the original image, the depth map, and the maximum shift in pixels.

To simulate parallax in the left-eye picture, we move each color pixel whose depth value is black (a distant object) by the maximum shift to the left, and each pixel whose depth value is white (a close object) by the maximum shift to the right. A pixel with mid-gray depth stays in place, and values in between are shifted proportionally. The right-eye picture is created the same way, with the shift direction reversed.
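
The mapping described above fits in a few lines. This is a minimal sketch, assuming depth values already normalized to 0...1 (0 = black/far, 1 = white/near) and a shift measured in pixels; the function name `pixelShift(depth:maxShift:)` is hypothetical and not from the original code.

```swift
/// Maps a normalized depth value (0 = black/far, 1 = white/near) to a signed
/// horizontal shift: -maxShift for black, 0 for mid-gray, +maxShift for white.
/// A positive maxShift gives the left-eye image, a negative one the right-eye image.
func pixelShift(depth: Double, maxShift: Double) -> Double {
    let relative = depth * 2 - 1   // rescale 0...1 to -1...1
    return relative * maxShift
}

let maxShift = 10.0
print(pixelShift(depth: 0.0, maxShift: maxShift))   // -10.0: far pixel moves left
print(pixelShift(depth: 0.5, maxShift: maxShift))   // 0.0: mid-gray stays in place
print(pixelShift(depth: 1.0, maxShift: maxShift))   // 10.0: near pixel moves right
print(pixelShift(depth: 1.0, maxShift: -maxShift))  // -10.0: right eye, opposite direction
```

Because the mapping is linear, a depth value of 0.75 with a max shift of 8 pixels moves the pixel 4 pixels to the right.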

GET THE DEPTH DATA

The first task is to get the depth data from a portrait image on your iPhone. Create an AVDepthData object from the dictionary of depth-related information returned by the CGImageSourceCopyAuxiliaryDataInfoAtIndex function. That function takes a CGImageSource, which can be created from the image's CFData, CFURL, or CGDataProvider. From the AVDepthData we get a CVPixelBuffer object, which gives direct access to every pixel of the depth map. However, the values are not between 0 and 1, so the first step in preparing the input is to normalize them.
We can also generate a grayscale image from that pixel buffer to view or store the depth map. A CIImage can be created directly from a CVPixelBuffer and easily converted to a CGImage or UIImage that is ready to display.

import AVFoundation
import CoreImage
import ImageIO

static func getDepthImage(imageData: Data) -> CIImage? {
    // Read the disparity map stored as auxiliary data in the portrait photo
    guard let source = CGImageSourceCreateWithData(imageData as CFData, nil),
          let auxDataInfo = CGImageSourceCopyAuxiliaryDataInfoAtIndex(source, 0, kCGImageAuxiliaryDataTypeDisparity) as? [AnyHashable: Any],
          var depthData = try? AVDepthData(fromDictionaryRepresentation: auxDataInfo) else {
        return nil
    }

    // Work with 32-bit float disparity values
    if depthData.depthDataType != kCVPixelFormatType_DisparityFloat32 {
        depthData = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
    }

    let pixelBuffer = depthData.depthDataMap
    pixelBuffer.normalize() // custom extension method (not shown)

    // CIImage(cvPixelBuffer:) is not failable, so no unwrapping is needed
    return CIImage(cvPixelBuffer: pixelBuffer)
}
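
The `normalize()` extension itself is not shown. A common choice, and only an assumption about what it does, is min-max normalization: find the smallest and largest disparity values and rescale every value linearly into 0...1. The sketch below applies that to a plain `[Float]` through the hypothetical helper `normalized(_:)`; a real extension would run the same arithmetic over the buffer's Float32 values after calling `CVPixelBufferLockBaseAddress`.

```swift
/// Min-max normalization: rescales values linearly so that the smallest
/// becomes 0 and the largest becomes 1. Constant input maps to all zeros.
func normalized(_ values: [Float]) -> [Float] {
    guard let minValue = values.min(), let maxValue = values.max() else { return [] }
    let range = maxValue - minValue
    guard range > 0 else { return [Float](repeating: 0, count: values.count) }
    return values.map { ($0 - minValue) / range }
}

// Hypothetical disparity values from one row of the depth map
let disparity: [Float] = [2, 4, 6, 10]
print(normalized(disparity))   // [0.0, 0.25, 0.5, 1.0]
```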

PROCESS IMAGE'S PIXELS

I started by working with pixel buffers. My first idea was to go through the image row by row and compute each pixel's shift by reading the corresponding value from the depth image's pixel buffer.

currentColor = imageBuffer[currentIndex]                          //1
relativeShift = depthBuffer[currentIndex] / 255 * 2 - 1           //2
currentPixelShift = round(maxShift * relativeShift)               //3
resultBuffer[currentIndex + currentPixelShift] = currentColor     //4

1. Get the UInt8 color value at the current index of the image buffer
2. Convert the 8-bit red value of the depth pixel to a float between -1 and 1
3. Compute the shift distance in pixels for the current pixel
4. Write the color to its new destination in the result buffer

Two issues had to be solved here.

First, several different pixels can end up in the same place. When that happens, we need to use the one in front of the others, meaning the pixel with the brightest value in the depth map.

The second problem is gaps. Because pixels shift by different distances, some spots end up with no color at all. Ideally, we fill those gaps with the color of the background pixel.
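
To see both problems concretely, here is a runnable version of the pseudocode above for a single row. It is a sketch under simplifying assumptions: colors are single characters, depth values are already normalized to 0...1, shifts are rounded to whole pixels, and the helper names are hypothetical. Gaps come out as `nil`, and when two pixels collide, whichever is written last wins.

```swift
/// Forward-maps one row as in the pseudocode: every source pixel is written to
/// its shifted position. Positions nobody lands on stay nil (gaps); on collisions
/// the pixel written last wins, whether or not it is in front.
func naiveShiftRow(colors: [Character], depth: [Double], maxShift: Double) -> [Character?] {
    var result = [Character?](repeating: nil, count: colors.count)
    for x in colors.indices {
        let shift = Int(((depth[x] * 2 - 1) * maxShift).rounded())
        let target = x + shift
        if result.indices.contains(target) {
            result[target] = colors[x]
        }
    }
    return result
}

/// Renders a row with "_" for gaps.
func render(_ row: [Character?]) -> String {
    String(row.map { $0 ?? "_" })
}

// Background pixels a-f at mid-gray depth, a near object FF in the middle
let colors = Array("abcFFdef")
let depth = [0.5, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5]
print(render(naiveShiftRow(colors: colors, depth: depth, maxShift: 2)))   // abc__def
```

The near object should land at positions 5 and 6, but the background pixels written after it overwrite it, and positions 3 and 4 stay empty.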

IDEA #1 - SHIFTING LOOP AND FIXING LOOP

Loop through all pixels line by line, compute each shift, and store the depth of the moved pixel at its new position. If a pixel is already stored there, compare how close the two are to the camera: we only overwrite it if the depth value of the currently moved pixel is higher (brighter, so closer) than the stored one.

This reorganization leaves gaps, which we fix in a second loop. When we find an undefined pixel, we continue to the next defined pixel and fill the gap with the color of either the last defined pixel before the gap or the first one after it, whichever is farther from the camera (has the darker value in the depth map).
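
Here is one way to sketch idea #1 for a single row, using the same simplified setup (characters as colors, depth normalized to 0...1, a hypothetical function name). The shifting loop keeps a per-row depth buffer, and the fixing loop fills each run of undefined pixels from the farther of its two defined neighbors.

```swift
/// Idea #1 as a sketch for one row: a shifting loop with a depth test,
/// then a fixing loop that fills gaps with the farther neighbor's color.
func shiftRowWithDepthTest(colors: [Character], depth: [Double], maxShift: Double) -> [Character] {
    let width = colors.count
    var result = [Character?](repeating: nil, count: width)
    var storedDepth = [Double](repeating: -1, count: width)   // -1 = nothing stored yet

    // Shifting loop: keep a moved pixel only if it is closer (brighter) than the stored one.
    for x in 0..<width {
        let target = x + Int(((depth[x] * 2 - 1) * maxShift).rounded())
        guard target >= 0 && target < width else { continue }
        if depth[x] > storedDepth[target] {
            result[target] = colors[x]
            storedDepth[target] = depth[x]
        }
    }

    // Fixing loop: fill each run of undefined pixels with the color of the defined
    // neighbor before or after the run, whichever is farther away (darker depth).
    var start = 0
    while start < width {
        if result[start] != nil { start += 1; continue }
        var end = start
        while end < width && result[end] == nil { end += 1 }
        let before: Int? = start > 0 ? start - 1 : nil
        let after: Int? = end < width ? end : nil
        let fill: Character?
        switch (before, after) {
        case (let b?, let a?): fill = storedDepth[b] <= storedDepth[a] ? result[b] : result[a]
        case (let b?, nil): fill = result[b]
        case (nil, let a?): fill = result[a]
        case (nil, nil): fill = nil
        }
        for i in start..<end { result[i] = fill }
        start = end
    }
    return result.map { $0 ?? " " }   // stays blank only if the whole row was empty
}

let filled = shiftRowWithDepthTest(colors: Array("abcFFdef"),
                                   depth: [0.5, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5],
                                   maxShift: 2)
print(String(filled))   // abcccFFf
```

With a max shift of 2 pixels, the near object moves two pixels to the right, and the uncovered positions take the color of `c`, the farther of the two neighbors.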

IDEA #2 - FIX GAPS IN ONE LOOP

I wanted to improve my approach by fixing the gaps in the same loop. I realized that when the maximum shift is positive, meaning that objects in front (white in the depth map) move to the right and objects in the back (black) move to the left, the gaps appear on the left side of the objects that are in front of others. For a negative maximum shift it is analogous: the objects move in the opposite direction, so the gaps appear on the other side. So to fix a gap, you just need to recognize that it is being created, walk in the right direction, and use the last color.

You can detect a gap when the last shift was smaller than the current one (for a negative maximum shift, larger), so that the destination index jumps by more than one pixel. Then you just fill all pixels from the last destination index to the current one with the color of the last pixel. This works if you loop from left to right for a positive maximum shift and from right to left for a negative one.
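
A sketch of idea #2 under the same assumptions (characters as colors, depth normalized to 0...1, hypothetical names). It detects a gap as a jump of more than one pixel in the destination index, which is what a growing shift produces, and fills the skipped positions with the last pixel's color. The depth test from idea #1 is kept.

```swift
/// Idea #2 as a sketch for one row: a single loop that moves pixels (keeping the
/// depth test from idea #1) and fills a gap as soon as it opens, using the last
/// pixel's color.
func shiftRowFillingGaps(colors: [Character], depth: [Double], maxShift: Double) -> [Character] {
    let width = colors.count
    var result = [Character](repeating: " ", count: width)
    var storedDepth = [Double](repeating: -1, count: width)   // -1 = nothing stored yet

    // Writes a color only if it is closer (brighter) than what is already stored.
    func store(_ color: Character, depth d: Double, at i: Int) {
        guard i >= 0 && i < width && d > storedDepth[i] else { return }
        result[i] = color
        storedDepth[i] = d
    }

    // Positive maxShift: walk left to right; negative maxShift: right to left.
    let step = maxShift >= 0 ? 1 : -1
    let order = step == 1 ? Array(0..<width) : Array((0..<width).reversed())
    var last: (target: Int, color: Character, depth: Double)? = nil

    for x in order {
        let target = x + Int(((depth[x] * 2 - 1) * maxShift).rounded())
        // The shift grew compared with the last pixel, so the destination jumped by
        // more than one pixel: fill the skipped positions with the last pixel's color.
        if let last = last, (target - last.target) * step > 1 {
            var i = last.target + step
            while i != target {
                store(last.color, depth: last.depth, at: i)
                i += step
            }
        }
        store(colors[x], depth: depth[x], at: target)
        last = (target, colors[x], depth[x])
    }
    return result
}

let row = Array("abcFFdef")
let rowDepth = [0.5, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5]
print(String(shiftRowFillingGaps(colors: row, depth: rowDepth, maxShift: 2)))    // abcccFFf
print(String(shiftRowFillingGaps(colors: row, depth: rowDepth, maxShift: -2)))   // aFFdddef
```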

IDEA #3 - PROCESS FRONT PIXEL LAST

Thanks to the previous idea, I realized that I don't need to check the depth of the moved pixel at all if I walk through the row in the right direction; I can simply overwrite every pixel. So I flipped the direction of the row loop. For a positive maximum shift, the closer an object is to the camera, the farther it shifts to the right. That means that if you walk from the right side, the pixel closest to the camera is the last one you meet for a given destination, so it overwrites whatever was already stored at that index of the resulting row.

After that, the fixing part needs a small change: fill the gap with the current color instead of the last one.
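
A sketch of idea #3 under the same assumptions: the row is walked from the right for a positive max shift (from the left for a negative one), every pixel simply overwrites its destination, and a gap is filled with the current pixel's color. The function name is hypothetical, and this is a reading of the description above rather than the original code.

```swift
/// Idea #3 as a sketch for one row: walk the row so that the closest pixel of any
/// collision is processed last, overwrite without a depth test, and fill each gap
/// with the current pixel's color.
func shiftRowFrontLast(colors: [Character], depth: [Double], maxShift: Double) -> [Character] {
    let width = colors.count
    var result = [Character](repeating: " ", count: width)

    func put(_ color: Character, at i: Int) {
        if i >= 0 && i < width { result[i] = color }
    }

    // Positive maxShift: walk right to left; negative maxShift: left to right.
    let step = maxShift >= 0 ? -1 : 1
    let order = step == -1 ? Array((0..<width).reversed()) : Array(0..<width)
    var lastTarget: Int? = nil

    for x in order {
        let target = x + Int(((depth[x] * 2 - 1) * maxShift).rounded())
        // The last (closer) pixel landed more than one pixel away, so a gap opened
        // between the two destinations: fill it with the current (farther) color.
        if let lastTarget = lastTarget, (target - lastTarget) * step > 1 {
            var i = target - step
            while i != lastTarget {
                put(colors[x], at: i)
                i -= step
            }
        }
        put(colors[x], at: target)   // on collisions, the later pixel is the closer one
        lastTarget = target
    }
    return result
}

let row = Array("abcFFdef")
let rowDepth = [0.5, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5]
print(String(shiftRowFrontLast(colors: row, depth: rowDepth, maxShift: 2)))    // abcccFFf
print(String(shiftRowFrontLast(colors: row, depth: rowDepth, maxShift: -2)))   // aFFdddef
```

No depth buffer is needed: when the near object's pixels are processed, they overwrite the background pixels that landed on the same positions earlier.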

I tried to optimize the algorithm further. I also tried precomputing a buffer of pixel shifts from the depth map. That can help because the depth data is smaller than the original image (especially for iPhone depth data), so the same depth value is used for several color pixels.
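
A possible way to precompute the shifts, as a sketch: compute one shift per depth pixel and look it up for every image pixel with a nearest-neighbor index. The function name and the nearest-neighbor mapping are assumptions; the original optimization code is not shown.

```swift
/// Precomputes the pixel shift once per depth-map pixel, then reuses it for every
/// image pixel that maps onto the same depth pixel (nearest-neighbor lookup,
/// because the depth map is narrower than the image).
func precomputedShifts(depthRow: [Double], imageWidth: Int, maxShift: Double) -> [Int] {
    // One multiplication and rounding per depth pixel instead of per image pixel
    let depthShifts = depthRow.map { Int((($0 * 2 - 1) * maxShift).rounded()) }
    // Integer scaling maps image column x to depth column x * depthWidth / imageWidth
    return (0..<imageWidth).map { x in depthShifts[x * depthRow.count / imageWidth] }
}

// A depth row of 2 pixels (far, near) stretched over an image row of 4 pixels
print(precomputedShifts(depthRow: [0, 1], imageWidth: 4, maxShift: 3))   // [-3, -3, 3, 3]
```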

Even with all those optimizations, creating the two images took about 1.5 seconds for a small 576x768 image (the resolution of the depth data from an iPhone) and over 35 seconds for a full-size 3024x4032 image. That is unacceptable for image processing, especially if the same approach should later be used for video.

FINAL SOLUTION WITH CIFILTER

I started looking for another solution. After a while, I found that you can create your own CIFilter using a CIKernel function. That sounded like the right, easily reusable solution.

So what did I need to do? Write a function in CIKernel Language that takes some inputs and returns the color of a particular pixel of the resulting image. A handy tool for this is Quartz Composer, where you can test your kernel code without writing any other code.

WRITE CIKERNEL FUNCTION

This requires a different approach, though. Until now, I had walked through all pixels of the source image once to create the resulting image. Now I had to write a single function that is evaluated separately for each pixel of the resulting image. It has to return the color of a pixel that sits elsewhere in the same row of the source picture, so it needs the distance to that pixel. The catch is that this distance is stored in the depth map at the location of the source pixel, not at the location of the pixel being computed.

What we do know is that the source pixel can't be farther away than the maximum shift. So in an unoptimized algorithm, we can go through all depth map pixels in the same row within that range around the current pixel and check whether each one's distance from the current pixel equals the shift computed from its depth value. If it does, we have found the position of our pixel and can read its color from the source image.

Of course, more than one pixel may match. We have to find the right one: the one in front of all the others.

From our previous research, we know that for a positive maximum shift, the pixel in front moves the farthest to the right. That means that the first pixel from the left whose shift matches its distance to the current pixel is the one we need. For a negative shift it is analogous; we just loop through the neighboring positions from right to left.

Now we can test it in Quartz Composer. Just resize the mask image to the size of the original image and see what happens. This method runs almost in real time because it is processed on the GPU, which computes image-related operations for many pixels in parallel.

But what about the gaps between pixels? They are output pixels for which we didn't find a color: no shift computed from the depth map equals the distance from the current pixel. Walking left to right through the neighbors (for a positive maximum shift), the shift needed to land on the current pixel decreases step by step. In a gap, no neighbor lands exactly on it; instead, at some step we meet a pixel whose shift is larger than needed, so it overshoots the current position. Isn't this the pixel we want for filling the gap? Yes, it is: sampling the source image at the current position minus that pixel's shift lands on the background next to the object, and we want to fill gaps with the background color. I simply changed the equality condition to less-than-or-equal, and it worked really well. All gaps are filled and the image is complete.
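
To make the search logic easier to follow outside of a kernel, here is a CPU sketch of the same loop for one row in Swift. It works in whole pixels (an increment of one pixel) rather than normalized coordinates, clamps neighbors to the row (an assumption; Core Image's behavior at the image edge may differ), and uses hypothetical names.

```swift
/// CPU sketch of the kernel's search for one output pixel, in whole pixels
/// (increment = 1 pixel) instead of normalized coordinates.
func gatherPixel(x: Int, colors: [Character], depth: [Double], maxShift: Int) -> Character {
    let width = colors.count
    let range = abs(maxShift)
    // Positive maxShift: scan neighbors left to right; negative: right to left.
    let offsets = maxShift > 0 ? Array(-range ... range) : Array((-range ... range).reversed())
    for d in offsets {
        let neighbor = min(max(x + d, 0), width - 1)   // clamp to the row (sketch assumption)
        let shift = (depth[neighbor] * 2 - 1) * Double(maxShift)
        // Relaxed condition: the neighbor's shift reaches or overshoots x
        let matches = maxShift > 0 ? -shift <= Double(d) : -shift >= Double(d)
        if matches {
            let source = min(max(x - Int(shift.rounded()), 0), width - 1)
            return colors[source]
        }
    }
    return " "   // nothing matched (the kernel returns white here)
}

func gatherRow(colors: [Character], depth: [Double], maxShift: Int) -> [Character] {
    colors.indices.map { gatherPixel(x: $0, colors: colors, depth: depth, maxShift: maxShift) }
}

let row = Array("abcFFdef")
let rowDepth = [0.5, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5]
print(String(gatherRow(colors: row, depth: rowDepth, maxShift: 2)))    // abcbcFFf
print(String(gatherRow(colors: row, depth: rowDepth, maxShift: -2)))   // aFFdedef
```

Note that for the positive shift, the gap at positions 3 and 4 gets `b` and `c`: the source is sampled at the current position minus the overshooting neighbor's shift, which lands on the background to the left of the object.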

kernel vec4 stereoImageFromDepth(sampler image, sampler depthMaskImage, float increment, float maxShift) {
    vec2 oldPoint = samplerCoord(image);	
    vec2 currentDepthPoint = samplerCoord(depthMaskImage);
    
    if (maxShift > 0) { //positive shift
        float d = -maxShift;
        while (d <= maxShift) {
            vec2 depthPoint = currentDepthPoint;
            depthPoint.x = currentDepthPoint.x + d;		//1
            float dFT = sample(depthMaskImage, depthPoint).r * 2 - 1;	//2
            float shiftFT = dFT * maxShift;		//3
            if (-shiftFT <= d) {			//4
                vec2 newPoint = oldPoint;
                newPoint.x = oldPoint.x - shiftFT;		//5
                vec4 color = sample(image, newPoint);		//6
                return color;
            }
            d = d + increment;
        }
        
    } else { //negative shift
        float d = -maxShift;
        while (d >= maxShift) {
            vec2 depthPoint = currentDepthPoint;
            depthPoint.x = currentDepthPoint.x + d;
            float dFT = sample(depthMaskImage, depthPoint).r * 2 - 1;
            float shiftFT = dFT * maxShift;
            if (-shiftFT >= d) {
                vec2 newPoint = oldPoint;
                newPoint.x = oldPoint.x - shiftFT;
                vec4 color = sample(image, newPoint);
                return color;
            }
            d = d - increment;
        }
        
    }
    
    return vec4(1,1,1,1);
}

1. Compute the sampling position in the depth image
2. Convert the depth map value to a range from -1 to 1
3. Compute the pixel shift distance
4. Check whether the shift reaches the distance from the current pixel
5. Compute the sampling position in the original image
6. Get the color of the matched pixel from the original image

The bigger the maximum shift, the slower the method, because we have to loop through a larger neighborhood. We could optimize that search loop somehow. On the other hand, we only need to generate two images, for the left and the right eye, with a positive and a negative maximum shift, and that shift is relatively small: just 1% of the image width gives a beautiful stereo effect. That means we can now generate a stereo image from a full-resolution iPhone image and its depth data in a fraction of a second, which is enough for our purpose.
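
To put numbers on that, here is a quick calculation for a full-size 3024x4032 photo, assuming a max shift of 0.01 and an increment of 1/width, the values used in the composition code later in this article. It only counts the worst-case number of search-loop iterations per output pixel and ignores floating-point accumulation in the loop; it is not a benchmark.

```swift
// Worst-case size of the kernel's search loop for a 1% max shift.
let imageWidth = 3024.0
let maxShift = 0.01                 // fraction of the image width
let increment = 1.0 / imageWidth    // one pixel in normalized coordinates

let maxShiftInPixels = Int((maxShift * imageWidth).rounded())
// d runs from -maxShift to +maxShift in steps of `increment`
let maxIterations = Int((2 * maxShift / increment).rounded(.down)) + 1

print("max shift: \(maxShiftInPixels) px, at most \(maxIterations) iterations per output pixel")
```

So even at full resolution, each output pixel needs only about 61 depth samples in the worst case, and the GPU evaluates the pixels in parallel.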

CREATE CIFILTER

Finally, we just need to create a CIKernel from the kernel function and use it in a CIFilter. When I started creating the CIKernel with the method makeKernels(source: String), Xcode warned me that “this method is deprecated since iOS 12” and that I should use Metal Shading Language instead. I was a little bit disappointed that I needed to rewrite the function in a different language. But after some quick research, I realized that rewriting a few types and function annotations was all that's needed.

The Metal code must be written in a .metal file, and two build settings finish the job. They produce a default.metallib file, which is then used to create the kernel with CIKernel(functionName: String, fromMetalLibraryData: Data). This is an advantage over CIKernel Language, whose kernels have to be compiled at runtime when they are used. The first flag is MTL_COMPILER_FLAGS = -fcikernel; you can find this property in the build settings as Other Metal Compiler Flags. The second is MTLLINKER_FLAGS = -cikernel, which has to be created as a User-Defined Setting.

What changes in Metal Shading Language? Not much. The function is wrapped in a block, with a header that includes the necessary libraries:

#include <metal_stdlib>
using namespace metal;
#include <CoreImage/CoreImage.h> // includes CIKernelMetalLib.h

extern "C" { namespace coreimage {
//METHODS HERE
}}

Beyond that, only the syntax changes:

kernel vec4 stereoImageFromDepth(...) -> float4 stereoImageFromDepth(...) (no kernel keyword)
vec2 -> float2
vec4 -> float4
sample(image, samplerCoord(image)) -> image.sample(image.coord())

That's it for our code. Nothing else.

To create a custom CIFilter, we just need to:

1. Subclass CIFilter
2. Define the input properties
3. Create the CIKernel
4. Override the outputImage property, where you:
   a. Convert the input CIImages to CISamplers
   b. Apply the CIKernel and pass in the input parameters

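import CoreImage

class CIStereoShiftMetal: CIFilter {
    @objc dynamic var inputImage: CIImage?
    @objc dynamic var inputDepthImage: CIImage?
    @objc dynamic var inputIncrement: NSNumber?
    @objc dynamic var inputMaxShift: NSNumber?

    // Load the compiled default.metallib and look up the kernel function by name
    private let kernel: CIKernel = {
        let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
        let data = try! Data(contentsOf: url)
        return try! CIKernel(functionName: "stereoImageFromDepth", fromMetalLibraryData: data)
    }()

    override var outputImage: CIImage? {
        // All inputs must be set, and both images must have the same extent
        guard let inputImage = inputImage,
              let inputDepthImage = inputDepthImage,
              let inputIncrement = inputIncrement,
              let inputMaxShift = inputMaxShift,
              inputImage.extent == inputDepthImage.extent else {
            return nil
        }

        let imageSrc = CISampler(image: inputImage)
        let depthSrc = CISampler(image: inputDepthImage)

        // The kernel reads up to |maxShift| (a fraction of the width) to either side
        let margin = CGFloat(abs(inputMaxShift.doubleValue)) * inputImage.extent.width

        return kernel.apply(extent: imageSrc.extent, roiCallback: { _, rect in
            rect.insetBy(dx: -margin, dy: 0)
        }, arguments: [imageSrc, depthSrc, inputIncrement, inputMaxShift])
    }
}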

The custom CIFilter is only a wrapper around the CIKernel, so you can use it like any other filter. You just need to make sure that the source image and the depth image have the same size. Apple provides a filter for exactly that purpose, which upscales an image while respecting the edges of the source image: CIEdgePreserveUpsampleFilter. It takes two parameters: inputSmallImage, the image to upscale, and inputImage, which defines the target size and guides the upscaling of the small image.

Here is how to compose the filters:

import CoreImage

static func createStereoPairFromImageAndDepthMask(image: CIImage, depthImage: CIImage) -> (leftImage: CIImage, rightImage: CIImage) {
    // Upscale the small depth map to the photo's size, guided by the photo's edges
    let upsampleFilter = CIFilter(name: "CIEdgePreserveUpsampleFilter")!
    upsampleFilter.setValue(depthImage, forKey: "inputSmallImage")
    upsampleFilter.setValue(image, forKey: kCIInputImageKey)

    let upsampledDepthImage = upsampleFilter.outputImage!

    // Maximum shift of 1% of the image width; the kernel steps one pixel at a time
    let shiftF: Double = 0.01
    let increment = 1 / Double(image.extent.width)

    let shiftFilter = CIStereoShiftMetal()
    shiftFilter.setValue(image, forKey: "inputImage")
    shiftFilter.setValue(upsampledDepthImage, forKey: "inputDepthImage")
    shiftFilter.setValue(NSNumber(value: increment), forKey: "inputIncrement")
    shiftFilter.setValue(NSNumber(value: shiftF), forKey: "inputMaxShift")

    // Positive max shift: left-eye image
    let leftImage = shiftFilter.outputImage!

    // Negative max shift: right-eye image
    shiftFilter.setValue(NSNumber(value: -shiftF), forKey: "inputMaxShift")

    let rightImage = shiftFilter.outputImage!

    return (leftImage, rightImage)
}

WE'RE DONE

Finally, we can create a stereo image pair from a 2D image and its depth map with relatively simple code, in a fraction of a second, thanks to Metal and GPU processing. I implemented this in my app, called 3D Photo, so you can now create a 3D image and view it or export it in several stereoscopic formats. If you want to find the app in the App Store, just search for: stereo photo maker.

Perhaps you can now better imagine other uses of depth maps or of CIFilters built on top of Metal code. Hopefully, this article will inspire you to dive deeper into this topic. That's one of the reasons I wanted to write this “guide”: inspiring others to experiment would make me very happy.
