[Swift] Try making your hands sway with ARKit + Metal

In this article I will show you how to make your hands sway using ARKit and Metal.

The finished result looks like this. (animated GIF of the effect)

Approach

Use ARKit's People Occlusion.

People Occlusion gives you a human-shaped mask texture. By distorting the mask and the camera image in the same way, you get a composite image in which only the person is distorted. It feels like cutting the person out while warping them.

You get an image like this (distorted more strongly than in the video, for clarity).

Superimposing the distorted hand image on the original image produces the video above; a conceptual sketch of this compositing follows.
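
To make the idea concrete, here is a per-pixel sketch of the compositing in Swift. This is not the actual shader (that comes later); the names, the single-channel simplification, and the helper mix() are mine, mimicking Metal's mix().

// A conceptual sketch of the compositing, not the actual shader code.
// mix() is linear interpolation, like Metal's built-in mix().
func mix(_ a: Float, _ b: Float, _ t: Float) -> Float {
    a * (1 - t) + b * t
}

// For one pixel: the person's color and the human mask are both sampled
// at the same *distorted* coordinate, then blended over the original scene.
func compositePixel(scene: Float, distortedPerson: Float, distortedMask: Float) -> Float {
    mix(scene, distortedPerson, distortedMask)
}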

Procedure

**1. Get the official Apple sample**

The process of obtaining a human mask texture with People Occlusion is already implemented in Apple's official sample, so I will build on it.

Effecting People Occlusion in Custom Renderers

**2. Pass the elapsed time to the shader**

To make the image sway, the deformation formula needs the elapsed time as an input.

Declare a struct to pass to the shader.

Renderer.swift


struct Uniforms {
    var time: Float = 0
}

Next, declare a variable to hold the uniforms, the start time used to compute the elapsed time, and a buffer to pass to the shader.

Renderer.swift


class Renderer {
    var uniforms = Uniforms()
    private var startDate: Date = Date()
    var uniformsBuffer: MTLBuffer! // Buffer to pass to the shader

Then, in the compositeImagesWithEncoder method, which passes data to the shader, compute the elapsed time and pass it along as well.

Renderer.swift


uniforms.time = Float(Date().timeIntervalSince(startDate)) // Seconds elapsed since startDate
uniformsBuffer = device.makeBuffer(bytes: &uniforms, length: MemoryLayout<Uniforms>.stride, options: [])
uniformsBuffer.label = "UniformsBuffer"
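
The buffer also has to be bound to the fragment stage in the same method. A minimal sketch, assuming the encoder variable is named renderEncoder and that a buffer index matching the shader's kBufferIndexMyUniforms is imported from the sample's shared header (both names are assumptions taken from the shader signature below):

// Bind the uniforms buffer to the fragment shader.
// The index must match the [[ buffer(kBufferIndexMyUniforms) ]] slot in Shaders.metal.
renderEncoder.setFragmentBuffer(uniformsBuffer,
                                offset: 0,
                                index: Int(kBufferIndexMyUniforms.rawValue))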

Prepare the same struct on the shader side and receive it as an argument of the function. The argument name is myUniforms. (kBufferIndexMyUniforms is a new buffer index; following the sample's structure, it would be defined next to existing indices such as kBufferIndexSharedUniforms in ShaderTypes.h.)

Shaders.metal


struct Uniforms {
    float time;
};

fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]],
                                    texture2d<float, access::sample> capturedImageTextureY [[ texture(0) ]],
                                    texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(1) ]],
                                    texture2d<float, access::sample> sceneColorTexture [[ texture(2) ]],
                                    depth2d<float, access::sample> sceneDepthTexture [[ texture(3) ]],
                                    texture2d<float, access::sample> alphaTexture [[ texture(4) ]],
                                    texture2d<float, access::sample> dilatedDepthTexture [[ texture(5) ]],
                                    constant SharedUniforms &uniforms [[ buffer(kBufferIndexSharedUniforms) ]],
                                    constant Uniforms &myUniforms [[buffer(kBufferIndexMyUniforms)]])
{

**3. Rewrite the shader**

In the official sample, rewrite the compositeImageFragmentShader function in Shaders.metal as follows.

Shaders.metal


@@ -219,8 +397,9 @@ fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]
     half4 sceneColor = half4(sceneColorTexture.sample(s, sceneTexCoord));
     float sceneDepth = sceneDepthTexture.sample(s, sceneTexCoord);

+    float2 modifier = float2(sin(cameraTexCoord.y + myUniforms.time*5)*0.2, 0); //Deformation formula
     half4 cameraColor = half4(rgb);
-    half alpha = half(alphaTexture.sample(s, cameraTexCoord).r);
+    half alpha = half(alphaTexture.sample(s, cameraTexCoord + modifier).r); //Transform the humanoid mask

     half showOccluder = 1.0;

@@ -233,8 +412,11 @@ fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]
         showOccluder = (half)step(dilatedDepth, sceneDepth); // forwardZ case
     }

+    float2 displacedUV = sceneTexCoord + modifier; //Transform the image

-    half4 occluderResult = mix(sceneColor, cameraColor, alpha);
+    half4 displacedCol = half4(sceneColorTexture.sample(s, displacedUV)); //Acquisition of deformed humanoid image
+    half4 occluderResult = mix(sceneColor, displacedCol, alpha); //Combine the deformed image with the original image
     half4 mattingResult = mix(sceneColor, occluderResult, showOccluder);
     return mattingResult;
 }

This is the key line.

Shaders.metal


float2 modifier = float2(sin(cameraTexCoord.y + myUniforms.time*5)*0.2, 0); //Deformation formula

The y coordinate of the input image, plus the elapsed time multiplied by 5, is fed to the sin function, and the result, scaled by 0.2, is assigned to modifier. modifier is an offset added to the texture coordinates of the camera image and the human mask; since the formula appears only in the x component, the image fluctuates only along the x axis.
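
To get a feel for the numbers, here is a quick Swift sanity check using the same constants as the shader (texture coordinates run from 0 to 1, so the offset is a fraction of the texture width):

import Foundation

// At texture coordinate y = 0.5 and elapsed time t = 1.0 s:
// sin(0.5 + 1.0 * 5) * 0.2 = sin(5.5) * 0.2 ≈ -0.14,
// i.e. the sample point shifts by about 14% of the texture width.
let y = 0.5
let t = 1.0
let offsetX = sin(y + t * 5) * 0.2
print(offsetX) // ≈ -0.141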

Incidentally, the actual video appears to sway vertically, which seems to contradict the formula above; this is because it was recorded with the iPhone held in landscape and then rotated to portrait with video editing software.

I wrote about deforming shapes with Metal shaders in the following article, so please take a look as well.

How to transform ARKit and SceneKit shapes with Metal shader

Result

I also uploaded the finished video to YouTube. Finished video (YouTube)

The source code is here.

Finally

I regularly publish articles about iOS development on note, so please follow me. https://note.com/tokyoyoshida

I also post simple tips on Twitter. https://twitter.com/jugemjugemjugem
