[SWIFT] Tout le monde est super Saiyan avec ARKit + Metal

Dans le processus d'enquête sur Échantillon Apple d'occlusion de personnes, j'ai essayé de faire un effet lorsque je suis devenu un super saiyan.

Image complète Article référencé: ・ Appliquer des effets uniquement au corps humain avec ARKit3 et Metal ・ Effet lumineux avec Swift + Metal ・ Maîtriser le GPU avec MetalKit

Comment créer une aura

Il existe différentes variantes d'aura (?) Lors de la recherche d'images de Super Saiyans. Dans cet article, j'ai fait référence à l'aura immédiatement après que Goku soit devenu un super-Saiyan sur Namek (parce que cela semblait facile à faire sans déchirure. L'aura jaune le rend plus brillant à mesure qu'il se rapproche du corps humain).

La base est exemple Apple ici.

① Bordez le corps humain Agrandissez la partie du corps humain et supprimez la couleur de la partie originale du corps humain pour en faire un bord. ② Flou ① ・ En agrandissant le flou jaune et le flou blanc plus petit, il deviendra plus lumineux à mesure que vous vous rapprochez du corps humain. ③② et l'image capturée par la caméra sont combinées

① Bordez le corps humain

En tant que méthode de bordure épaisse, nous avons adopté une méthode d'agrandissement de l'image du corps humain et d'effacement de la couleur d'origine de la partie du corps humain. Traité par Compute Shader.

kernel void matteConvert(texture2d<half, access::read> inTexture [[ texture(0) ]],
                         texture2d<half, access::write> outWhiteTexture [[ texture(1) ]],
                         texture2d<half, access::write> outYellowTexture [[ texture(2) ]],
                         uint2 gid [[thread_position_in_grid]]) {
    
    uint2 textureIndex(gid);
    if (inTexture.read(textureIndex).r > 0.1) {
        //Aucune couleur sur le corps humain
        outWhiteTexture.write(half4(0.0), gid);
        outYellowTexture.write(half4(0.0), gid);
        return;
    }
    
    //Expansion
    constexpr int scale = 15;
    constexpr int radius = scale / 2;
    half color = 0.0;
    
    for (int i=0; i<scale; i++) {
        for (int j=0; j<scale; j++) {
            uint2 textureIndex(gid.x + (i - radius), gid.y + (j - radius));
            half alpha = inTexture.read(textureIndex).r;
            if (alpha > 0.1) {
                color = 1.0;
                break;
            }
        }
        if (color > 0.0) {
            break;
        }
    }

    outWhiteTexture.write(half4(color, color, color, 1.0), gid);
    outYellowTexture.write(half4(color, color, 0.0, 1.0), gid);
}

Si chaque pixel de l'image du corps humain se trouve à moins de 15 pixels vers le haut, le bas, la gauche et la droite, il est agrandi en le définissant comme "corps humain". (Il boucle 225 (= 15x15) fois par 1px, donc la charge est très élevée. Il serait préférable de réduire la résolution avant le traitement car elle sera floue de toute façon, mais je ne pourrais pas faire de mon mieux. Je le ferai un jour) En rendant la partie qui était à l'origine un corps humain incolore (noir), le résultat est une image de bordure. La sortie est stockée dans le tampon de texture préparé pour chacun des blancs et des jaunes.

Résultat de la bordure (confirmé avec Capture GPU Frame de Xcode12)

② Flou ①

Blur utilise Metal Performance Shader (Image Filter). Le Metal Performance Shader peut être facilement combiné avec votre propre shader.

//Changer la taille du flou avec le temps
time += 1
//Flou (blanc)
let whiteIntensity = Int((sin(Float(time)/3) + 2) * 30) | 0x01  //Un nombre impair doit être spécifié pour la taille de MPSImageTent.
let kernel1 = MPSImageTent(device: device, kernelWidth: whiteIntensity, kernelHeight: whiteIntensity)
kernel1.encode(commandBuffer: commandBuffer,
              inPlaceTexture: &whiteBlurTexture!, fallbackCopyAllocator: nil)
//Flou (jaune)
let yellowIntensity = Int((sin(Float(time)/3) + 2) * 100) | 0x01
let kernel2 = MPSImageTent(device: device, kernelWidth: yellowIntensity, kernelHeight: yellowIntensity)
kernel2.encode(commandBuffer: commandBuffer,
              inPlaceTexture: &yellowBlurTexture!, fallbackCopyAllocator: nil)

Il existe différents filtres pour Metal Performance Shader, mais après en avoir essayé, «MPSImageTent» semble être le meilleur, alors je l'ai adopté. Le flou blanc est petit, le flou jaune est grand et la taille du flou change avec le temps.

③② et l'image capturée par la caméra sont combinées

Ici, il suffit d'ajouter l'image capturée par la caméra et l'image après avoir flouté le bord (blanc, jaune).

fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]],
                                            texture2d<float, access::sample> capturedImageTextureY [[ texture(0) ]],
                                            texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(1) ]],
                                            texture2d<float, access::sample> whiteColorTexture [[ texture(2) ]],
                                            texture2d<float, access::sample> yellowColorTexture [[ texture(3) ]],
                                            texture2d<float, access::sample> alphaTexture [[ texture(4) ]])
{
    constexpr sampler s(address::clamp_to_edge, filter::linear);

    float2 cameraTexCoord = in.texCoordCamera;

    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate.
    float4 rgb = ycbcrToRGBTransform(capturedImageTextureY.sample(s, cameraTexCoord), capturedImageTextureCbCr.sample(s, cameraTexCoord));

    half4 cameraColor = half4(rgb);
    half4 whiteColor = half4(whiteColorTexture.sample(s, cameraTexCoord));
    half4 yellowColor = half4(yellowColorTexture.sample(s, cameraTexCoord)) * 2.0;
        
    return cameraColor + whiteColor + yellowColor;
}

Le résultat de l'addition fonctionne sans serrage.

Code source complet

· Rapide

`ViewController.swift`


class ViewController: UIViewController, MTKViewDelegate {
    
    var session = ARSession()
    var renderer: Renderer!
    
    override func viewDidLoad() {
        super.viewDidLoad()
        
        if let view = self.view as? MTKView {
            view.device = MTLCreateSystemDefaultDevice()
            view.backgroundColor = UIColor.clear
            view.delegate = self
            renderer = Renderer(session: session, metalDevice: view.device!, mtkView: view)
            renderer.drawRectResized(size: view.bounds.size)
        }
    }
    
    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let configuration = ARWorldTrackingConfiguration()
        configuration.frameSemantics = .personSegmentation
        session.run(configuration)
    }
    
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        renderer.drawRectResized(size: size)
    }
    
    func draw(in view: MTKView) {
        renderer.update()
    }
}

`Renderer.swift`


let kMaxBuffersInFlight: Int = 3

let kImagePlaneVertexData: [Float] = [
    -1.0, -1.0, 0.0, 1.0,
    1.0, -1.0, 1.0, 1.0,
    -1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0
]

class Renderer {
    let session: ARSession
    let matteGenerator: ARMatteGenerator
    let device: MTLDevice
    let inFlightSemaphore = DispatchSemaphore(value: kMaxBuffersInFlight)
    var mtkView: MTKView
    
    var commandQueue: MTLCommandQueue!
    var imagePlaneVertexBuffer: MTLBuffer!
    //État du pipeline pour la composition finale de l'image
    var compositePipelineState: MTLRenderPipelineState!
    //État du pipeline pour agrandir les images du corps humain
    var computeState: MTLComputePipelineState!
    //Texture d'image capturée
    var capturedImageTextureY: CVMetalTexture?
    var capturedImageTextureCbCr: CVMetalTexture?
    var capturedImageTextureCache: CVMetalTextureCache!
    //Texture d'image du corps humain
    var alphaTexture: MTLTexture?       //Image du corps humain
    var whiteBlurTexture: MTLTexture!   //Une texture qui agrandit et brouille l'image du corps humain en blanc
    var yellowBlurTexture: MTLTexture!  //Texture agrandie / floue de l'image du corps humain en jaune
    //Taille de l'écran
    var viewportSize: CGSize = CGSize()
    var viewportSizeDidChange: Bool = false
    //Calculer la taille du groupe de threads de shader lors du traitement d'images du corps humain
    var threadgroupSize = MTLSizeMake(32, 32, 1)
    //Nombre d'animations
    var time = 0
    
    init(session: ARSession, metalDevice device: MTLDevice, mtkView: MTKView) {
        self.session = session
        self.device = device
        self.mtkView = mtkView
        matteGenerator = ARMatteGenerator(device: device, matteResolution: .half)
        loadMetal()
    }
    
    func drawRectResized(size: CGSize) {
        viewportSize = size
        viewportSizeDidChange = true
    }
    
    func update() {
        _ = inFlightSemaphore.wait(timeout: DispatchTime.distantFuture)
        
        let commandBuffer = commandQueue.makeCommandBuffer()!
        //Empêchez les textures capturées par l'appareil photo d'être libérées pendant le rendu
        var textures = [capturedImageTextureY, capturedImageTextureCbCr]
        commandBuffer.addCompletedHandler { [weak self] commandBuffer in
            if let strongSelf = self {
                strongSelf.inFlightSemaphore.signal()
            }
            textures.removeAll()
        }
        //Acquisition de texture de capture de caméra (Y, CbCr 2)
        guard let currentFrame = session.currentFrame else { return }
        let pixelBuffer = currentFrame.capturedImage
        if CVPixelBufferGetPlaneCount(pixelBuffer) < 2 { return }
        capturedImageTextureY = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat: .r8Unorm, planeIndex: 0)
        capturedImageTextureCbCr = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat: .rg8Unorm, planeIndex: 1)
        //Définir les coordonnées UV en fonction de la taille de l'écran
        if viewportSizeDidChange {
            viewportSizeDidChange = false
            // Update the texture coordinates of our image plane to aspect fill the viewport
            let displayToCameraTransform = currentFrame.displayTransform(for: .portrait, viewportSize: viewportSize).inverted()
            
            let vertexData = imagePlaneVertexBuffer.contents().assumingMemoryBound(to: Float.self)
            for index in 0...3 {
                let textureCoordIndex = 4 * index + 2   //kImagePlaneVertexData est les coordonnées des sommets(x,y) +coordonnées uv(u,v)Il est devenu. Parce que UV est défini+2
                let textureCoord = CGPoint(x: CGFloat(kImagePlaneVertexData[textureCoordIndex]), y: CGFloat(kImagePlaneVertexData[textureCoordIndex + 1]))
                let transformedCoord = textureCoord.applying(displayToCameraTransform)
                //Image capturée
                vertexData[textureCoordIndex] = Float(transformedCoord.x)
                vertexData[textureCoordIndex + 1] = Float(transformedCoord.y)
            }
        }
        //Acquisition d'images du corps humain
        alphaTexture = matteGenerator.generateMatte(from: currentFrame, commandBuffer: commandBuffer)
        //Je veux que l'effet du flou paraisse plus grand que le corps humain, j'ai donc agrandi l'image du corps humain. De plus, des textures pour deux couleurs, blanc et jaune, sont générées.
        if let width = alphaTexture?.width, let height = alphaTexture?.height {
            let colorDesc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                                     width: width, height: height, mipmapped: false)
            colorDesc.usage = [.shaderRead, .shaderWrite]
            whiteBlurTexture = device.makeTexture(descriptor: colorDesc)
            yellowBlurTexture = device.makeTexture(descriptor: colorDesc)

            let threadCountW = (width + self.threadgroupSize.width - 1) / self.threadgroupSize.width
            let threadCountH = (height + self.threadgroupSize.height - 1) / self.threadgroupSize.height
            let threadgroupCount = MTLSizeMake(threadCountW, threadCountH, 1)
            
            let computeEncoder = commandBuffer.makeComputeCommandEncoder()!
            
            computeEncoder.setComputePipelineState(computeState)
            computeEncoder.setTexture(alphaTexture, index: 0)
            computeEncoder.setTexture(whiteBlurTexture, index: 1)
            computeEncoder.setTexture(yellowBlurTexture, index: 2)
            computeEncoder.dispatchThreadgroups(threadgroupCount, threadsPerThreadgroup: threadgroupSize)
            computeEncoder.endEncoding()
        }
        //Changer la taille du flou avec le temps
        time += 1
        //Flou (blanc)
        let whiteIntensity = Int((sin(Float(time)/3) + 2) * 30) | 0x01  //Un nombre impair doit être spécifié pour la taille de MPSImageTent.
        let kernel1 = MPSImageTent(device: device, kernelWidth: whiteIntensity, kernelHeight: whiteIntensity)
        kernel1.encode(commandBuffer: commandBuffer,
                      inPlaceTexture: &whiteBlurTexture!, fallbackCopyAllocator: nil)
        //Flou (jaune)
        let yellowIntensity = Int((sin(Float(time)/3) + 2) * 100) | 0x01
        let kernel2 = MPSImageTent(device: device, kernelWidth: yellowIntensity, kernelHeight: yellowIntensity)
        kernel2.encode(commandBuffer: commandBuffer,
                      inPlaceTexture: &yellowBlurTexture!, fallbackCopyAllocator: nil)
        //Image capturée + composite flou (blanc / jaune)
        guard let renderPassDescriptor = mtkView.currentRenderPassDescriptor, let currentDrawable = mtkView.currentDrawable else { return }
        let compositeRenderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
        compositeImagesWithEncoder(renderEncoder: compositeRenderEncoder)
        compositeRenderEncoder.endEncoding()
        
        commandBuffer.present(currentDrawable)
        commandBuffer.commit()
    }
    
    func loadMetal() {
        commandQueue = device.makeCommandQueue()

        let imagePlaneVertexDataCount = kImagePlaneVertexData.count * MemoryLayout<Float>.size
        imagePlaneVertexBuffer = device.makeBuffer(bytes: kImagePlaneVertexData, length: imagePlaneVertexDataCount, options: [])
        //Cache d'images capturées par la caméra
        var textureCache: CVMetalTextureCache?
        CVMetalTextureCacheCreate(nil, nil, device, nil, &textureCache)
        capturedImageTextureCache = textureCache
        //Pipeline composite d'image capturée par caméra + image du corps humain
        let defaultLibrary = device.makeDefaultLibrary()!

        let compositePipelineStateDescriptor = MTLRenderPipelineDescriptor()
        compositePipelineStateDescriptor.sampleCount = 1
        compositePipelineStateDescriptor.vertexFunction = defaultLibrary.makeFunction(name: "compositeImageVertexTransform")!
        compositePipelineStateDescriptor.fragmentFunction = defaultLibrary.makeFunction(name: "compositeImageFragmentShader")!
        compositePipelineStateDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        try! compositePipelineState = device.makeRenderPipelineState(descriptor: compositePipelineStateDescriptor)
        
        //Calculer le shader pour les bordures du corps humain
        let edgeShader = defaultLibrary.makeFunction(name: "matteConvert")!
        computeState = try! self.device.makeComputePipelineState(function: edgeShader)
    }

    //Générer une texture MTL à partir de l'image capturée
    func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
        
        var texture: CVMetalTexture? = nil
        let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache, pixelBuffer, nil, pixelFormat,
                                                               width, height, planeIndex, &texture)
        if status != kCVReturnSuccess {
            texture = nil
        }
        return texture
    }

    func compositeImagesWithEncoder(renderEncoder: MTLRenderCommandEncoder) {
        guard let textureY = capturedImageTextureY, let textureCbCr = capturedImageTextureCbCr else { return }
        renderEncoder.setCullMode(.none)
        renderEncoder.setRenderPipelineState(compositePipelineState)

        renderEncoder.setVertexBuffer(imagePlaneVertexBuffer, offset: 0, index: 0)

        renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureY), index: 0)
        renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureCbCr), index: 1)
        renderEncoder.setFragmentTexture(whiteBlurTexture, index: 2)
        renderEncoder.setFragmentTexture(yellowBlurTexture, index: 3)
        renderEncoder.setFragmentTexture(alphaTexture, index: 4)
        renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
    }
}

・ Shader

`Shaders.metal`


typedef struct {
    float2 position [[attribute(kVertexAttributePosition)]];
    float2 texCoord [[attribute(kVertexAttributeTexcoord)]];
} ImageVertex;


typedef struct {
    float4 position [[position]];
    float2 texCoord;
} ImageColorInOut;

vertex ImageColorInOut capturedImageVertexTransform(ImageVertex in [[stage_in]]) {
    ImageColorInOut out;
    out.position = float4(in.position, 0.0, 1.0);
    out.texCoord = in.texCoord;
    return out;
}

// Convert from YCbCr to rgb
float4 ycbcrToRGBTransform(float4 y, float4 CbCr) {
    const float4x4 ycbcrToRGBTransform = float4x4(
      float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
      float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
      float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
      float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
    );

    float4 ycbcr = float4(y.r, CbCr.rg, 1.0);
    return ycbcrToRGBTransform * ycbcr;
}

// This defines the captured image fragment function.
fragment float4 capturedImageFragmentShader(ImageColorInOut in [[stage_in]],
                                            texture2d<float, access::sample> capturedImageTextureY [[ texture(kTextureIndexY) ]],
                                            texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(kTextureIndexCbCr) ]]) {
    
    constexpr sampler colorSampler(mip_filter::linear,
                                   mag_filter::linear,
                                   min_filter::linear);
    
    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate.
    return ycbcrToRGBTransform(capturedImageTextureY.sample(colorSampler, in.texCoord),
                               capturedImageTextureCbCr.sample(colorSampler, in.texCoord));
}

typedef struct {
    float2 position;
    float2 texCoord;
} CompositeVertex;

typedef struct {
    float4 position [[position]];
    float2 texCoordCamera;
} CompositeColorInOut;

// Composite the image vertex function.
vertex CompositeColorInOut compositeImageVertexTransform(const device CompositeVertex* cameraVertices [[ buffer(0) ]],
                                                         unsigned int vid [[ vertex_id ]]) {
    CompositeColorInOut out;

    const device CompositeVertex& cv = cameraVertices[vid];

    out.position = float4(cv.position, 0.0, 1.0);
    out.texCoordCamera = cv.texCoord;

    return out;
}

// Composite the image fragment function.
fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]],
                                            texture2d<float, access::sample> capturedImageTextureY [[ texture(0) ]],
                                            texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(1) ]],
                                            texture2d<float, access::sample> whiteColorTexture [[ texture(2) ]],
                                            texture2d<float, access::sample> yellowColorTexture [[ texture(3) ]],
                                            texture2d<float, access::sample> alphaTexture [[ texture(4) ]])
{
    constexpr sampler s(address::clamp_to_edge, filter::linear);

    float2 cameraTexCoord = in.texCoordCamera;

    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate.
    float4 rgb = ycbcrToRGBTransform(capturedImageTextureY.sample(s, cameraTexCoord), capturedImageTextureCbCr.sample(s, cameraTexCoord));

    half4 cameraColor = half4(rgb);
    half4 whiteColor = half4(whiteColorTexture.sample(s, cameraTexCoord));
    half4 yellowColor = half4(yellowColorTexture.sample(s, cameraTexCoord)) * 2.0;
        
    return cameraColor + whiteColor + yellowColor;
}

//(Image du corps humain agrandie-Créez le bord du corps humain avec (image du corps humain) et affichez-le sous forme de texture blanche et jaune
kernel void matteConvert(texture2d<half, access::read> inTexture [[ texture(0) ]],
                         texture2d<half, access::write> outWhiteTexture [[ texture(1) ]],
                         texture2d<half, access::write> outYellowTexture [[ texture(2) ]],
                         uint2 gid [[thread_position_in_grid]]) {
    
    uint2 textureIndex(gid);
    if (inTexture.read(textureIndex).r > 0.1) {
        //Aucune couleur sur le corps humain
        outWhiteTexture.write(half4(0.0), gid);
        outYellowTexture.write(half4(0.0), gid);
        return;
    }
    
    //Expansion
    constexpr int scale = 15;
    constexpr int radius = scale / 2;
    half color = 0.0;
    
    for (int i=0; i<scale; i++) {
        for (int j=0; j<scale; j++) {
            uint2 textureIndex(gid.x + (i - radius), gid.y + (j - radius));
            half alpha = inTexture.read(textureIndex).r;
            if (alpha > 0.1) {
                color = 1.0;
                break;
            }
        }
        if (color > 0.0) {
            break;
        }
    }

    outWhiteTexture.write(half4(color, color, color, 1.0), gid);
    outYellowTexture.write(half4(color, color, 0.0, 1.0), gid);
}