Take a picture with Android, save it, display it on screen, and classify it — a simple image recognition app that shows the classification result.
The flow: start the camera, take a picture, display the captured photo on screen, and recognize what is in it.
Technologies used: PyTorch Mobile (Python), Android CameraX, ResNet18, Kotlin.
All of these came out only last year...
First, add the CameraX and PyTorch Mobile dependencies (versions as of February 2020).
build.gradle
def camerax_version = '1.0.0-alpha06'
implementation "androidx.camera:camera-core:${camerax_version}"
implementation "androidx.camera:camera-camera2:${camerax_version}"
implementation 'org.pytorch:pytorch_android:1.4.0'
implementation 'org.pytorch:pytorch_android_torchvision:1.4.0'
Also add the following at the end of the **android {}** block in the same build.gradle:
build.gradle
compileOptions {
    sourceCompatibility JavaVersion.VERSION_1_8
    targetCompatibility JavaVersion.VERSION_1_8
}
With the dependencies in place, we implement photo capture using **CameraX**, a library that makes the Android camera easy to work with.
The implementation below follows the official CameraX tutorial. The details are covered in other articles, so I will omit them and show only the code.
Declare the camera permission in AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
Add a function that takes a picture with the camera and saves it. Following the tutorial, we set up the camera preview and capture. Since it is almost identical to the tutorial, I will show only the code.
activity_main.xml
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <Button
        android:id="@+id/capture_button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="2dp"
        android:text="photograph"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.25"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/frameLayout" />

    <Button
        android:id="@+id/activateCamera"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Camera activation"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.25"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/capture_button" />

    <ImageView
        android:id="@+id/capturedImg"
        android:layout_width="500px"
        android:layout_height="500px"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent"
        app:srcCompat="@mipmap/ic_launcher_round" />

    <FrameLayout
        android:id="@+id/frameLayout"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="8dp"
        android:background="@android:color/holo_blue_bright"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/capturedImg">

        <TextureView
            android:id="@+id/view_finder"
            android:layout_width="500px"
            android:layout_height="500px" />
    </FrameLayout>

    <Button
        android:id="@+id/inferBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginStart="32dp"
        android:text="inference"
        app:layout_constraintBottom_toBottomOf="@+id/capture_button"
        app:layout_constraintStart_toEndOf="@+id/capture_button"
        app:layout_constraintTop_toTopOf="@+id/capture_button" />

    <TextView
        android:id="@+id/resultText"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="4dp"
        android:text="Inference result"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.31"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/activateCamera" />
</androidx.constraintlayout.widget.ConstraintLayout>
MainActivity
MainActivity.kt
private const val REQUEST_CODE_PERMISSIONS = 10
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)

class MainActivity : AppCompatActivity(), LifecycleOwner {

    private var imgData: Bitmap? = null // Holds the most recently captured image

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        viewFinder = findViewById(R.id.view_finder)

        // Start the camera
        activateCamera.setOnClickListener {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                ActivityCompat.requestPermissions(
                    this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS
                )
            }
        }

        viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
            updateTransform()
        }

        /** Code to classify images will be added here later **/
    }

    private val executor = Executors.newSingleThreadExecutor()
    private lateinit var viewFinder: TextureView

    private fun startCamera() {
        // Create the preview use case
        val previewConfig = PreviewConfig.Builder().apply {
            setTargetResolution(Size(viewFinder.width, viewFinder.height)) // 680, 480
        }.build()
        val preview = Preview(previewConfig)

        preview.setOnPreviewOutputUpdateListener {
            val parent = viewFinder.parent as ViewGroup
            parent.removeView(viewFinder)
            parent.addView(viewFinder, 0)
            viewFinder.surfaceTexture = it.surfaceTexture
            updateTransform()
        }

        // Create the capture use case
        val imageCaptureConfig = ImageCaptureConfig.Builder()
            .apply {
                setCaptureMode(ImageCapture.CaptureMode.MIN_LATENCY)
            }.build()
        val imageCapture = ImageCapture(imageCaptureConfig)

        // Take a photo on button press
        capture_button.setOnClickListener {
            val file = File(
                externalMediaDirs.first(),
                "${System.currentTimeMillis()}.jpg"
            )
            imageCapture.takePicture(file, executor,
                object : ImageCapture.OnImageSavedListener {
                    override fun onError(
                        imageCaptureError: ImageCapture.ImageCaptureError,
                        message: String,
                        exc: Throwable?
                    ) {
                        val msg = "Photo capture failed: $message"
                        Log.e("CameraXApp", msg, exc)
                        viewFinder.post {
                            Toast.makeText(baseContext, msg, Toast.LENGTH_SHORT).show()
                        }
                    }

                    override fun onImageSaved(file: File) {
                        // Load the saved file as a bitmap and
                        // rotate it 90 degrees with a Matrix for display
                        val inputStream = FileInputStream(file)
                        val bitmap = BitmapFactory.decodeStream(inputStream)
                        val bitmapWidth = bitmap.width
                        val bitmapHeight = bitmap.height
                        val matrix = Matrix()
                        matrix.setRotate(90F, bitmapWidth / 2F, bitmapHeight / 2F)
                        val rotatedBitmap = Bitmap.createBitmap(
                            bitmap,
                            0,
                            0,
                            bitmapWidth,
                            bitmapHeight,
                            matrix,
                            true
                        )
                        imgData = rotatedBitmap // Keep the image for inference

                        // Show the captured photo;
                        // views must be updated from the main thread
                        viewFinder.post {
                            capturedImg.setImageBitmap(rotatedBitmap)
                        }
                        val msg = "Photo capture succeeded: ${file.absolutePath}"
                        viewFinder.post {
                            Toast.makeText(baseContext, msg, Toast.LENGTH_SHORT).show()
                        }
                    }
                })
        }

        // Bind the preview and capture use cases to the lifecycle
        CameraX.bindToLifecycle(this, preview, imageCapture)
    }

    private fun updateTransform() {
        val matrix = Matrix()
        val centerX = viewFinder.width / 2f
        val centerY = viewFinder.height / 2f
        val rotationDegrees = when (viewFinder.display.rotation) {
            Surface.ROTATION_0 -> 0
            Surface.ROTATION_90 -> 90
            Surface.ROTATION_180 -> 180
            Surface.ROTATION_270 -> 270
            else -> return
        }
        matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)
        viewFinder.setTransform(matrix)
    }

    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String>, grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CODE_PERMISSIONS) {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                Toast.makeText(
                    this,
                    "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT
                ).show()
                finish()
            }
        }
    }

    private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
        ContextCompat.checkSelfPermission(
            baseContext, it
        ) == PackageManager.PERMISSION_GRANTED
    }
}
If everything works, you should be able to take a picture and see it displayed on screen. (I don't know whether it is my environment or my code, but there is a considerable lag between taking the picture and displaying it.)
CameraX officially provides three use cases: **preview, capture, and image analysis**. This time we combine preview and capture. For reference, the supported combinations of use cases are listed in the official documentation.
For inference we use a pretrained model. Run the following Python script to trace ResNet18 with TorchScript:
import torch
import torchvision
model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet.pt")
If it runs successfully, a file named resnet.pt is generated in the same directory. (We will put this into the Android Studio project next.) Image recognition is performed with this pretrained ResNet18.
First, put the resnet.pt you just generated into the project's **assets folder**. (It does not exist by default; create it by right-clicking in the project view -> New -> Folder -> Assets Folder.)
Next, create a function that resolves a file path for an asset. Add the following to the bottom of MainActivity.kt:
MainActivity.kt
// Get the path of an asset file, copying it to internal storage on first use
private fun getAssetFilePath(context: Context, assetName: String): String {
    val file = File(context.filesDir, assetName)
    if (file.exists() && file.length() > 0) {
        return file.absolutePath
    }
    context.assets.open(assetName).use { inputStream ->
        FileOutputStream(file).use { outputStream ->
            val buffer = ByteArray(4 * 1024)
            var read: Int
            while (inputStream.read(buffer).also { read = it } != -1) {
                outputStream.write(buffer, 0, read)
            }
            outputStream.flush()
        }
        return file.absolutePath
    }
}
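For reference, the helper above streams the asset into the app's files directory in 4 KB chunks until the input is exhausted. The same buffered-copy pattern can be sketched in plain Python — this is only an illustration of what the Kotlin loop does, not app code:

```python
import io

def copy_stream(src, dst, buf_size=4 * 1024):
    """Copy src to dst in buf_size chunks, returning total bytes copied."""
    total = 0
    while True:
        chunk = src.read(buf_size)
        if not chunk:  # empty read = end of stream (Kotlin's read() == -1)
            break
        dst.write(chunk)
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * 10000)  # stand-in for context.assets.open(...)
dst = io.BytesIO()              # stand-in for FileOutputStream
print(copy_stream(src, dst))    # 10000
```

The existence-and-size check at the top of the Kotlin helper simply skips this copy on every launch after the first.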
To map the classification result to a label, make the 1,000 ImageNet classes available. Create a new ImageNetCategory.kt and put the class names there. (The full list is too long to show, so copy it from GitHub.)
ImageNetCategory.kt
class ImageNetCategory {
    var IMAGENET_CLASSES = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus",
        // ...omitted (copy the full list from GitHub)
        "ear, spike, capitulum",
        "toilet tissue, toilet paper, bathroom tissue"
    )
}
Then implement the main inference part. Add the following at the end of onCreate in MainActivity.kt:
MainActivity.kt
// Load the network model
val resnet = Module.load(getAssetFilePath(this, "resnet.pt"))

/** Inference **/
inferBtn.setOnClickListener {
    // Do nothing if no photo has been taken yet
    val img = imgData ?: return@setOnClickListener
    // Resize the captured photo to 224 x 224
    val imgDataResized = Bitmap.createScaledBitmap(img, 224, 224, true)
    // Convert the bitmap to a tensor
    val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
        imgDataResized,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
        TensorImageUtils.TORCHVISION_NORM_STD_RGB
    )
    // Forward pass
    val outputTensor = resnet.forward(IValue.from(inputTensor)).toTensor()
    val scores = outputTensor.dataAsFloatArray

    // Find the index with the highest score
    // (start below any possible value, since logits can be negative)
    var maxScore = -Float.MAX_VALUE
    var maxScoreIdx = 0
    for (i in scores.indices) {
        if (scores[i] > maxScore) {
            maxScore = scores[i]
            maxScoreIdx = i
        }
    }

    // Convert the predicted index to a category name
    val inferCategory = ImageNetCategory().IMAGENET_CLASSES[maxScoreIdx]
    resultText.text = "Inference result: ${inferCategory}"
}
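The loop above picks the index with the highest raw score. ResNet outputs unnormalized logits, so if you also want a confidence value to show next to the label, you can run the scores through softmax first. A minimal sketch of both steps in plain Python (the Kotlin version would mirror it; the example logits here are made up):

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(scores):
    # Same logic as the Kotlin loop: track the best index seen so far
    best_idx = 0
    for i, s in enumerate(scores):
        if s > scores[best_idx]:
            best_idx = i
    return best_idx

# Toy logits standing in for the 1000-element output tensor
logits = [0.5, 2.0, -1.0, 1.0]
probs = softmax(logits)
idx = argmax(logits)
print(idx)  # 1
print(round(probs[idx], 2))  # 0.61
```

Since softmax is monotonic, the argmax of the probabilities is the same as the argmax of the raw logits — the extra step only matters if you want to display a probability.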
That is all it takes to run image recognition on device. Try taking various pictures and swapping in different models to play with it.
The full code is on GitHub, so refer to it as needed. I actually tried to use VGG-16 as well, but it ran out of memory and dealing with that seemed like a hassle, so I gave up. It would be interesting to drop in models produced by various kinds of transfer learning. I also found that CameraX makes the camera features pleasantly easy to use.