Take a picture with Android, save it, display it on screen, and classify it — a simple image recognition app that shows the classification result.
The flow: start the camera, take a picture, display the captured photo on screen, and recognize what is in it.
Technologies used: PyTorch Mobile (Python), Android CameraX, ResNet18, Kotlin.
All of these came out only last year...
First, add the CameraX and PyTorch Mobile dependencies (versions as of February 2020).
build.gradle
def camerax_version = '1.0.0-alpha06'
implementation "androidx.camera:camera-core:${camerax_version}"
implementation "androidx.camera:camera-camera2:${camerax_version}"
implementation 'org.pytorch:pytorch_android:1.4.0'
implementation 'org.pytorch:pytorch_android_torchvision:1.4.0'
Also add the following at the end of the **android {}** block in the same build.gradle:
build.gradle
compileOptions {
    sourceCompatibility JavaVersion.VERSION_1_8
    targetCompatibility JavaVersion.VERSION_1_8
}
With the dependencies in place, we implement photo capture using **CameraX**, a library that makes the Android camera easy to work with.
The implementation below follows the official CameraX tutorial. The details are covered in other articles, so I will omit them and show only the code.
Declare the camera permission in AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
Add a function that takes a picture with the camera and saves it. Following the tutorial, we set up the camera preview and capture. Since it is almost identical to the tutorial, I will show only the code.
activity_main.xml
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <Button
        android:id="@+id/capture_button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="2dp"
        android:text="photograph"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.25"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/frameLayout" />

    <Button
        android:id="@+id/activateCamera"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Camera activation"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.25"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/capture_button" />

    <ImageView
        android:id="@+id/capturedImg"
        android:layout_width="500px"
        android:layout_height="500px"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent"
        app:srcCompat="@mipmap/ic_launcher_round" />

    <FrameLayout
        android:id="@+id/frameLayout"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="8dp"
        android:background="@android:color/holo_blue_bright"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/capturedImg">

        <TextureView
            android:id="@+id/view_finder"
            android:layout_width="500px"
            android:layout_height="500px" />
    </FrameLayout>

    <Button
        android:id="@+id/inferBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginStart="32dp"
        android:text="inference"
        app:layout_constraintBottom_toBottomOf="@+id/capture_button"
        app:layout_constraintStart_toEndOf="@+id/capture_button"
        app:layout_constraintTop_toTopOf="@+id/capture_button" />

    <TextView
        android:id="@+id/resultText"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="4dp"
        android:text="Inference result"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.31"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/activateCamera" />
</androidx.constraintlayout.widget.ConstraintLayout>
MainActivity
MainActivity.kt
private const val REQUEST_CODE_PERMISSIONS = 10
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)

class MainActivity : AppCompatActivity(), LifecycleOwner {

    private var imgData: Bitmap? = null // Holds the most recently captured image

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        viewFinder = findViewById(R.id.view_finder)

        // Start the camera
        activateCamera.setOnClickListener {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                ActivityCompat.requestPermissions(
                    this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS
                )
            }
        }

        viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
            updateTransform()
        }

        /** Code to classify images will be added here later **/
    }

    private val executor = Executors.newSingleThreadExecutor()
    private lateinit var viewFinder: TextureView

    private fun startCamera() {
        // Create the preview use case
        val previewConfig = PreviewConfig.Builder().apply {
            setTargetResolution(Size(viewFinder.width, viewFinder.height)) // 680, 480
        }.build()
        val preview = Preview(previewConfig)

        preview.setOnPreviewOutputUpdateListener {
            val parent = viewFinder.parent as ViewGroup
            parent.removeView(viewFinder)
            parent.addView(viewFinder, 0)
            viewFinder.surfaceTexture = it.surfaceTexture
            updateTransform()
        }

        // Create the capture use case
        val imageCaptureConfig = ImageCaptureConfig.Builder()
            .apply {
                setCaptureMode(ImageCapture.CaptureMode.MIN_LATENCY)
            }.build()
        val imageCapture = ImageCapture(imageCaptureConfig)

        // Take a photo on button press
        capture_button.setOnClickListener {
            val file = File(
                externalMediaDirs.first(),
                "${System.currentTimeMillis()}.jpg"
            )
            imageCapture.takePicture(file, executor,
                object : ImageCapture.OnImageSavedListener {
                    override fun onError(
                        imageCaptureError: ImageCapture.ImageCaptureError,
                        message: String,
                        exc: Throwable?
                    ) {
                        val msg = "Photo capture failed: $message"
                        Log.e("CameraXApp", msg, exc)
                        viewFinder.post {
                            Toast.makeText(baseContext, msg, Toast.LENGTH_SHORT).show()
                        }
                    }

                    override fun onImageSaved(file: File) {
                        // Load the saved file as a bitmap and
                        // rotate it 90 degrees with a Matrix for display
                        val inputStream = FileInputStream(file)
                        val bitmap = BitmapFactory.decodeStream(inputStream)
                        val bitmapWidth = bitmap.width
                        val bitmapHeight = bitmap.height
                        val matrix = Matrix()
                        matrix.setRotate(90F, bitmapWidth / 2F, bitmapHeight / 2F)
                        val rotatedBitmap = Bitmap.createBitmap(
                            bitmap,
                            0,
                            0,
                            bitmapWidth,
                            bitmapHeight,
                            matrix,
                            true
                        )
                        imgData = rotatedBitmap // Keep the image for inference

                        // Show the captured photo;
                        // views must be updated from the main thread
                        viewFinder.post {
                            capturedImg.setImageBitmap(rotatedBitmap)
                        }
                        val msg = "Photo capture succeeded: ${file.absolutePath}"
                        viewFinder.post {
                            Toast.makeText(baseContext, msg, Toast.LENGTH_SHORT).show()
                        }
                    }
                })
        }

        // Bind the preview and capture use cases to the lifecycle
        CameraX.bindToLifecycle(this, preview, imageCapture)
    }

    private fun updateTransform() {
        val matrix = Matrix()
        val centerX = viewFinder.width / 2f
        val centerY = viewFinder.height / 2f
        val rotationDegrees = when (viewFinder.display.rotation) {
            Surface.ROTATION_0 -> 0
            Surface.ROTATION_90 -> 90
            Surface.ROTATION_180 -> 180
            Surface.ROTATION_270 -> 270
            else -> return
        }
        matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)
        viewFinder.setTransform(matrix)
    }

    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String>, grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CODE_PERMISSIONS) {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                Toast.makeText(
                    this,
                    "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT
                ).show()
                finish()
            }
        }
    }

    private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
        ContextCompat.checkSelfPermission(
            baseContext, it
        ) == PackageManager.PERMISSION_GRANTED
    }
}
If everything works, you should be able to take a picture and see it displayed on screen. (I don't know whether it is my environment or my code, but there is a considerable lag between taking the picture and displaying it.)
CameraX officially provides three use cases: **preview, capture, and image analysis**. This time we combine preview and capture. For reference, the supported combinations of use cases are listed in the official documentation.
For inference we use a pretrained model. Run the following Python script to trace ResNet18 with TorchScript:
import torch
import torchvision
model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet.pt")
If it runs successfully, a file named resnet.pt is generated in the same directory. (We will put this into the Android Studio project next.) Image recognition is performed with this pretrained ResNet18.
First, put the resnet.pt you just generated into the project's **assets folder**. (It does not exist by default; create it by right-clicking in the project view -> New -> Folder -> Assets Folder.)
Next, create a function that resolves a file path for an asset. Add the following to the bottom of MainActivity.kt:
MainActivity.kt
// Get the path of an asset file, copying it to internal storage on first use
private fun getAssetFilePath(context: Context, assetName: String): String {
    val file = File(context.filesDir, assetName)
    if (file.exists() && file.length() > 0) {
        return file.absolutePath
    }
    context.assets.open(assetName).use { inputStream ->
        FileOutputStream(file).use { outputStream ->
            val buffer = ByteArray(4 * 1024)
            var read: Int
            while (inputStream.read(buffer).also { read = it } != -1) {
                outputStream.write(buffer, 0, read)
            }
            outputStream.flush()
        }
        return file.absolutePath
    }
}
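For reference, the helper above streams the asset into the app's files directory in 4 KB chunks until the input is exhausted. The same buffered-copy pattern can be sketched in plain Python — this is only an illustration of what the Kotlin loop does, not app code:

```python
import io

def copy_stream(src, dst, buf_size=4 * 1024):
    """Copy src to dst in buf_size chunks, returning total bytes copied."""
    total = 0
    while True:
        chunk = src.read(buf_size)
        if not chunk:  # empty read = end of stream (Kotlin's read() == -1)
            break
        dst.write(chunk)
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * 10000)  # stand-in for context.assets.open(...)
dst = io.BytesIO()              # stand-in for FileOutputStream
print(copy_stream(src, dst))    # 10000
```

The existence-and-size check at the top of the Kotlin helper simply skips this copy on every launch after the first.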
To map the classification result to a label, make the 1,000 ImageNet classes available. Create a new ImageNetCategory.kt and put the class names there. (The full list is too long to show, so copy it from GitHub.)
ImageNetCategory.kt
class ImageNetCategory {
    var IMAGENET_CLASSES = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus",
        // ...omitted (copy the full list from GitHub)
        "ear, spike, capitulum",
        "toilet tissue, toilet paper, bathroom tissue"
    )
}
Then implement the main inference part. Add the following at the end of onCreate in MainActivity.kt:
MainActivity.kt
// Load the network model
val resnet = Module.load(getAssetFilePath(this, "resnet.pt"))

/** Inference **/
inferBtn.setOnClickListener {
    // Do nothing if no photo has been taken yet
    val img = imgData ?: return@setOnClickListener
    // Resize the captured photo to 224 x 224
    val imgDataResized = Bitmap.createScaledBitmap(img, 224, 224, true)
    // Convert the bitmap to a tensor
    val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
        imgDataResized,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
        TensorImageUtils.TORCHVISION_NORM_STD_RGB
    )
    // Forward pass
    val outputTensor = resnet.forward(IValue.from(inputTensor)).toTensor()
    val scores = outputTensor.dataAsFloatArray

    // Find the index with the highest score
    // (start below any possible value, since logits can be negative)
    var maxScore = -Float.MAX_VALUE
    var maxScoreIdx = 0
    for (i in scores.indices) {
        if (scores[i] > maxScore) {
            maxScore = scores[i]
            maxScoreIdx = i
        }
    }

    // Convert the predicted index to a category name
    val inferCategory = ImageNetCategory().IMAGENET_CLASSES[maxScoreIdx]
    resultText.text = "Inference result: ${inferCategory}"
}
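The loop above picks the index with the highest raw score. ResNet outputs unnormalized logits, so if you also want a confidence value to show next to the label, you can run the scores through softmax first. A minimal sketch of both steps in plain Python (the Kotlin version would mirror it; the example logits here are made up):

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(scores):
    # Same logic as the Kotlin loop: track the best index seen so far
    best_idx = 0
    for i, s in enumerate(scores):
        if s > scores[best_idx]:
            best_idx = i
    return best_idx

# Toy logits standing in for the 1000-element output tensor
logits = [0.5, 2.0, -1.0, 1.0]
probs = softmax(logits)
idx = argmax(logits)
print(idx)  # 1
print(round(probs[idx], 2))  # 0.61
```

Since softmax is monotonic, the argmax of the probabilities is the same as the argmax of the raw logits — the extra step only matters if you want to display a probability.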
That is all it takes to run image recognition on device. Try taking various pictures and swapping in different models to play with it.
The full code is on GitHub, so refer to it as needed. I actually tried to use VGG-16 as well, but it ran out of memory and dealing with that seemed like a hassle, so I gave up. It would be interesting to drop in models produced by various kinds of transfer learning. I also found that CameraX makes the camera features pleasantly easy to use.