[Java] Get metadata from files with Apache Tika, and get image / video width and height from the metadata [Kotlin]

What I want to do

Get the frame size (width and height) of an image or video.

In this article, I use Apache Tika to read a file's metadata and take the frame size from that metadata.

Apology

As far as I can tell from the official documentation, the method introduced in this article only supports image formats that the Java platform itself supports, such as png, gif, and bmp (plus jpeg, as far as I tried), and videos in flv and mp4 format. This does not change even when a file such as a mov is detected with an mp4 Content-Type, as in the execution results shown later.

As far as I could find, there is currently almost no reliable way to get the frame size in Java alone (I would appreciate it if you could tell me of one). If you need to do it reliably, I think it is better to call ffmpeg from Java.
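The ffmpeg route mentioned above could look roughly like the following. This is only a sketch under assumptions: it assumes the `ffprobe` executable that ships with ffmpeg is on the PATH, and `FrameSize`, `parseFrameSize`, and `probeFrameSize` are names made up for this example, not part of Tika or ffmpeg.

```kotlin
import java.io.File
import java.util.concurrent.TimeUnit

// Hypothetical holder type for this sketch; not part of Tika or ffmpeg.
data class FrameSize(val width: Int, val height: Int)

// Parses one line of the form "320,240", which is what ffprobe prints
// with "-show_entries stream=width,height -of csv=p=0".
// Returns null for anything that is not two positive integers.
fun parseFrameSize(line: String): FrameSize? {
    val parts = line.trim().split(",")
    if (parts.size != 2) return null
    val width = parts[0].trim().toIntOrNull() ?: return null
    val height = parts[1].trim().toIntOrNull() ?: return null
    return if (width > 0 && height > 0) FrameSize(width, height) else null
}

// Asks ffprobe for the size of the first video stream of the file.
// Assumption: "ffprobe" is on the PATH; start() throws IOException if it is not.
fun probeFrameSize(file: File): FrameSize? {
    val process = ProcessBuilder(
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height",
        "-of", "csv=p=0",
        file.absolutePath
    ).redirectErrorStream(true).start()

    // Read everything the process prints, then make sure it has exited.
    val output = process.inputStream.bufferedReader().use { it.readText() }
    if (!process.waitFor(10, TimeUnit.SECONDS)) {
        process.destroyForcibly()
        return null
    }
    if (process.exitValue() != 0) return null

    // Error lines (if any) do not parse as "width,height" and are skipped.
    return output.lineSequence().mapNotNull { parseFrameSize(it) }.firstOrNull()
}
```

Keeping the parsing in its own function means it can be checked without ffprobe installed; `probeFrameSize(File(path))` returns null when ffprobe fails or reports no usable video stream.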

Installation

Add Apache Tika Parsers as a dependency from Maven. I used version 1.21 for verification.

Apache Tika Parsers
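For reference, the dependency could be declared like this, assuming a Gradle project that uses the Kotlin DSL (the article itself only says "from Maven", so the build tool here is my assumption). The coordinates are those of the Apache Tika Parsers artifact at the version used for verification.

```kotlin
// build.gradle.kts (assumed: Gradle with the Kotlin DSL)
repositories {
    mavenCentral()
}

dependencies {
    // Apache Tika Parsers, the version used for verification in this article
    implementation("org.apache.tika:tika-parsers:1.21")
}
```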

Verification code

It is rough, but this sample gets the frame size of every file under src/main/resources. The output is sorted by extension.

org.apache.tika.metadata.Metadata is imported under the alias TikaMetadata because its name clashes with Kotlin's own Metadata type.

import org.apache.tika.Tika
import org.apache.tika.io.TikaInputStream
import java.io.File
import org.apache.tika.metadata.Metadata as TikaMetadata

fun main() {
    val tika = Tika()

    File(System.getProperty("user.dir") + "/src/main/resources").listFiles().filter {
        it.name != ".gitkeep"
    }.map {
        val metaData = TikaMetadata() // Create a new instance per file; otherwise values from previously parsed files remain.
        val tikaStream = TikaInputStream.get(it.toURI(), metaData)

        tika.parse(tikaStream, metaData)

        val fileInfo = """
            Extension: ${metaData.get("resourceName").split(".").last().toLowerCase()}
            Content-Type = ${metaData.get("Content-Type")}
        """.trimIndent()

        val width = metaData.get("tiff:ImageWidth") ?.let { w -> Integer.parseInt(w) }
        val height = metaData.get("tiff:ImageLength") ?.let { h -> Integer.parseInt(h) }

        """
            #$fileInfo
            #width: ${width?: "nothing" }
            #height: ${height?: "nothing"}
        """.trimMargin("#")
    }.sorted().forEach {
        println(it)
        println("----------------------")
    }
}

Execution result

This is the result of running it on an assortment of files I happened to have on hand.

Extension: 7z
Content-Type = application/x-7z-compressed
width: nothing
height: nothing
----------------------
Extension: avi
Content-Type = video/x-msvideo
width: nothing
height: nothing
----------------------
Extension: docx
Content-Type = application/vnd.openxmlformats-officedocument.wordprocessingml.document
width: nothing
height: nothing
----------------------
Extension: exe
Content-Type = application/x-msdownload
width: nothing
height: nothing
----------------------
Extension: flv
Content-Type = video/x-flv
width: nothing
height: nothing
----------------------
Extension: gif
Content-Type = image/gif
width: 300
height: 250
----------------------
Extension: gif
Content-Type = image/gif
width: 300
height: 250
----------------------
Extension: gif
Content-Type = image/gif
width: 300
height: 250
----------------------
Extension: html
Content-Type = text/html; charset=Shift_JIS
width: nothing
height: nothing
----------------------
Extension: jpg
Content-Type = image/jpeg
width: 4608
height: 2592
----------------------
Extension: jpg
Content-Type = image/jpeg
width: 759
height: 1074
----------------------
Extension: m3u
Content-Type = text/plain; charset=windows-1252
width: nothing
height: nothing
----------------------
Extension: mkv
Content-Type = video/x-matroska
width: nothing
height: nothing
----------------------
Extension: mkv
Content-Type = video/x-matroska
width: nothing
height: nothing
----------------------
Extension: mkv
Content-Type = video/x-matroska
width: nothing
height: nothing
----------------------
Extension: mkv
Content-Type = video/x-matroska
width: nothing
height: nothing
----------------------
Extension: mov
Content-Type = application/mp4
width: 0
height: 0
----------------------
Extension: mov
Content-Type = application/mp4
width: 0
height: 0
----------------------
Extension: mov
Content-Type = video/quicktime
width: 0
height: 0
----------------------
Extension: mov
Content-Type = video/quicktime
width: 320
height: 240
----------------------
Extension: mp3
Content-Type = audio/mpeg
width: nothing
height: nothing
----------------------
Extension: mp4
Content-Type = application/mp4
width: 320
height: 240
----------------------
Extension: mp4
Content-Type = application/mp4
width: 640
height: 360
----------------------
Extension: mp4
Content-Type = application/mp4
width: 725
height: 408
----------------------
Extension: mp4
Content-Type = application/mp4
width: 725
height: 408
----------------------
Extension: mp4
Content-Type = video/mp4
width: 0
height: 0
----------------------
Extension: mp4
Content-Type = video/mp4
width: 0
height: 0
----------------------
Extension: mp4
Content-Type = video/mp4
width: 718
height: 404
----------------------
Extension: mp4
Content-Type = video/mp4
width: 720
height: 408
----------------------
Extension: mp4
Content-Type = video/mp4
width: 854
height: 480
----------------------
Extension: mp4
Content-Type = video/x-m4v
width: 725
height: 408
----------------------
Extension: mpg
Content-Type = video/mpeg
width: nothing
height: nothing
----------------------
Extension: mpg
Content-Type = video/mpeg
width: nothing
height: nothing
----------------------
Extension: mpg
Content-Type = video/mpeg
width: nothing
height: nothing
----------------------
Extension: msi
Content-Type = application/x-ms-installer
width: nothing
height: nothing
----------------------
Extension: pdf
Content-Type = application/pdf
width: nothing
height: nothing
----------------------
Extension: png
Content-Type = image/png
width: 600
height: 371
----------------------
Extension: pptx
Content-Type = application/vnd.openxmlformats-officedocument.presentationml.presentation
width: nothing
height: nothing
----------------------
Extension: svg
Content-Type = image/svg+xml
width: nothing
height: nothing
----------------------
Extension: ts
Content-Type = application/octet-stream
width: nothing
height: nothing
----------------------
Extension: vcmf
Content-Type = application/octet-stream
width: nothing
height: nothing
----------------------
Extension: vob
Content-Type = video/mpeg
width: nothing
height: nothing
----------------------
Extension: webm
Content-Type = video/webm
width: nothing
height: nothing
----------------------
Extension: webm
Content-Type = video/webm
width: nothing
height: nothing
----------------------
Extension: webm
Content-Type = video/webm
width: nothing
height: nothing
----------------------
Extension: webm
Content-Type = video/webm
width: nothing
height: nothing
----------------------
Extension: wmv
Content-Type = video/x-ms-wmv
width: nothing
height: nothing
----------------------
Extension: zip
Content-Type = application/zip
width: nothing
height: nothing
----------------------
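Several mov and mp4 files in the results above report width: 0 and height: 0 rather than "nothing". If you prefer to treat such values as missing, a small helper like the one below could be used. It is a sketch, not part of Tika: `toDimension` and `describeFrameSize` are hypothetical names, and treating 0 as "no size available" is my assumption about what those values mean.

```kotlin
// Hypothetical helper: turns a raw metadata value into a usable dimension,
// or null when the value is absent, not an integer, or not positive.
fun toDimension(raw: String?): Int? =
    raw?.trim()?.toIntOrNull()?.takeIf { it > 0 }

// Formats both dimensions in the same "width / height" style as the sample.
// `lookup` stands in for a metadata getter, so this compiles without Tika.
fun describeFrameSize(lookup: (String) -> String?): String {
    val width = toDimension(lookup("tiff:ImageWidth"))
    val height = toDimension(lookup("tiff:ImageLength"))
    return "width: ${width ?: "nothing"}\nheight: ${height ?: "nothing"}"
}
```

With the sample code, this could be called as `describeFrameSize { key -> metaData.get(key) }`. Using `toIntOrNull` also avoids the exception that `Integer.parseInt` throws if a value is not a plain integer.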
