[Java] Get MimeType from the contents of the file with Apache Tika [Kotlin]

What I want to do

Get the MimeType from the file contents. The MimeType can also be inferred from the file extension, but because I use it for purposes where a renamed extension would cause trouble, I determine it from the file contents instead.

Approach

Use Apache Tika. Note, however, that unless the input goes through TikaInputStream, the result depends on the file name.
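
To make the difference between name-based and content-based detection concrete, here is a minimal sketch, assuming tika-core is on the classpath. The helper `detectByNameAndContent`, the file name `photo.jpg`, and the hand-written PNG signature bytes are all made up for illustration and are not part of the verification in this article.

```kotlin
import org.apache.tika.Tika

// Hypothetical helper for illustration:
// returns (type guessed from the name only, type guessed from the bytes only)
fun detectByNameAndContent(fileName: String, head: ByteArray): Pair<String, String> {
    val tika = Tika()
    val byName = tika.detect(fileName) // looks only at the file name
    val byContent = tika.detect(head)  // looks only at the leading bytes
    return byName to byContent
}

fun main() {
    // The 8-byte signature that every PNG file starts with
    val pngSignature = byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)
    val (byName, byContent) = detectByNameAndContent("photo.jpg", pngSignature)
    println("by name:    $byName")
    println("by content: $byContent")
}
```

With Tika's bundled type definitions, the name-based call should follow the .jpg extension while the content-based call should follow the PNG signature, which is exactly the situation a renamed file creates.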

Installation

Add Apache Tika Core (tika-core) as a dependency from Maven. I used version 1.21 for verification.
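
For a Gradle build that uses the Kotlin DSL, the dependency declaration could look like the following. The coordinates are those of tika-core on Maven Central. The use of Gradle and of the `implementation` configuration is an assumption here, since the build file itself is not shown in this article.

```kotlin
// build.gradle.kts (assumed build setup, not taken from the article)
dependencies {
    implementation("org.apache.tika:tika-core:1.21")
}
```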

Verification code

This is a rough sample that prints the MimeType of every file under src/main/resources. As a side note, `org.apache.tika.metadata.Metadata` is imported under an alias because its name clashes with Kotlin's `Metadata` type.

Read the files in resources and print their MimeType


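import java.io.File
import org.apache.tika.Tika
import org.apache.tika.io.TikaInputStream
import org.apache.tika.metadata.Metadata as TikaMetadata

fun main() {
    val resourcesDir = File(System.getProperty("user.dir"), "src/main/resources")
    val tika = Tika()

    // listFiles() returns null when the directory does not exist, so fall back to an empty array
    resourcesDir.listFiles().orEmpty()
        .filter { it.isFile } // skip subdirectories
        .map { file ->
            // Use a fresh Metadata per file so values from one file do not carry over to the next
            val metadata = TikaMetadata()
            // Close the stream once detection is done
            TikaInputStream.get(file.toURI(), metadata).use { stream ->
                // Lowercase the extension so that the output sorts consistently
                file.extension.lowercase() + " -> " + tika.detect(stream, metadata)
            }
        }
        .sorted()
        // Print after sorting
        .forEach { println(it) }
}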

Execution result

This is the result of running it over files I happened to have on hand plus some sample files. The types were identified almost uniquely. I also tried renaming the extensions, and detection still worked quite well.

7z -> application/x-7z-compressed
avi -> video/x-msvideo
docx -> application/vnd.openxmlformats-officedocument.wordprocessingml.document
exe -> application/x-dosexec
flv -> video/x-flv
html -> text/html
jpg -> image/jpeg
jpg -> image/jpeg
m3u -> text/plain
mkv -> video/x-matroska
mkv -> video/x-matroska
mkv -> video/x-matroska
mkv -> video/x-matroska
mov -> video/quicktime
mov -> video/quicktime
mov -> video/quicktime
mov -> video/quicktime
mp3 -> audio/mpeg
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/mp4
mp4 -> video/x-m4v
mpg -> video/mpeg
mpg -> video/mpeg
mpg -> video/mpeg
msi -> application/x-ms-installer
pdf -> application/pdf
png -> image/png
pptx -> application/vnd.openxmlformats-officedocument.presentationml.presentation
svg -> image/svg+xml
ts -> application/octet-stream
vcmf -> application/octet-stream
vob -> video/mpeg
webm -> video/webm
webm -> video/webm
webm -> video/webm
webm -> video/webm
zip -> application/zip

Other approaches

I used Tika this time, but judging by what comes up in searches, the approaches using `URLConnection` and `mime-util` are the most common. However, these have problems with detection accuracy and with ongoing maintenance, so I tried Tika this time.
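
For comparison, the `URLConnection` approach can be sketched with the JDK alone. This is only an illustration under assumptions: the helper `guessFromBytes` and the sample bytes are made up here, and the static method `URLConnection.guessContentTypeFromStream` is the only real API involved.

```kotlin
import java.io.ByteArrayInputStream
import java.net.URLConnection

// Hypothetical helper for illustration: guesses a type from the leading bytes using only the JDK
fun guessFromBytes(head: ByteArray): String? =
    // guessContentTypeFromStream needs a stream that supports mark/reset; ByteArrayInputStream does
    ByteArrayInputStream(head).use { URLConnection.guessContentTypeFromStream(it) }

fun main() {
    // The 8-byte signature that every PNG file starts with
    val pngSignature = byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)
    println(guessFromBytes(pngSignature))                        // a signature the JDK knows
    println(guessFromBytes(byteArrayOf(0x00, 0x01, 0x02, 0x03))) // bytes with no known signature
}
```

The JDK recognizes the PNG signature as image/png, but the method is documented to return null when it cannot determine a type, and it knows only a small built-in set of signatures. That limitation is consistent with the accuracy concern mentioned above.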

Sites I referred to when writing this article

- How to get ContentType from file header in Java | Hacknote
- Providing video compression samples
