When a directory contains many duplicate files, I want to keep one copy of each and delete the rest.
Comparing the full contents of every file against every other file is expensive.
So this time I compared the hashes (digests) of the files instead, and implemented it with the following policy.
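- Compute the hash of each file
- If the hash already exists in the `Set`, delete the file
- Add the hash to the `Set` if it does not exist

A sample that works for the time being is as follows.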
import java.io.File
import java.security.MessageDigest

val sha256: MessageDigest = MessageDigest.getInstance("SHA-256")

// Wrap the digest in a List so that HashSet compares it by content
fun getDigest(bytes: ByteArray): List<Byte> = sha256.digest(bytes).asList()

// listFiles() returns null if the path is not a directory; subdirectories are skipped
fun getFiles(pathToDir: String): List<File> =
    File(pathToDir).listFiles()?.filter { it.isFile } ?: emptyList()

fun main() {
    val files = getFiles(/*Path of the directory to be processed*/)
    val set = HashSet<List<Byte>>()
    var count = 0

    files.forEach {
        val digest = getDigest(it.readBytes())
        // add() returns false when the digest is already in the set, i.e. the file is a duplicate
        if (!set.add(digest)) {
            if (it.delete()) {
                println("Deleted:\t${it.name}")
                count++
            } else {
                println("Failed to delete:\t${it.name}")
            }
        }
    }

    println("\n\n$count deleted.")
}
Deleted: 43_3 copies.gif
Deleted: 46_Copy of 3 2.gif
Deleted: 70_Copy of 1 2.gif
Deleted: 94_1 copy.gif
Deleted: 50_Copy of 3 2.gif
Deleted: 66_1 copy.gif
Deleted: 95_1 copy.jpg
Deleted: 58_3 copies.gif
Deleted: 63_1 copy.gif
Deleted: 32_1 copy.jpg
Deleted: 55_3 copies.gif
Deleted: 62_3 copies.gif
Deleted: 49_3 copies.gif
Deleted: 9_Copy of 1 2.gif
Deleted: 47_3 copies.gif
Deleted: 96_1 copy.jpg
Deleted: 71_1 copy.gif
Deleted: 52_Copy of 3 2.gif
Deleted: 64_Copy of 1 2.gif
Deleted: 61_3 copies.gif
Deleted: 56_3 copies.gif
Deleted: 60_Copy of 3 2.gif
Deleted: 31_1 copy.jpg
Deleted: 57_Copy of 3 2.gif
Deleted: 98_Copy of 1 2.jpg
Deleted: 34_1 copy.jpg
Deleted: 68_1 copy.gif
Deleted: 53_3 copies.gif
Deleted: 42_3 copies.gif
Deleted: 74_Copy of 1 2.gif
Deleted: 30_1 copy.gif
Deleted: 36_Copy of 1 2.gif
Deleted: 65_1 copy.gif
Deleted: 100_1 copy.jpg
Deleted: 37_1 copy.gif
Deleted: 35_Copy of 1 2.gif
Deleted: 45_3 copies.gif
Deleted: 99_1 copy.jpg
Deleted: 87_Copy of 1 2.jpg
Deleted: 33_1 copy.jpg
Deleted: 73_1 copy.gif
Deleted: 1_7 copies.jpg
Deleted: 48_3 copies.gif
Deleted: 54_Copy of 3 2.gif
Deleted: 51_3 copies.gif
Deleted: 67_1 copy.gif
Deleted: 93_Copy of 1 2.gif
Deleted: 44_Copy of 3 2.gif
Deleted: 72_Copy of 1 2.gif
Deleted: 97_Copy of 1 2.jpg
50 deleted.
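The sample reads each file completely into memory with `readBytes()`. If the files are large, the digest could also be computed in chunks. The following is only a minimal sketch under that assumption; `getDigestStreaming` and its `bufferSize` parameter are hypothetical names that do not appear in the sample above.

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical alternative to getDigest(it.readBytes()):
// feeds the file to SHA-256 in fixed-size chunks so the whole file
// is never held in memory at once.
fun getDigestStreaming(file: File, bufferSize: Int = 8192): List<Byte> {
    // A fresh instance per call avoids sharing state between files
    val md = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(bufferSize)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break          // end of file
            md.update(buffer, 0, read)   // hash only the bytes actually read
        }
    }
    return md.digest().asList()
}
```

The result is the same `List<Byte>` that `getDigest` produces for the same content, so it could be passed to `set.add(...)` in the same way.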
I used `java.security.MessageDigest`.
It is part of the Java standard library, so it can be used without installing any additional libraries.
This time I specified `SHA-256` as the algorithm, but if you want to reduce the chance of hash collisions as far as possible, I think you should specify `SHA-512`.
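To make the algorithm choice concrete, here is a small sketch that takes the algorithm name as a parameter. `digestOf` and `toHex` are hypothetical helpers introduced only for illustration; the only library call used is `MessageDigest.getInstance`, and it assumes the JDK in use provides both algorithms.

```kotlin
import java.security.MessageDigest

// Hypothetical helper: computes a digest with the given algorithm name
// and returns it as List<Byte>, in the same style as getDigest.
fun digestOf(algorithm: String, bytes: ByteArray): List<Byte> =
    MessageDigest.getInstance(algorithm).digest(bytes).asList()

// Hypothetical helper: renders a digest as a lowercase hex string for printing.
fun toHex(digest: List<Byte>): String =
    digest.joinToString("") { "%02x".format(it) }
```

`digestOf("SHA-256", bytes)` yields 32 bytes and `digestOf("SHA-512", bytes)` yields 64 bytes, so switching algorithms only changes the string passed in and the size of each entry stored in the set.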
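`HashSet` is probably the easiest and cheapest collection to use for this.
Also, `equals` on `ByteArray` is unreliable for this purpose because it compares references rather than contents, so here the digest is converted to `List<Byte>` before it is handled.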
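The difference can be seen with a short sketch. The two functions below are hypothetical and exist only to compare how a `HashSet` treats raw `ByteArray` values versus the same values wrapped with `asList()`.

```kotlin
// Counts distinct entries when ByteArray is stored directly.
// ByteArray uses identity-based equals/hashCode, so two separate arrays
// with the same contents are kept as two entries.
fun distinctAsArrays(items: List<ByteArray>): Int =
    items.toHashSet().size

// Counts distinct entries after wrapping each array with asList().
// List equality is element by element, so equal contents collapse to one entry.
fun distinctAsLists(items: List<ByteArray>): Int =
    items.map { it.asList() }.toHashSet().size
```

For two separate arrays created with `byteArrayOf(1, 2, 3)`, the first function counts two entries and the second counts one. That second behavior is what the duplicate check with `set.add(digest)` relies on.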