When I ran a Java program in a CentOS Docker container, Japanese file names came out garbled whenever I listed the files in a directory.
Sample.java
import java.io.File;
import java.io.FilenameFilter;

public class Sample {
    public static void main(String[] args) {
        // Assumes a file with a Japanese name exists, e.g. "/sample/あいうえお.csv"
        new File("/sample").listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                // The Japanese file name is garbled when the file list is read
                System.out.println(name);
                return false;
            }
        });
    }
}
I confirmed that the garbling does not occur when the LANG environment variable is set to `en_US.UTF-8`, but does occur when it is set to `ja_JP.UTF-8`.
This article describes the causes of garbled Japanese file names and how to deal with them.
Setting LANG to `ja_JP.UTF-8` causes garbled characters because the Japanese locale is **not registered in the CentOS Docker image**.
You can list the locales that may be specified in the LANG environment variable with the `locale -a` command.
Running it inside a container started from the CentOS image gives:
# locale -a
C
POSIX
en_US.utf8
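The same check can be scripted from Java. The following is a minimal sketch, not part of the original program: the class name `LocaleCheck` and its helper methods are hypothetical, and the normalization only approximates how locale names are compared (it treats `UTF-8` and `utf8` as the same codeset).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: checks whether the value of LANG appears in `locale -a`.
class LocaleCheck {

    // Lower-case the codeset part and drop punctuation so that
    // "ja_JP.UTF-8" and "ja_JP.utf8" compare equal (an approximation).
    static String normalize(String name) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return name; // e.g. "C" or "POSIX"
        }
        String codeset = name.substring(dot + 1)
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9@]", "");
        return name.substring(0, dot) + "." + codeset;
    }

    // True if `lang` matches one of the available locale names.
    static boolean isRegistered(List<String> available, String lang) {
        String wanted = normalize(lang);
        for (String name : available) {
            if (normalize(name).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Runs `locale -a` and returns its output lines.
    static List<String> listLocales() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("locale", "-a")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String lang = System.getenv("LANG");
        if (lang == null) {
            System.out.println("LANG is not set");
            return;
        }
        System.out.println(lang + " registered: " + isRegistered(listLocales(), lang));
    }
}
```

Given the three locales listed above, `isRegistered` returns false for `ja_JP.UTF-8` and true for `en_US.UTF-8`, which matches the behavior observed with the two LANG settings.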
As shown above, the CentOS Docker image does not include a Japanese locale. If you set the LANG environment variable in this container as shown below and then list files from a Java program, Japanese file names are garbled.
LANG=ja_JP.UTF-8
export LANG
java Sample
=>Garbled Japanese file name.csv
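One plausible way to picture this garbling: the file name is stored on disk as UTF-8 bytes, and if the JVM decodes those bytes with an ASCII charset, every non-ASCII byte is lost. The sketch below only simulates that mismatch in memory. The class name `GarbleDemo` and the sample name are illustrative, and the idea that the JVM falls back to an ASCII encoding when LANG names an unregistered locale is an assumption here, not something the output above proves.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: simulates reading UTF-8 file name bytes with the wrong charset.
class GarbleDemo {

    // Encodes the name as UTF-8 (as it would be stored) and decodes it as US-ASCII.
    static String decodeAsAscii(String fileName) {
        byte[] onDisk = fileName.getBytes(StandardCharsets.UTF_8);
        return new String(onDisk, StandardCharsets.US_ASCII);
    }

    // Encodes and decodes with the same charset, so the name survives.
    static String decodeAsUtf8(String fileName) {
        byte[] onDisk = fileName.getBytes(StandardCharsets.UTF_8);
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "あいうえお.csv" written with Unicode escapes so the source file
        // itself does not depend on the compiler's default encoding.
        String name = "\u3042\u3044\u3046\u3048\u304a.csv";

        System.out.println("ASCII decoding keeps the name intact: "
                + decodeAsAscii(name).equals(name));
        System.out.println("UTF-8 decoding keeps the name intact: "
                + decodeAsUtf8(name).equals(name));
    }
}
```

The ASCII part of the name (`.csv`) survives either way, which is consistent with the garbled output above still ending in `.csv`.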
As a workaround, use the `localedef` command to **add a Japanese locale**, which eliminates the garbled characters.
Add the following command as a RUN instruction in the Dockerfile or run it inside the container.
# localedef -f UTF-8 -i ja_JP ja_JP.UTF-8
Check the available locales again with the `locale -a` command.
# locale -a
C
POSIX
en_US.utf8
ja_JP.utf8
The `localedef` command added `ja_JP.utf8`.
Now, even if you set the LANG environment variable to `ja_JP.UTF-8`, Japanese file names are handled without garbling.
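To confirm from the Java side that the locale is actually picked up, it can help to print the encodings the JVM chose at startup. This is a minimal sketch: the class name `EncodingCheck` is hypothetical, and `sun.jnu.encoding` is an implementation-specific property of OpenJDK-based JVMs rather than a documented API, so it may be absent on other runtimes.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports the locale-related settings the JVM started with.
class EncodingCheck {

    static Map<String, String> settings() {
        Map<String, String> m = new LinkedHashMap<>();
        // String.valueOf turns a missing value into the text "null".
        m.put("LANG", String.valueOf(System.getenv("LANG")));
        m.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        // Implementation-specific: the encoding used for platform strings such as file names.
        m.put("sun.jnu.encoding", String.valueOf(System.getProperty("sun.jnu.encoding")));
        m.put("defaultCharset", Charset.defaultCharset().name());
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : settings().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it with `LANG=ja_JP.UTF-8` before and after the `localedef` step should make the difference visible. On recent JDKs the default charset is UTF-8 regardless of the locale, so `sun.jnu.encoding` is likely the more telling value there.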