Everyone hates character encodings, right? There has been talk of Java 8 and of UTF-8 becoming the default, so I'd like to write up some samples that deal with character encodings in Java.
First of all, when do you need to be aware of the character encoding? Whenever there is input or output involving something other than your own Java application.
Input: data sent from the client in a client-server system; reading external files such as CSV
Output: responses returned from the server in a client-server system; file export; database registration
I think there are many others, but here are some that I often use.
String.getBytes
A method that returns a string as a byte array.
"TEST".getBytes(StandardCharsets.UTF_8);
As shown, pass the character encoding as the argument to getBytes. This encodes the string "TEST" as UTF-8 and returns the resulting bytes. getBytes can also be called with no argument, in which case the default character encoding of the runtime environment is used. To check that value, run the following code.
System.out.println(System.getProperty("file.encoding"));
If you want to change the default character encoding, pass the following option when starting the JVM.
-Dfile.encoding=<charset name>
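To make the effect of the charset argument concrete, here is a minimal sketch that encodes the same one-character string with two different charsets and with the environment default. The class name CharsetBytesDemo and the helper toHex are made up for this illustration, and the sample character (hiragana "a", U+3042) is just an assumption chosen because its bytes differ clearly between encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBytesDemo {

    // Formats a byte array as space-separated uppercase hex (hypothetical helper).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Written as an escape so the encoding of this source file does not matter.
        String text = "\u3042";

        // Same string, different charsets, different bytes.
        System.out.println("UTF-8    : " + toHex(text.getBytes(StandardCharsets.UTF_8)));    // E3 81 82
        System.out.println("UTF-16BE : " + toHex(text.getBytes(StandardCharsets.UTF_16BE))); // 30 42

        // No argument: the result depends on the default charset of the environment.
        System.out.println("default (" + Charset.defaultCharset() + ") : " + toHex(text.getBytes()));
    }
}
```

The last line is the one that can change from machine to machine, which is the reason to pass the charset explicitly whenever the bytes leave your application.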
Conversely, you can decode bytes into a string by passing a byte[] and the character encoding to the String constructor.
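byte[] bytes = "TEST".getBytes(StandardCharsets.UTF_8);
// Decode the bytes as MS932; passing a Charset avoids the checked UnsupportedEncodingException
String decoded = new String(bytes, Charset.forName("MS932"));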
If no character encoding is passed to this constructor either, the default character encoding of the runtime environment is used.
There are many ways to read a file, but I'll show just one here.
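As a rough illustration of why the charset passed to the constructor matters, the sketch below encodes a string with one charset and decodes it with another. The class DecodeDemo and the method roundTrip are hypothetical names, and ISO_8859_1 is used as the "wrong" charset only because it is guaranteed to be available on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {

    // Encodes text with one charset, then decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u3042"; // hiragana "a", three bytes in UTF-8

        String matched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(matched));    // true
        System.out.println(original.equals(mismatched)); // false: the text is garbled
        System.out.println(mismatched.length());         // 3: each byte became one character
    }
}
```

ASCII-only text such as "TEST" survives a mismatch like this because many encodings agree on those byte values, which is why encoding bugs often stay hidden until non-ASCII data shows up.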
// try-with-resources closes the reader automatically
try (BufferedReader bufferedReader = Files.newBufferedReader(Paths.get(""), StandardCharsets.UTF_8)) {
    // read from bufferedReader here
} catch (IOException e) {
    e.printStackTrace();
}
Pass the file to read as the first argument of Files.newBufferedReader and the character encoding as the second. The character encoding can be omitted, in which case UTF-8 is used in every environment (this overload seems to have been added in Java 8). Its implementation looks like this.
public static BufferedReader newBufferedReader(Path path) throws IOException {
    return newBufferedReader(path, StandardCharsets.UTF_8);
}
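For an end-to-end picture, the following sketch writes a string to a file as UTF-8 and reads it back with Files.newBufferedReader, naming the charset on both sides. The class ReadFileDemo, the method writeThenRead, and the temporary file prefix are all made up for this example; a real application would read an existing file instead of a temp file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadFileDemo {

    // Writes text to a temporary file as UTF-8, then reads the first line back as UTF-8.
    static String writeThenRead(String text) {
        try {
            // Hypothetical temp file, used only for this illustration.
            Path path = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(path, text.getBytes(StandardCharsets.UTF_8));
                // The reader is closed before the file is deleted in the finally block.
                try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u3042\u3044\u3046"; // three hiragana characters
        System.out.println(text.equals(writeThenRead(text))); // true
    }
}
```

Because both the write and the read name UTF-8 explicitly, the result does not depend on the default charset of the machine running it.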
That's all for today. Files.newBufferedReader defaults to UTF-8, whereas getBytes and the String constructor default to a character encoding that depends on the environment. Keep in mind that the default differs depending on which method you use. In the end, I think it is best to always specify the character encoding explicitly.