Be careful if you find SHIFT-JIS in Java

Background

In a web application written in Java, "~" (fullwidth tilde, U+FF5E) came out garbled. The DB encoding was UTF-8 and the CSV output was MS932, so I started by investigating whether the garbling came from converting between those two encodings, but the real cause turned out to be simpler. I wasted a lot of time on it, so I'm leaving a note.

Cause

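The DB being UTF-8 is not the issue, because once the data is read it is held as UTF-16, Java's internal string representation. Tracing the processing, I found that after the data was fetched from the DB, Java code converted it to SHIFT-JIS and then to MS932. I assumed this was the well-known "~" conversion problem between SHIFT-JIS and MS932, but it was actually a conversion problem between SHIFT-JIS and UTF-16 (Java's internal representation). "~" here is U+FF5E (FULLWIDTH TILDE): Java's MS932 maps it to the bytes 0x8160, but Java's SHIFT-JIS maps 0x8160 to U+301C (WAVE DASH) and has no mapping for U+FF5E, so encoding it to SHIFT-JIS yields "?".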
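Whether a charset can represent a character at all can be checked directly with `CharsetEncoder.canEncode`. The following is a minimal sketch, not part of the original application: the class name `TildeEncodeCheck` and the helper `canEncode` are made up for illustration, and it assumes the problem character is U+FF5E (FULLWIDTH TILDE).

```java
import java.nio.charset.Charset;

public class TildeEncodeCheck {
    // Hypothetical helper: true when the named charset has a mapping for the character.
    public static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char fullwidthTilde = '\uFF5E'; // assumed problem character (FULLWIDTH TILDE)
        System.out.println("Shift_JIS: " + canEncode("Shift_JIS", fullwidthTilde));
        System.out.println("MS932: " + canEncode("MS932", fullwidthTilde));
    }
}
```

A check like this could be run against suspicious characters before digging into the DB or CSV layers.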

Character conversion example

I wrote a short program to verify this (Java 1.8.0_121). The same string is encoded to a byte array with SHIFT-JIS and with MS932, and each byte array is then decoded back into a Java string. Only the SHIFT-JIS round trip is garbled.

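        import java.nio.charset.StandardCharsets;
        import javax.xml.bind.DatatypeConverter; // bundled with Java 8; removed from the JDK in Java 11

        public class TildeTest {
            public static void main(String[] args) throws Exception {
                // "~" is U+FF5E (FULLWIDTH TILDE), not the ASCII "~"
                String org = "\uFF5E";

                // Encode the same string with each charset
                byte[] sjBytes = org.getBytes("SHIFT-JIS");
                byte[] ms932Bytes = org.getBytes("MS932");

                // Decode each byte array back into a Java (UTF-16) string
                String sj = new String(sjBytes, "SHIFT-JIS");
                String ms932 = new String(ms932Bytes, "MS932");

                String fmt = "%s\t string:%s,Byte array:%s";
                // Encode explicitly as UTF-8 so the result does not depend on the platform default
                System.out.println(String.format(fmt, "Original character", org, DatatypeConverter.printHexBinary(org.getBytes(StandardCharsets.UTF_8))));
                System.out.println(String.format(fmt, "SHIFT-JIS", sj, DatatypeConverter.printHexBinary(sjBytes)));
                System.out.println(String.format(fmt, "MS932", ms932, DatatypeConverter.printHexBinary(ms932Bytes)));
            }
        }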

Output result

        Original character   string:~,Byte array:EFBD9E
        SHIFT-JIS            string:?,Byte array:3F
        MS932                string:~,Byte array:8160
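The "?" (0x3F) in the SHIFT-JIS row is a replacement that `String.getBytes` substitutes silently, which is part of why this kind of garbling is hard to track down. One possible way to surface it as an error instead is to encode through a `CharsetEncoder` configured to report unmappable characters. This is only a sketch under that assumption; the class `StrictEncode` and the method `encodeStrict` are hypothetical names, not something from the original application.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {
    // Hypothetical helper: encodes text, throwing instead of substituting '?'
    // when a character has no mapping in the target charset.
    public static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        ByteBuffer buf = Charset.forName(charsetName)
                .newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "\uFF5E"; // FULLWIDTH TILDE
        System.out.println("MS932 byte count: " + encodeStrict(text, "MS932").length);
        try {
            encodeStrict(text, "Shift_JIS");
        } catch (CharacterCodingException e) {
            // Reached because Shift_JIS has no mapping for U+FF5E
            System.out.println("Shift_JIS failed: " + e);
        }
    }
}
```

Failing loudly at the encoding step points straight at the charset, rather than leaving a "?" to be discovered later in the CSV.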

Summary

Using UTF-8 everywhere would avoid the problem in the first place, but that is hard to do here because it would change the specification. If you really need Shift-JIS output, MS932 is sufficient, so don't use SHIFT-JIS.
