Phenomenon

A letter with a diacritic, such as a German umlaut, is displayed as two garbled characters. For example, the place name Kärnten becomes KÃ¤rnten.

Plain unaccented letters are not affected, so the problem is easy to overlook (in fact, if you google "KÃ¤rnten" you will find a lot of garbled sites).

I ran into this problem while reading and writing the Exif metadata of an image in Java.

Cause

A string that was saved as UTF-8 has been read as ISO-8859-1.

Below is an example run in a Java REPL.

`java`


java> String s = new String("Kärnten")

java> byte[] iso = s.getBytes("ISO-8859-1")
byte[] iso = [75, -28, 114, 110, 116, 101, 110]

java> byte[] utf8 = s.getBytes("UTF-8")
byte[] utf8 = [75, -61, -92, 114, 110, 116, 101, 110]

Thus, "ä" is represented by 1 byte (`-28`) in ISO-8859-1 and by 2 bytes (`-61, -92`) in UTF-8. If you save the string as UTF-8 and then read the bytes as ISO-8859-1, `-61` is interpreted as "Ã" and `-92` is interpreted as "¤". So:

`java`


java> new String(utf8, "ISO-8859-1")
KÃ¤rnten

The result is the garbled string shown at the top.

The same applies to other letters with diacritics. Examples:
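To make the byte-to-character step concrete, here is a minimal sketch of how ISO-8859-1 decoding treats each byte. It relies on the fact that ISO-8859-1 maps every byte value 0 to 255 to the Unicode code point with the same number. The class name `ByteToLatin1` and the helper `latin1Char` are made up for this illustration.

```java
public class ByteToLatin1 {
    // Hypothetical helper: decode a single byte the way ISO-8859-1 does.
    // Java bytes are signed, so mask with 0xFF to get the 0..255 value first.
    static char latin1Char(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of "Kärnten" from the REPL session above.
        byte[] utf8 = {75, -61, -92, 114, 110, 116, 101, 110};
        for (byte b : utf8) {
            int unsigned = b & 0xFF;
            // Print signed value, unsigned hex value, and resulting code point.
            System.out.printf("%4d -> 0x%02X -> U+%04X%n", b, unsigned, (int) latin1Char(b));
        }
    }
}
```

Here `-61` becomes 0xC3 (U+00C3, "Ã") and `-92` becomes 0xA4 (U+00A4, "¤"), which is where the two garbled characters come from.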

ö → Ã¶
ü → Ã¼
Ä → Ã (followed by the invisible control character U+0084)
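The mappings listed here can be reproduced with a small sketch that encodes a string as UTF-8 and then deliberately decodes it with the wrong charset. The class name `MojibakeDemo` and the method `garble` are hypothetical names for this example. The letters are written as Unicode escapes so that the encoding of the source file itself does not affect the result.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode as UTF-8, then (wrongly) decode as ISO-8859-1.
    static String garble(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // ä, ö, ü, Ä written as escapes to stay independent of source encoding.
        String[] samples = {"\u00e4", "\u00f6", "\u00fc", "\u00c4"};
        for (String s : samples) {
            String g = garble(s);
            // What the console shows depends on the terminal's own encoding.
            System.out.println(s + " -> " + g + " (" + g.length() + " chars)");
        }
    }
}
```

Each one-character input comes back as a two-character string, because every letter here takes two bytes in UTF-8 and ISO-8859-1 turns each byte into its own character.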

Coping

The fix is simply to specify the correct character encoding for both reading and writing.

`java`


java> new String(utf8, "UTF-8");
Kärnten

java> new String(iso, "ISO-8859-1");
Kärnten
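Outside the REPL, the same idea might look like the sketch below, which passes an explicit charset on both the write and the read side instead of relying on the platform default. The class `CharsetRoundTrip` and its two methods are hypothetical names for this illustration. The `repair` method is an extra, not part of the fix described above: it reverses the damage for text that was already garbled, assuming the garbling was exactly "UTF-8 bytes decoded as ISO-8859-1" and no bytes were lost along the way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetRoundTrip {
    // Write text to a temp file as UTF-8 and read it back as UTF-8.
    static String writeAndReadUtf8(String text) {
        try {
            Path tmp = Files.createTempFile("umlaut", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: the input was produced by decoding UTF-8 bytes as ISO-8859-1.
    // Re-encoding as ISO-8859-1 recovers the original bytes, which are then
    // decoded with the charset they were actually written in.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "K\u00e4rnten";
        System.out.println(name.equals(writeAndReadUtf8(name)));       // round trip
        System.out.println(name.equals(repair("K\u00c3\u00a4rnten"))); // repair
    }
}
```

Using the `StandardCharsets` constants instead of charset name strings also avoids the checked `UnsupportedEncodingException`. If the text was garbled by a different wrong decoder (Windows-1252, for example), `repair` as written may not recover it.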

Reference

https://forum.httrack.com/readmsg/18923/indexhtml

[JAVA] Umlaut garbled characters
