In Unicode's UTF-16 encoding, one character is usually represented by two bytes (a single 16-bit code unit). However, as the number of characters Unicode needed to cover grew, the 65,536 values that fit in two bytes became insufficient, so some characters are now represented by four bytes (two code units), which increased the number of characters that can be handled. Such a four-byte representation is called a surrogate pair.
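To make the two-byte versus four-byte distinction concrete, the following minimal sketch prints the UTF-16 code units that Java uses for a given code point. The class name SurrogatePairDemo and the helper toUtf16Hex are made up for this illustration; Character.toChars, Character.isSupplementaryCodePoint, and Character.charCount are standard JDK methods.

```java
public class SurrogatePairDemo {
    // Hypothetical helper: returns the UTF-16 code units of a code point
    // as space-separated, four-digit hex strings.
    public static String toUtf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+3042 fits in a single 16-bit code unit.
        System.out.println(toUtf16Hex(0x3042));   // 3042
        // U+20B9F lies outside the 16-bit range and needs a surrogate pair.
        System.out.println(toUtf16Hex(0x20B9F));  // D842 DF9F
        System.out.println(Character.isSupplementaryCodePoint(0x20B9F)); // true
        System.out.println(Character.charCount(0x20B9F));                // 2
    }
}
```

The first code unit of the pair (D842) is the high surrogate and the second (DF9F) is the low surrogate. Neither one is a character on its own.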
The character "𠮟" (U+20B9F, as in "to scold") is a surrogate pair, so the ordinary length method counts it as two characters.
Therefore, to correctly count the characters in a string that contains surrogate pairs, use the codePointCount method instead of the length method.
var str1 = "Hello";
System.out.println(str1.length()); // Result: 5

var str2 = "𠮟る";
System.out.println(str2.length()); // Result: 3

// This gets the correct number of characters
System.out.println(str2.codePointCount(0, str2.length())); // Result: 2
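/**
 * @param beginIndex start position of the range to count (inclusive)
 * @param endIndex end position of the range to count (exclusive)
 * @return the number of characters (Unicode code points) in the range
 */
public int codePointCount(int beginIndex, int endIndex)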
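As a rough sketch of how the begin and end arguments behave, the example below counts code points over a partial range and also splits a string into one element per code point. The class CodePointSplit and the method splitByCodePoint are hypothetical names chosen for this illustration; String.codePoints and String.codePointCount are standard JDK methods. The string is written with \u escapes so the result does not depend on the source file's encoding.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointSplit {
    // Hypothetical helper: splits a string into one String per Unicode
    // code point, so a surrogate pair stays together as a single element.
    public static List<String> splitByCodePoint(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // U+20B9F (a surrogate pair) followed by U+308B.
        String s = "\uD842\uDF9F\u308B";

        System.out.println(s.length());                      // 3
        System.out.println(s.codePointCount(0, s.length())); // 2
        // Indices are in UTF-16 code units: the range 0..2 covers only the pair.
        System.out.println(s.codePointCount(0, 2));          // 1
        System.out.println(splitByCodePoint(s).size());      // 2
    }
}
```

Note that both arguments are indices into the UTF-16 code units, not character positions, which is why str.length() is the usual value for the end argument.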