About full-width ⇔ half-width conversion of character strings in Java

Background

At work, I needed to implement full-width → half-width and half-width → full-width conversion of character strings. This memo records what I found. The conversion is used to absorb notational variation when comparing strings.

Requirements

- Conversion target: alphanumeric characters and symbols
  - Only symbols that can be converted one-to-one are targeted (so-called environment-dependent characters are excluded).
- Katakana does not need to be converted, but it may be converted as long as the result is unambiguous for this purpose.
- Execution environment: Java SE 8u201 (64-bit)
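To make the "one-to-one" requirement concrete, here is a minimal sketch that uses neither of the two methods examined below. It relies on the fact that the full-width ASCII variants (U+FF01 to U+FF5E) sit exactly 0xFEE0 above their half-width counterparts (U+0021 to U+007E), and that the ideographic space U+3000 pairs with the ordinary space. The class and method names are hypothetical, chosen only for illustration.

```java
class WidthOffsetSketch {
    // Distance between a full-width ASCII variant and its half-width form.
    private static final int OFFSET = 0xFEE0;

    // Full-width -> half-width for U+FF01..U+FF5E and the ideographic space.
    static String fullToHalf(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - OFFSET));
            } else if (c == '\u3000') {
                sb.append(' ');
            } else {
                sb.append(c); // everything else is left untouched
            }
        }
        return sb.toString();
    }

    // Half-width -> full-width for U+0021..U+007E and the ordinary space.
    static String halfToFull(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '!' && c <= '~') {
                sb.append((char) (c + OFFSET));
            } else if (c == ' ') {
                sb.append('\u3000');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width "ABc123!" written with escapes so the source stays ASCII.
        String full = "\uFF21\uFF22\uFF43\uFF11\uFF12\uFF13\uFF01";
        System.out.println(fullToHalf(full)); // prints ABc123!
    }
}
```

This covers only the ASCII range and does not touch katakana or compatibility characters such as ① or ㌔, so it is a baseline for comparison rather than a replacement for the methods below.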

Method

Normalizer

Method 1. Use java.text.Normalizer, which is available in Java 6 and later. One concern: it is not designed for full-width ⇔ half-width conversion, so the normalization process may convert more than intended.
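A small sketch of that concern, assuming NFKC is the form used for the conversion. The class and helper names are hypothetical. It shows NFKC folding full-width alphanumerics to half-width as wanted, while also rewriting compatibility characters and turning half-width kana into full-width kana.

```java
import java.text.Normalizer;

class NfkcSketch {
    // Thin wrapper around the standard API; NFKC applies compatibility
    // decomposition followed by canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Intended effect: full-width "ABc123" becomes half-width.
        System.out.println(nfkc("\uFF21\uFF22\uFF43\uFF11\uFF12\uFF13")); // prints ABc123

        // Side effects: a circled digit (U+2460) becomes a plain digit,
        // and the parenthesized ideograph U+3231 expands to three characters.
        System.out.println(nfkc("\u2460")); // prints 1
        System.out.println(nfkc("\u3231").length()); // prints 3

        // Half-width KA + half-width voiced mark compose into full-width GA (U+30AC).
        System.out.println(nfkc("\uFF76\uFF9E").equals("\u30AC")); // prints true
    }
}
```

Note that Normalizer only works in one direction for each character: there is no form that produces full-width alphanumerics or half-width kana.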

ICU4J

ICU - International Components for Unicode

Method 2. Use an external library. I used ICU4J 63.1, the latest version at the time of writing.

Source

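import java.text.Normalizer;
import com.ibm.icu.text.Transliterator;

// ICU4J transliterators for each direction, looked up by their ICU transform IDs.
Transliterator fullToHalf = Transliterator.getInstance("Fullwidth-Halfwidth");
Transliterator halftoFull = Transliterator.getInstance("Halfwidth-Fullwidth");
// Test string mixing full-width and half-width forms with compatibility characters.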
String target =
    "All half ""\" \"Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴";
System.out.println("target :" + target);
System.out.println("NFC    :" + Normalizer.normalize(target, Normalizer.Form.NFC));
System.out.println("NFD    :" + Normalizer.normalize(target, Normalizer.Form.NFD));
System.out.println("NFKC   :" + Normalizer.normalize(target, Normalizer.Form.NFKC));
System.out.println("NFKD   :" + Normalizer.normalize(target, Normalizer.Form.NFKD));
System.out.println("ICU4J H:" + fullToHalf.transliterate(target));
System.out.println("ICU4J F:" + halftoFull.transliterate(target));

Execution result

target :All half """ "Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
NFC    :All half """ "Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
NFD    :All half """ "Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
NFKC   :All half """ "Agapa Agapa Agapa ABabABab123123 ""()()[][];;!!??##//--・ ┣111 apartment(stock)Km 1⁄4(1)
NFKD   :All half """ "Agapa Agapa Agapa ABabABab123123 ""()()[][];;!!??##//--・ ┣111 apartment(stock)Km 1⁄4(1)
ICU4J H:All half """ "Agapa Agapa ABabABab123123 ""()()[][];;!!??##//--・ ┣①⑪㌀ Co., Ltd.㌔¼⑴
ICU4J F:All half "" "Agapa Agapa Agapa ABabABab123123" "" () () [] [] ;;! !! ?? ?? ## // -- ・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴

Conclusion

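This time, I decided to use ICU4J for the full-width → half-width conversion. In the results above, NFKC and NFKD also rewrote characters outside the intended scope (for example, ① became 1 and ¼ became 1⁄4), whereas ICU4J left them alone. If the input can be restricted in advance, I think Normalizer is the simpler and better choice.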
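One possible way to keep Normalizer while limiting its reach is sketched below. This is my own assumption about how the scope could be narrowed, not something tested in this memo: NFKC is applied only to runs of characters in the Halfwidth and Fullwidth Forms block (U+FF00 to U+FFEF), and everything else is copied through unchanged. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScopedNfkcSketch {
    // Runs of characters from the Halfwidth and Fullwidth Forms block.
    private static final Pattern WIDTH_FORMS = Pattern.compile("[\\uFF00-\\uFFEF]+");

    static String normalizeWidthOnly(String s) {
        Matcher m = WIDTH_FORMS.matcher(s);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            // Normalize the whole run so that half-width kana can compose
            // with a following half-width voiced sound mark.
            String converted = Normalizer.normalize(m.group(), Normalizer.Form.NFKC);
            // quoteReplacement is needed because full-width '$' and '\'
            // normalize to characters that are special in replacement strings.
            m.appendReplacement(out, Matcher.quoteReplacement(converted));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Full-width "A1", then a circled digit and U+3231, which stay as they are.
        String in = "\uFF21\uFF11 \u2460 \u3231";
        System.out.println(normalizeWidthOnly(in).equals("A1 \u2460 \u3231")); // prints true
    }
}
```

With this approach half-width kana are still converted to full-width kana, since they live in the same block. Whether that is acceptable depends on the katakana requirement above.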

References

- What you should be careful about when performing full-width / half-width conversion with Normalizer
- Let's use Java character conversion library ICU4J
- Best practice for converting half-width kana to full-width kana in Java
