At work, I needed to implement full-width → half-width and half-width → full-width conversion of strings. This memo records what I found. The conversion is used to absorb notational variations when comparing strings.
- Conversion target: alphanumeric characters and symbols
- Only symbols that can be converted one-to-one are targeted (so-called environment-dependent characters are excluded).
- Katakana does not need to be converted, but it may be converted as long as the result is unambiguous for this purpose.
- Execution environment: Java SE 8u201 (64-bit)
Normalizer
Method 1: use java.text.Normalizer, which is available in Java 6 and later.
One concern: Normalizer is not designed for full-width ⇔ half-width conversion, so the normalization process may also convert characters beyond the intended ones.
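To make that concern concrete, here is a minimal sketch of NFKC applied to a few sample inputs. The class name `NfkcSideEffects` and the helper `nfkc` are hypothetical names chosen for illustration; only `Normalizer.normalize` and `Normalizer.Form.NFKC` come from the JDK. Characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;

public class NfkcSideEffects {
    /** Hypothetical helper: applies NFKC (compatibility decomposition, then canonical composition). */
    public static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width A, B, 1, 2 (U+FF21, U+FF22, U+FF11, U+FF12): the intended conversion.
        System.out.println(nfkc("\uFF21\uFF22\uFF11\uFF12"));
        // Circled digit one (U+2460) is also folded into a plain digit.
        System.out.println(nfkc("\u2460"));
        // Half-width katakana KA + voiced sound mark (U+FF76 U+FF9E)
        // is composed into full-width GA (U+30AC).
        System.out.println(nfkc("\uFF76\uFF9E"));
    }
}
```

The last two cases are the kind of extra conversion to watch for: neither is a full-width alphanumeric character, yet both are changed by NFKC.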
ICU4J
ICU - International Components for Unicode
Method 2: use an external library. I used ICU4J 63.1, the latest version at the time of writing.
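import java.text.Normalizer;
import com.ibm.icu.text.Transliterator;

// ICU4J transliterators for full-width -> half-width and the reverse direction
Transliterator fullToHalf = Transliterator.getInstance("Fullwidth-Halfwidth");
Transliterator halftoFull = Transliterator.getInstance("Halfwidth-Fullwidth");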
String target =
"All half ""\" \"Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴";
System.out.println("target :" + target);
System.out.println("NFC :" + Normalizer.normalize(target, Normalizer.Form.NFC));
System.out.println("NFD :" + Normalizer.normalize(target, Normalizer.Form.NFD));
System.out.println("NFKC :" + Normalizer.normalize(target, Normalizer.Form.NFKC));
System.out.println("NFKD :" + Normalizer.normalize(target, Normalizer.Form.NFKD));
System.out.println("ICU4J H:" + fullToHalf.transliterate(target));
System.out.println("ICU4J F:" + halftoFull.transliterate(target));
target :All half """ "Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
NFC :All half """ "Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
NFD :All half """ "Agapa Agapa ABabABab123123 "" ()()[][];;!!??##//--・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
NFKC :All half """ "Agapa Agapa Agapa ABabABab123123 ""()()[][];;!!??##//--・ ┣111 apartment(stock)Km 1⁄4(1)
NFKD :All half """ "Agapa Agapa Agapa ABabABab123123 ""()()[][];;!!??##//--・ ┣111 apartment(stock)Km 1⁄4(1)
ICU4J H:All half """ "Agapa Agapa ABabABab123123 ""()()[][];;!!??##//--・ ┣①⑪㌀ Co., Ltd.㌔¼⑴
ICU4J F:All half "" "Agapa Agapa Agapa ABabABab123123" "" () () [] [] ;;! !! ?? ?? ## // -- ・ ┣ ① ⑪ ㌀ Co., Ltd. ㌔¼ ⑴
For this task, I decided to use ICU4J for the full-width → half-width conversion. If the input can be narrowed down in advance, I think Normalizer is the simpler and better choice.
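One possible way to narrow things down with Normalizer is sketched below. This is an assumption about what narrowing could look like, not the approach adopted in this memo: NFKC is applied only to code points in U+FF01..U+FF5E (the full-width variants of ASCII), and every other character is passed through unchanged. The class name `NarrowedNfkc` and the method `fullWidthAsciiToHalf` are hypothetical.

```java
import java.text.Normalizer;

public class NarrowedNfkc {
    /**
     * Hypothetical helper: converts only full-width ASCII variants
     * (U+FF01..U+FF5E) to half-width and leaves everything else untouched.
     */
    public static String fullWidthAsciiToHalf(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp >= 0xFF01 && cp <= 0xFF5E) {
                // Normalize this single code point; NFKC maps it to its ASCII counterpart.
                String one = new String(Character.toChars(cp));
                sb.append(Normalizer.normalize(one, Normalizer.Form.NFKC));
            } else {
                // Outside the targeted range: keep the character as-is.
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width A, b, 1, parentheses, then circled one (U+2460) and square kiro (U+3314).
        String input = "\uFF21\uFF42\uFF11\uFF08\uFF09\u2460\u3314";
        System.out.println(fullWidthAsciiToHalf(input));
    }
}
```

Note that the ideographic space (U+3000) lies outside this range, so this sketch leaves it unchanged; whether to include it depends on the comparison requirements.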
- What you should be careful about when performing full-width / half-width conversion with Normalizer
- Let's use Java character conversion library ICU4J
- Best practice for converting half-width kana to full-width kana in Java