[Android → WEB] Yen mark / backslash code point conversion (code point: A5 or C2A5 ⇔ 5C)

Overview

When the yen mark (half-width ¥) is entered on Android, its character code differs from the one used on Windows, where the yen mark is treated as the backslash (half-width \). This article summarizes the difference in code points between the yen mark and the backslash, and a countermeasure for it.

Differences in code points in each environment (yen mark / backslash)

With UTF-8, each environment uses the following code points for the yen mark and the backslash. (C2A5 is the two-byte UTF-8 encoding of the code point A5, that is, U+00A5.)

Windows Web: yen mark 5C, backslash 5C
Mac / iOS / Android Web: yen mark A5, backslash 5C
iOS / Android native app: yen mark C2A5, backslash 5C
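
The relationship between A5 and C2A5 in the list above can be checked directly in Java. The following is only a minimal sketch; the class name YenMarkBytes and the helper utf8Hex are hypothetical names introduced for illustration and are not part of the original article.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
class YenMarkBytes {
    // Returns the UTF-8 bytes of s as uppercase hex, e.g. "C2A5".
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String yen = new String(Character.toChars(0xA5));       // yen mark, U+00A5
        String backslash = new String(Character.toChars(0x5C)); // backslash, U+005C

        System.out.println("yen code point: " + Integer.toHexString(yen.codePointAt(0))); // a5
        System.out.println("yen UTF-8 bytes: " + utf8Hex(yen));                           // C2A5
        System.out.println("backslash UTF-8 bytes: " + utf8Hex(backslash));               // 5C
    }
}
```

In other words, a Java String that holds the yen mark reports the code point A5, and C2A5 only appears once the string is encoded to UTF-8 bytes.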

[Reference URL] Character code of yen mark and backslash

Countermeasures

As a countermeasure, all yen mark code points are unified to the one used on Windows Web (code point: A5 or C2A5 ⇒ 5C).

Source code creation

I created a class (CodePointConversion.java) that performs code point conversion. It converts the target code points (A5 or C2A5) to the code point 5C.

CodePointConversion.java


import java.util.HashMap;
import java.util.Map;

/**
 * Code point conversion class.
 * @author HogeHoge
 */
public class CodePointConversion {
	// Code point conversion table. Converts from the KEY code point to the VALUE code point.
	private static Map<Integer, Integer> conversion_map = new HashMap<Integer, Integer>() {
		// ¥ → \
		// Caution: inside a Java String, the UTF-8 bytes C2 A5 are already decoded to U+00A5;
		// taken as a code point, 0xC2A5 is a Hangul syllable.
		{put(0xA5, 0x5C);
		 put(0xC2A5, 0x5C);}
	};

	/**
	 * Performs code point conversion.
	 * @param str string to convert
	 * @return string after code point conversion
	 */
	public static String convertCordPoint(String str) {
		// null check
		if (str == null) {
			return str;
		}
		StringBuilder sb = new StringBuilder(str);

		// Loop over each character
		for (int i = 0; i < sb.length(); i++) {
			// Get the code point of the current character
			int code_point = sb.codePointAt(i);

			for (Map.Entry<Integer, Integer> entry : conversion_map.entrySet()) {
				if (code_point == entry.getKey()) {
					// If it is a conversion target, replace it with the mapped code point.
					String converted_char = new String(Character.toChars(entry.getValue()));
					sb.replace(i, i + 1, converted_char);
				}
			}
		}

		return sb.toString();
	}
}
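
As a possible alternative to the index-based loop, the same table lookup can be sketched with String.codePoints(), which iterates over whole code points. This is an illustrative variant under assumptions, not the article's implementation: the class name CodePointConversionSketch, the field TABLE, and the method convert are hypothetical, and the table holds only the A5 entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of the conversion class; all names are illustrative only.
class CodePointConversionSketch {
    // Conversion table: KEY code point is replaced by VALUE code point.
    private static final Map<Integer, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put(0xA5, 0x5C); // yen mark → backslash
    }

    static String convert(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        // codePoints() yields whole code points, so surrogate pairs are not split.
        str.codePoints()
           .map(cp -> TABLE.getOrDefault(cp, cp))
           .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "100" + new String(Character.toChars(0xA5));
        // The fourth character was the yen mark; after conversion it is the backslash.
        System.out.println(Integer.toHexString(convert(input).codePointAt(3))); // 5c
    }
}
```

Looking up each code point with getOrDefault avoids scanning every table entry per character, which may matter once the table grows.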

Test code creation

This is test code for checking the operation. The input and output values can be checked in the execution result.

CodePointConversionTest.java


import org.junit.jupiter.api.Test;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.CoreMatchers.*;

class CodePointConversionTest {

	@Test
	void testConvertCordPoint001() {
		// [Input value] ¥¥ (¥ is a half-width character)
		String input_str = new String(Character.toChars(0xA5)) + new String(Character.toChars(0xC2A5));
		// [Expected value] two backslashes (in the literal, each one is preceded by an escape character)
		String expect_str = "\\\\";
		
		System.out.println("【Input value】\n" + input_str);
		System.out.println("[Input code point]\n" + Integer.toHexString(input_str.codePointAt(0)) + "\n" + Integer.toHexString(input_str.codePointAt(1)));
		
		// Code point conversion: the yen mark is converted to a backslash.
		String result_str = CodePointConversion.convertCordPoint(input_str);
		
		// Output value
		System.out.println("\n [Output value]\n" + result_str);
		System.out.println("[Output code point]\n" + Integer.toHexString(result_str.codePointAt(0)) + "\n" + Integer.toHexString(result_str.codePointAt(1)));
		
		// Check the output result.
		assertThat(result_str, is(expect_str));
	}

}

The execution result is as follows.

【Input value】
\?
[Input code point]
a5
c2a5

【Output value】
\\
[Output code point]
5c
5c

The input value above is displayed with a "?", but that is only a display problem of the Windows terminal. The input actually contains the string "yen mark: A5, yen mark: C2A5".
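
Because the terminal may not render every character, it can help to print code points rather than the characters themselves. The following is a minimal sketch of that idea; the class name CodePointDump and the method dump are hypothetical helpers, not part of the article's code.

```java
// Hypothetical helper for inspecting strings independently of terminal rendering.
class CodePointDump {
    // Formats each code point of s as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String yen = new String(Character.toChars(0xA5));
        // Even if the terminal shows "?" or "\", the dump shows what the string really holds.
        System.out.println(dump(yen + "\\")); // U+00A5 U+005C
    }
}
```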

How to treat backslash as a yen mark on Android

The method described so far converts the yen mark to a backslash (code point: A5 or C2A5 ⇒ 5C). On Android, however, there may be cases where you want to go the other way and treat a backslash as a yen mark (code point: 5C ⇒ A5).

In that case, reverse the KEY and VALUE of conversion_map (the code point conversion table).

(Example) How to treat the code point 5C as A5.
	//Code point conversion table. Convert from KEY to VALUE code point.
	private static Map<Integer, Integer> conversion_map = new HashMap<Integer, Integer>() {
		// \ → ¥
		{put(0x5C, 0xA5);}
	};
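
For a single pair of characters, the reverse direction can also be sketched without a table, using String.replace(char, char) from the standard library. This swaps in a different technique from the map-based class, and the class name BackslashToYen and its constants are hypothetical names used only for illustration.

```java
// Hypothetical sketch using String.replace instead of the conversion table.
class BackslashToYen {
    static final char YEN = (char) 0xA5;       // yen mark, U+00A5
    static final char BACKSLASH = (char) 0x5C; // backslash, U+005C

    // Replaces every backslash with the yen mark.
    static String toYen(String s) {
        return s == null ? null : s.replace(BACKSLASH, YEN);
    }

    // Replaces every yen mark with the backslash.
    static String toBackslash(String s) {
        return s == null ? null : s.replace(YEN, BACKSLASH);
    }

    public static void main(String[] args) {
        String price = "\\1000";        // a backslash followed by 1000
        String shown = toYen(price);    // the backslash becomes the yen mark
        System.out.println(Integer.toHexString(shown.codePointAt(0)));              // a5
        System.out.println(Integer.toHexString(toBackslash(shown).codePointAt(0))); // 5c
    }
}
```

Note that converting every backslash also affects backslashes that were meant literally, such as path separators, so this direction is probably best limited to text that is known to contain amounts of money.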

Conclusion

This article described how to convert between the yen mark and backslash code points (code point: A5 or C2A5 ⇔ 5C). Other code points can be converted as well, simply by adding them to conversion_map (the code point conversion table).
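
To illustrate how extra entries extend the conversion, here is a minimal sketch in which the table is passed in by the caller. The class name TableDrivenConversion and its convert method are hypothetical, and the overline (U+203E) to tilde (U+007E) entry is only an assumed example of an additional pair, not something taken from the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the conversion table is supplied by the caller.
class TableDrivenConversion {
    // Replaces each code point that appears as a KEY in table with its VALUE.
    static String convert(String str, Map<Integer, Integer> table) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        str.codePoints().forEach(cp -> sb.appendCodePoint(table.getOrDefault(cp, cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> table = new LinkedHashMap<>();
        table.put(0xA5, 0x5C);   // yen mark → backslash (from the article)
        table.put(0x203E, 0x7E); // overline → tilde (assumed extra entry)

        String input = new String(Character.toChars(0xA5)) + new String(Character.toChars(0x203E));
        String result = convert(input, table);
        System.out.println(Integer.toHexString(result.codePointAt(0))); // 5c
        System.out.println(Integer.toHexString(result.codePointAt(1))); // 7e
    }
}
```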

Feel free to use it when you need to perform code point conversion in Java.
