For truncating a String to a given number of bytes, an approach built on NIO seems to be the recommended one. However, the implementations commonly introduced in Japanese-language articles do not work as expected on Android.
Searching on Google, the following pages come up at the top. http://qiita.com/ota-meshi/items/16972156c935b8b7feaa http://d.hatena.ne.jp/kameid/20090314/1237025305
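The general flow of these implementations is as follows: encode the string into a ByteBuffer of the target byte size and, when encoding stops with an overflow, treat the position of the input CharBuffer as the number of characters that fit.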
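As a rough illustration of that flow, a minimal sketch (not the code from the linked pages) could look like this. The names `PositionBasedTruncate` and `truncateByPosition` are hypothetical, and the sketch deliberately relies on the read-side position after an overflow, which is exactly the assumption examined in the rest of this article.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class PositionBasedTruncate {
    // Hypothetical helper: truncates by trusting the input buffer's position.
    public static String truncateByPosition(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(capacity);

        // Encoding stops with OVERFLOW once the next character no longer fits.
        CoderResult result = encoder.encode(in, out, true);
        if (result.isUnderflow()) {
            return text; // everything fit, nothing to cut
        }
        // Assumption: in.position() is the count of chars that were fully encoded.
        // This holds on OpenJDK but is not guaranteed on every runtime.
        return text.substring(0, in.position());
    }
}
```

On a runtime where the position stops at the first character that did not fit, `truncateByPosition("abcde", 3)` yields `"abc"`.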
When I tested this process, I got the following results.
Test type | Execution environment | Result |
---|---|---|
Local unit test | Android Studio OpenJDK | "あいうえ" |
Instrumented test | Android emulator | "あいうえお" |
The above implementations assume that the CharBuffer position is 4 when encoding ends with an overflow. On the Android emulator and on a real device, however, the CharBuffer position had advanced to 5 when encoding ended with an overflow, so the truncation did not work as expected.
Just to be sure, I checked what the CharsetEncoder documentation says about buffer positions, and the description is not very strict. The write-side position can be expected to sit at the end of the written bytes when an overflow occurs, but the read-side position probably varies with how each encoder implements its overflow check.
https://docs.oracle.com/javase/jp/8/docs/api/java/nio/charset/CharsetEncoder.html "The buffers' positions will be advanced to reflect the characters read and the bytes written, but their marks and limits will not be modified."
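One way to see which behaviour a given runtime has is to encode into a deliberately undersized buffer and look at both positions. The sketch below is an illustration only; `OverflowPositionProbe` and `probe` are hypothetical names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public final class OverflowPositionProbe {
    /**
     * Encodes text as UTF-8 into a buffer of the given capacity and returns
     * { chars read (input position), bytes written (output position) }.
     */
    public static int[] probe(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(capacity);
        // The result is ignored on purpose; only the positions are of interest.
        encoder.encode(in, out, true);
        return new int[] { in.position(), out.position() };
    }
}
```

Logging the result of `probe("あいうえお", 13)` in both a local unit test and an instrumented test makes the difference visible. Going by the results reported above, the first value would presumably be 4 on the local OpenJDK and 5 on the Android emulator.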
When I extended my Google search to English-language pages, I found the following page. https://theholyjava.wordpress.com/2007/11/02/truncating-utf-string-to-the-given/
The implementation there is roughly as follows: encode the string into a ByteBuffer of the target size, then decode the bytes that were actually written back into a String. This process does not depend on the read-side position when encoding or decoding finishes.
Based on this, here is an implementation that adds an early check for the case where no truncation is needed.
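import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class StringUtil {
    public static String truncate(String text, int capacity) {
        if (text == null || capacity < 0) {
            throw new IllegalArgumentException("invalid parameter.");
        }
        Charset charset = StandardCharsets.UTF_8;
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE)
                .reset();
        // step 0: if even the worst-case encoded size fits, no truncation is needed.
        // Computed as long so that very long strings cannot overflow int.
        long estimate = (long) text.length() * (long) Math.ceil(encoder.maxBytesPerChar());
        if (estimate <= capacity) {
            return text;
        }
        // step 1: encode into a buffer of exactly `capacity` bytes.
        // Encoding stops with OVERFLOW when the next character does not fit.
        ByteBuffer srcBuffer = ByteBuffer.allocate(capacity);
        CoderResult result = encoder.encode(CharBuffer.wrap(text), srcBuffer, true);
        encoder.flush(srcBuffer);
        srcBuffer.flip();
        if (result.isUnderflow()) {
            // All input was consumed, so the whole text fits.
            return text;
        }
        // step 2: decode the bytes that were actually written back into chars.
        // Only the write-side position of step 1 is used, never the read side.
        CharBuffer dstBuffer = CharBuffer.allocate(text.length());
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE)
                .reset();
        decoder.decode(srcBuffer, dstBuffer, true);
        decoder.flush(dstBuffer);
        dstBuffer.flip();
        // step 3: the decoded chars are the truncated string.
        return dstBuffer.toString();
    }
}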
@Test
public void truncate() throws Exception {
//1Byte character
String testA = "abcde";
String testA_len0 = "";
String testA_len1 = "a";
String testA_len4 = "abcd";
String testA_len5 = "abcde";
assertThat(StringUtil.truncate(testA, 0), is(testA_len0));
assertThat(StringUtil.truncate(testA, 1), is(testA_len1));
assertThat(StringUtil.truncate(testA, 4), is(testA_len4));
assertThat(StringUtil.truncate(testA, 5), is(testA_len5));
//3Byte characters
String testB = "AIUEO";
String testB_len0 = "";
String testB_len1 = "Ah";
String testB_len4 = "AIUE";
String testB_len5 = "AIUEO";
assertThat(StringUtil.truncate(testB, 0), is(testB_len0));
assertThat(StringUtil.truncate(testB, 2), is(testB_len0));
assertThat(StringUtil.truncate(testB, 3), is(testB_len1));
assertThat(StringUtil.truncate(testB, 14), is(testB_len4));
assertThat(StringUtil.truncate(testB, 15), is(testB_len5));
//4Byte characters
//5 characters
// https://www.softel.co.jp/blogs/tech/archives/596
String testC = "\uD840\uDC0B\uD844\uDE3D\uD844\uDF1B\uD845\uDC6E\uD846\uDCBD";
String testC_len0 = "";
String testC_len1 = "\uD840\uDC0B";
String testC_len4 = "\uD840\uDC0B\uD844\uDE3D\uD844\uDF1B\uD845\uDC6E";
String testC_len5 = "\uD840\uDC0B\uD844\uDE3D\uD844\uDF1B\uD845\uDC6E\uD846\uDCBD";
assertThat(StringUtil.truncate(testC, 3), is(testC_len0));
assertThat(StringUtil.truncate(testC, 4), is(testC_len1));
assertThat(StringUtil.truncate(testC, 19), is(testC_len4));
assertThat(StringUtil.truncate(testC, 20), is(testC_len5));
//Combination of 1Byte and 3Byte characters
String testD = "A A B I C U D E E";
String testD_len1 = "A";
String testD_len2 = "A Ah";
String testD_len9 = "A A B I C U D E E";
String testD_len10 = "A A B I C U D E E";
assertThat(StringUtil.truncate(testD, 1), is(testD_len1));
assertThat(StringUtil.truncate(testD, 3), is(testD_len1));
assertThat(StringUtil.truncate(testD, 4), is(testD_len2));
assertThat(StringUtil.truncate(testD, 19), is(testD_len9));
assertThat(StringUtil.truncate(testD, 20), is(testD_len10));
//Emoji
//Japanese flag, BATH
// U+1F1EF U+1F1F5, U+1F6C0
// 4+4Byte + 4Byte
// http://qiita.com/_sobataro/items/47989ee4b573e0c2adfc
String testE = "\uD83C\uDDEF\uD83C\uDDF5\uD83D\uDEC0";
String testE_len0 = "";
String testE_len1 = "\uD83C\uDDEF";
String testE_len2 = "\uD83C\uDDEF\uD83C\uDDF5";
String testE_len3 = "\uD83C\uDDEF\uD83C\uDDF5\uD83D\uDEC0";
assertThat(StringUtil.truncate(testE, 3), is(testE_len0));
assertThat(StringUtil.truncate(testE, 4), is(testE_len1));
assertThat(StringUtil.truncate(testE, 7), is(testE_len1));
assertThat(StringUtil.truncate(testE, 8), is(testE_len2));
assertThat(StringUtil.truncate(testE, 11), is(testE_len2));
assertThat(StringUtil.truncate(testE, 12), is(testE_len3));
//String string length check
assertEquals(1 + 1 + 1 + 1 + 1, testA.length());
assertEquals(1 + 1 + 1 + 1 + 1, testB.length());
assertEquals(2 + 2 + 2 + 2 + 2, testC.length());
assertEquals(2 + 2 + 2, testE.length());
}
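The byte boundaries used in the assertions follow from the UTF-8 length of each character: 1 byte for ASCII, 3 bytes for hiragana, and 4 bytes for each surrogate pair. As a quick cross-check, a small helper such as the following can confirm the totals. The names `Utf8Lengths` and `utf8Length` are hypothetical and only serve this illustration.

```java
import java.nio.charset.StandardCharsets;

public final class Utf8Lengths {
    // Returns the number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For the test data above, `utf8Length` gives 5 for testA, 15 for testB, 20 for testC, and 12 for testE. Note that truncation happens at code point boundaries, so cutting testE at 4 bytes keeps only the first regional indicator of the flag.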