[Java] Truncating a String by byte count on Android

The recommended way to truncate a String to a maximum number of bytes appears to be an implementation built on NIO. However, the implementations commonly found on Japanese-language sites do not work as expected on Android.

Common NIO-based implementations and their problem

If you search on Google, the following pages appear near the top.
http://qiita.com/ota-meshi/items/16972156c935b8b7feaa
http://d.hatena.ne.jp/kameid/20090314/1237025305

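In outline, these implementations encode the string into a ByteBuffer whose capacity is the byte limit, and then cut the string at the position of the input CharBuffer at the moment encoding stops.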

When I tested this process, I got the following results.

| Test type | Execution environment | Result |
| --- | --- | --- |
| Local unit test | Android Studio OpenJDK | "あいうえ" |
| Instrumented test | Android emulator | "あいうえお" |

The implementations above assume that the CharBuffer position is 4 when encoding stops with an overflow, that is, that it points at the first character that did not fit. On the Android emulator and on real devices, however, the CharBuffer position had already advanced to 5 when the overflow was reported, so the string was not truncated as expected.
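
To make the fragile assumption concrete, here is a minimal sketch of the position-dependent approach. It is not the code from the linked pages; the class name `PositionBasedTruncate` is hypothetical, and the result of `truncate` for an overflowing input depends on the runtime, which is exactly the problem described above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class: illustrates the position-dependent idea only.
final class PositionBasedTruncate {

    static String truncate(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(capacity);
        // Encoding stops with an overflow once the output buffer is full.
        encoder.encode(in, out, true);
        // Fragile step: assumes in.position() is the index of the first
        // char that did not fit. Some runtimes report a later position.
        return text.substring(0, in.position());
    }

    public static void main(String[] args) {
        // Five hiragana characters (a, i, u, e, o), 3 bytes each in UTF-8.
        String text = "\u3042\u3044\u3046\u3048\u304A";
        System.out.println(truncate(text, 14).length());
    }
}
```

When the whole string fits, the position equals `text.length()` on any runtime, so only the overflow case is affected.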

To be sure, I checked what the CharsetEncoder documentation says about buffer positions, and the description is not very strict. On overflow, the position of the output (write) side can be expected to sit at the end of the bytes written, but the position of the input (read) side is likely to vary with how each encoder implements its overflow check.

https://docs.oracle.com/javase/jp/8/docs/api/java/nio/charset/CharsetEncoder.html
"The position of the buffer increases with the number of characters read or bytes written,"
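
To see what a particular runtime actually does, a small probe can report both buffer positions after a single `encode` call. This is only a sketch; `OverflowProbe` and `positionsAfterEncode` are hypothetical names, and the values it returns are whatever the runtime's UTF-8 encoder happens to produce.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting encoder behavior on a given runtime.
final class OverflowProbe {

    /** Returns {chars consumed from input, bytes written to output}. */
    static int[] positionsAfterEncode(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(capacity);
        encoder.encode(in, out, true);
        return new int[] { in.position(), out.position() };
    }

    public static void main(String[] args) {
        // Five hiragana characters, 15 bytes in UTF-8, into a 14-byte buffer.
        int[] p = positionsAfterEncode("\u3042\u3044\u3046\u3048\u304A", 14);
        System.out.println("chars read = " + p[0] + ", bytes written = " + p[1]);
    }
}
```

Running the probe both as a local unit test and as an instrumented test should show whether the input-side position differs between the two environments.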

An implementation that solves the problem

Extending the Google search to English-language pages turned up the following page.
https://theholyjava.wordpress.com/2007/11/02/truncating-utf-string-to-the-given/

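The implementation there works roughly as follows: (1) encode the string into a ByteBuffer whose capacity is the byte limit, (2) decode the bytes that were written back into a CharBuffer, ignoring any malformed trailing bytes, and (3) convert the CharBuffer to a String. This process does not depend on the read-side position when encoding or decoding finishes.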
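
The encode-then-decode idea can also be sketched with a plain byte array. This is a simplified variant for illustration, not the code from the linked page: it encodes the whole string with `String.getBytes`, which is wasteful for long inputs, and it assumes the text contains no unpaired surrogates. The class name `RoundTripTruncate` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class: byte-array variant of the encode/decode round trip.
final class RoundTripTruncate {

    static String truncate(String text, int capacity) {
        if (text == null || capacity < 0) {
            throw new IllegalArgumentException("invalid parameter.");
        }
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= capacity) {
            return text; // already fits, nothing to cut
        }
        // IGNORE makes the decoder drop an incomplete sequence at the cut.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            // Decode only the first `capacity` bytes.
            return decoder.decode(ByteBuffer.wrap(bytes, 0, capacity)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE; rethrown to keep the signature simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcde", 4));
    }
}
```

Because the result is built from the decoded characters rather than from a buffer position, the read-side position never enters the calculation.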

Based on this, here is an implementation that also adds an early check for the case where no truncation is needed.


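    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    public final class StringUtil {
        public static String truncate(String text, int capacity) {
            if (text == null || capacity < 0) {
                throw new IllegalArgumentException("invalid parameter.");
            }

            Charset charset = StandardCharsets.UTF_8;
            CharsetEncoder encoder = charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.IGNORE)
                    .onUnmappableCharacter(CodingErrorAction.IGNORE)
                    .reset();
            // step 0. Early exit: even the worst-case encoded size fits.
            // Computed as long so that very long strings cannot overflow int.
            long estimate = (long) text.length() * (long) Math.ceil(encoder.maxBytesPerChar());
            if (estimate <= capacity) {
                return text;
            }

            // step 1. Encode into a buffer of exactly `capacity` bytes.
            // (Named srcBuffer because it is the source of the decode in step 2.)
            ByteBuffer srcBuffer = ByteBuffer.allocate(capacity);
            CoderResult result = encoder.encode(CharBuffer.wrap(text), srcBuffer, true);
            encoder.flush(srcBuffer);
            srcBuffer.flip();
            if (result.isUnderflow()) {
                // Everything fit, so no truncation is needed.
                return text;
            }

            // step 2. Decode the written bytes; malformed trailing bytes are ignored.
            CharBuffer dstBuffer = CharBuffer.allocate(text.length());
            CharsetDecoder decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.IGNORE)
                    .onUnmappableCharacter(CodingErrorAction.IGNORE)
                    .reset();
            decoder.decode(srcBuffer, dstBuffer, true);
            decoder.flush(dstBuffer);
            dstBuffer.flip();
            // step 3. The decoded characters are the truncated string.
            return dstBuffer.toString();
        }
    }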
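
As an independent cross-check, the same truncation can be computed without any encoder by walking the string one code point at a time and summing UTF-8 lengths. This swaps in a different technique from the NIO approach above and is only a sketch: `CodePointTruncate` is a hypothetical name, and an unpaired surrogate is counted as 3 bytes here, which does not match what an encoder with `IGNORE` does.

```java
// Hypothetical class: code-point walk, no CharsetEncoder involved.
final class CodePointTruncate {

    static String truncate(String text, int capacity) {
        if (text == null || capacity < 0) {
            throw new IllegalArgumentException("invalid parameter.");
        }
        int bytes = 0; // UTF-8 bytes accepted so far
        int i = 0;     // index of the next char to examine
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // UTF-8 length of one code point.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > capacity) {
                break; // this code point would exceed the limit
            }
            bytes += len;
            i += Character.charCount(cp); // 2 for a surrogate pair
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcde", 4));
    }
}
```

Comparing its output with the NIO implementation on the same inputs is a cheap way to catch platform-dependent surprises.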

Test code

    @Test
    public void truncate() throws Exception {
        // 1-byte characters
        String testA = "abcde";
        String testA_len0 = "";
        String testA_len1 = "a";
        String testA_len4 = "abcd";
        String testA_len5 = "abcde";
        assertThat(StringUtil.truncate(testA, 0), is(testA_len0));
        assertThat(StringUtil.truncate(testA, 1), is(testA_len1));
        assertThat(StringUtil.truncate(testA, 4), is(testA_len4));
        assertThat(StringUtil.truncate(testA, 5), is(testA_len5));

        // 3-byte characters
        String testB = "あいうえお";
        String testB_len0 = "";
        String testB_len1 = "あ";
        String testB_len4 = "あいうえ";
        String testB_len5 = "あいうえお";
        assertThat(StringUtil.truncate(testB, 0), is(testB_len0));
        assertThat(StringUtil.truncate(testB, 2), is(testB_len0));
        assertThat(StringUtil.truncate(testB, 3), is(testB_len1));
        assertThat(StringUtil.truncate(testB, 14), is(testB_len4));
        assertThat(StringUtil.truncate(testB, 15), is(testB_len5));

        //4Byte characters
        //5 characters
        // https://www.softel.co.jp/blogs/tech/archives/596
        String testC = "\uD840\uDC0B\uD844\uDE3D\uD844\uDF1B\uD845\uDC6E\uD846\uDCBD";
        String testC_len0 = "";
        String testC_len1 = "\uD840\uDC0B";
        String testC_len4 = "\uD840\uDC0B\uD844\uDE3D\uD844\uDF1B\uD845\uDC6E";
        String testC_len5 = "\uD840\uDC0B\uD844\uDE3D\uD844\uDF1B\uD845\uDC6E\uD846\uDCBD";
        assertThat(StringUtil.truncate(testC, 3), is(testC_len0));
        assertThat(StringUtil.truncate(testC, 4), is(testC_len1));
        assertThat(StringUtil.truncate(testC, 19), is(testC_len4));
        assertThat(StringUtil.truncate(testC, 20), is(testC_len5));

        // Combination of 1-byte and 3-byte characters (10 characters, 20 bytes)
        String testD = "AあBいCうDえEお";
        String testD_len1 = "A";
        String testD_len2 = "Aあ";
        String testD_len9 = "AあBいCうDえE";
        String testD_len10 = "AあBいCうDえEお";
        assertThat(StringUtil.truncate(testD, 1), is(testD_len1));
        assertThat(StringUtil.truncate(testD, 3), is(testD_len1));
        assertThat(StringUtil.truncate(testD, 4), is(testD_len2));
        assertThat(StringUtil.truncate(testD, 19), is(testD_len9));
        assertThat(StringUtil.truncate(testD, 20), is(testD_len10));

        //Emoji
        //Japanese flag, BATH
        // U+1F1EF U+1F1F5, U+1F6C0
        // 4+4Byte + 4Byte
        // http://qiita.com/_sobataro/items/47989ee4b573e0c2adfc
        String testE = "\uD83C\uDDEF\uD83C\uDDF5\uD83D\uDEC0";
        String testE_len0 = "";
        String testE_len1 = "\uD83C\uDDEF";
        String testE_len2 = "\uD83C\uDDEF\uD83C\uDDF5";
        String testE_len3 = "\uD83C\uDDEF\uD83C\uDDF5\uD83D\uDEC0";
        assertThat(StringUtil.truncate(testE, 3), is(testE_len0));
        assertThat(StringUtil.truncate(testE, 4), is(testE_len1));
        assertThat(StringUtil.truncate(testE, 7), is(testE_len1));
        assertThat(StringUtil.truncate(testE, 8), is(testE_len2));
        assertThat(StringUtil.truncate(testE, 11), is(testE_len2));
        assertThat(StringUtil.truncate(testE, 12), is(testE_len3));

        // Sanity check of String.length() for the test inputs
        assertEquals(1 + 1 + 1 + 1 + 1, testA.length());
        assertEquals(1 + 1 + 1 + 1 + 1, testB.length());
        assertEquals(2 + 2 + 2 + 2 + 2, testC.length());
        assertEquals(2 + 2 + 2, testE.length());
    }
