[JAVA] I read the source of String

I decided to read through the JDK source. That said, I don't have time to go over every line carefully, so I skim it and pick out the code that catches my eye. Last time I read the source of Long, so next is String.

String class

The String class is the class for character strings. Having gone through Byte, Short, Integer, and Long, you could say that String is likewise a wrapper class, this time for char[]. Let's start with the fields.

String.java


    private final char value[];
    private int hash; // Default to 0
    public static final Comparator<String> CASE_INSENSITIVE_ORDER = new CaseInsensitiveComparator();

The char[] is the body of the string. Among the constructors, hash is set only in the following one; otherwise it stays 0 until hashCode is called.

String.java


    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }
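To see what this constructor means from the outside, here is a small sketch of my own. The class name CopyConstructorDemo and the helper sameInstance are made up for illustration; only new String(String), equals, and hashCode come from the JDK.

```java
class CopyConstructorDemo {
    /** Returns true when both arguments refer to the very same object. */
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String original = "abc";
        String copy = new String(original); // a new String object with the same content

        System.out.println(sameInstance(original, copy));           // false: different objects
        System.out.println(original.equals(copy));                  // true: same characters
        System.out.println(original.hashCode() == copy.hashCode()); // true: same hash value
    }
}
```

The copy is a distinct object, but its content and hash value are the same as the original's.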

The hash is computed when hashCode is called while hash is still 0 and the string length is greater than 0. If the computed hash happens to be 0, it gets recalculated on every call, which I would rather it didn't, but that case is rare.

String.java


    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
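The loop above can be reproduced outside the class as a quick check. This is only a sketch: HashDemo and polyHash are names I made up, and the method simply repeats h = 31 * h + c for each char through the public API.

```java
class HashDemo {
    /** Recomputes the hash the same way as the quoted loop: h = 31 * h + c per char. */
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: (97 * 31 + 98) * 31 + 99 = 96354
        System.out.println(polyHash("abc"));                     // 96354
        System.out.println(polyHash("abc") == "abc".hashCode()); // true
        System.out.println("".hashCode());                       // 0 (the loop never runs)
    }
}
```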

There is also a constant called CASE_INSENSITIVE_ORDER. I assumed it was a recent addition, but it has been there since JDK 1.2.
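As a usage sketch, the constant can be handed to anything that takes a Comparator. CaseInsensitiveDemo and sortIgnoringCase are hypothetical names of mine, and List.sort assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CaseInsensitiveDemo {
    /** Returns a sorted copy of the input, ignoring upper/lower case. */
    static List<String> sortIgnoringCase(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        // Natural ordering would put "Banana" first, because 'B' < 'a'.
        System.out.println(sortIgnoringCase(words)); // [apple, Banana, cherry]
    }
}
```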

length method, isEmpty method

The length method returns the length. C's strlen scans for the null character every time, but Java just returns the size of the array. That means value is allocated at exactly the right size. Since String is immutable, there is no need to leave spare capacity.

String.java


    public int length() {
        return value.length;
    }

isEmpty was added in JDK 1.6.

String.java


    public boolean isEmpty() {
        return value.length == 0;
    }

It is equivalent to length() == 0, but it is a very Java-like method in that it gives you a boolean directly without comparing numbers.

charAt method, toCharArray method

charAt gets one character at a time, and toCharArray gets all of them at once as an array.

String.java


    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

    public char[] toCharArray() {
        // Cannot use Arrays.copyOf because of class initialization order issues
        char result[] = new char[value.length];
        System.arraycopy(value, 0, result, 0, value.length);
        return result;
    }

charAt just indexes into the array, whereas toCharArray allocates another array of the same size and copies into it. Unless you have a special reason, it is better to loop up to length() with a for statement and access each character with charAt.
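A minimal sketch of both access styles follows. CharAccessDemo and countChar are names I made up for illustration.

```java
class CharAccessDemo {
    /** Counts occurrences of target by walking the string with charAt. */
    static int countChar(String s, char target) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "banana";
        System.out.println(countChar(s, 'a')); // 3

        // toCharArray hands back a copy, so editing it leaves the String untouched.
        char[] copy = s.toCharArray();
        copy[0] = 'B';
        System.out.println(s);                // banana
        System.out.println(new String(copy)); // Banana
    }
}
```

The copy is what keeps String immutable, and it is also the extra cost you avoid by using charAt.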

By the way, I saw in someone else's source that you can split a string into single-character strings with str.split(""), but please don't. It creates a String instance for every character in the string.
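To make the difference concrete, here is a sketch of what split("") hands back. SplitDemo and splitIntoStrings are hypothetical names, and the result shown assumes Java 8 or later (older versions also returned a leading empty string).

```java
import java.util.Arrays;

class SplitDemo {
    /** Splits a string into one-character strings; shown only for comparison. */
    static String[] splitIntoStrings(String s) {
        return s.split("");
    }

    public static void main(String[] args) {
        String[] parts = splitIntoStrings("abc");
        System.out.println(Arrays.toString(parts)); // [a, b, c] on Java 8 and later
        System.out.println(parts.length);           // 3 String objects plus the array itself
    }
}
```

A charAt loop visits the same characters without allocating any of these objects.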

intern method, equals method

There is an intern method. To be honest, I'd rather people didn't use it. The implementation is a native method; presumably it manages the references internally.

String.java


    public native String intern();

How to use intern ...

Main.java


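    public class Main {
        public static void main(String[] args) {
            String s01 = "abc";                     // string literal
            String s02 = "abcdef".substring(0, 3);  // new instance with the same content
            System.out.println(s01 == s02);         // compares references
            System.out.println(s01.equals(s02));    // compares content
            String s03 = s02.intern();              // look up the interned instance
            System.out.println(s01 == s03);
            System.out.println(s01.equals(s03));
        }
    }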

When you run ...

false
true
true
true

After substring the reference is different, but after intern the reference is the same.

Here is the equals method.

String.java


    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

Note the this == anObject check at the beginning. Rather than writing the == operator everywhere on the assumption that everything is interned, just write equals: if the strings are interned and are the same instance, it returns quickly anyway.

equals returns early when the lengths differ or when a mismatch is found while comparing from the start, but when only the last character differs, or when the strings are equal, the whole string is scanned. Comparing by reference is therefore the fastest, but it also seems like it would be effective to compare hashCode after the length check and return false on a mismatch. If hashCode has not been calculated yet, though, that means looping up to twice.
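The hashCode idea can be sketched from the outside like this. It is not how the JDK does it: EqualsSketch and equalsWithHashCheck are hypothetical names of mine, and the helper only goes through the public API.

```java
class EqualsSketch {
    /** Hypothetical helper (not part of the JDK): rejects on a hash mismatch before comparing chars. */
    static boolean equalsWithHashCheck(String a, String b) {
        if (a == b) {
            return true;                 // same instance (or both null)
        }
        if (a == null || b == null) {
            return false;
        }
        if (a.length() != b.length()) {
            return false;                // cheap length check first
        }
        if (a.hashCode() != b.hashCode()) {
            return false;                // hashCode() scans the string if it is not cached yet
        }
        return a.equals(b);              // equal hashes can still be a collision
    }

    public static void main(String[] args) {
        System.out.println(equalsWithHashCheck("hello", new String("hello"))); // true
        System.out.println(equalsWithHashCheck("hello", "hellp"));             // false
        System.out.println(equalsWithHashCheck("hello", "help"));              // false
    }
}
```

The pre-check only pays off when the hashes are already cached. Otherwise each string gets scanned once for the hash and possibly once more by equals.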

indexOf method

indexOf has been around for a long time, and I had wondered why its argument is int ch ...

String.java


    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }

        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }

    private int indexOfSupplementary(int ch, int fromIndex) {
        if (Character.isValidCodePoint(ch)) {
            final char[] value = this.value;
            final char hi = Character.highSurrogate(ch);
            final char lo = Character.lowSurrogate(ch);
            final int max = value.length - 1;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == hi && value[i + 1] == lo) {
                    return i;
                }
            }
        }
        return -1;
    }

Oh, so it supports surrogate pairs. indexOfSupplementary searches for the two chars as a pair; the first half (high surrogate) ranges from U+D800 to U+DBFF, and the second half (low surrogate) from U+DC00 to U+DFFF.

lastIndexOf simply searches from the back, and it has its own surrogate-pair method called lastIndexOfSupplementary. Surrogate pair support appears to have arrived in JDK 1.5; there was no MIN_SUPPLEMENTARY_CODE_POINT in the JDK 1.4 source. Yet even in JDK 1.4 the argument of indexOf was already int ch. That kind of foresight is amazing.
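Here is a small sketch of searching for a character outside the BMP. SupplementaryIndexOfDemo, GRINNING_FACE, and sample are names I made up; U+1F600 is just one example of a supplementary code point, and Character.highSurrogate assumes Java 7 or later.

```java
class SupplementaryIndexOfDemo {
    /** U+1F600, a code point above U+FFFF, so it needs two chars in a String. */
    static final int GRINNING_FACE = 0x1F600;

    /** Builds "a" + U+1F600 + "b" without putting the emoji in the source file. */
    static String sample() {
        return "a" + new String(Character.toChars(GRINNING_FACE)) + "b";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                   // 4: the emoji takes two chars
        System.out.println(s.indexOf(GRINNING_FACE));     // 1: index of the high surrogate
        System.out.println(s.lastIndexOf(GRINNING_FACE)); // 1
        System.out.println(s.indexOf('b'));               // 3: a char argument is widened to int

        System.out.println(Integer.toHexString(Character.highSurrogate(GRINNING_FACE))); // d83d
        System.out.println(Integer.toHexString(Character.lowSurrogate(GRINNING_FACE)));  // de00
    }
}
```

Because the parameter is an int, the same method accepts both a plain char and a full code point.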

toString method

I don't know if anyone in the world writes "abc".toString(), but ...

String.java


    public String toString() {
        return this;
    }

It returns itself.

Finally

I'm not sure whether the String class belongs to the runtime library or to the language specification. For example, the default constructor uses a string literal inside String itself:

String.java


    public String() {
        this.value = "".value;
    }
