Viacheslav

Strings in Java (class java.lang.String)

Published in the Random EN group

Introduction

The path of a programmer is long and complex, and in most cases it starts with a program that displays Hello World on the screen. Java is no exception (see Lesson: The "Hello World!" Application). There, the message is printed with System.out.println("Hello World!");. If you look at the Java API, the overload used here, System.out.println(String), takes a String as its parameter. That data type is the subject of this article.

String as a sequence of characters

As the name suggests, the String type represents a text string. What is a text string? It is an ordered sequence of characters that follow one another, where each character is a char. So, quite fittingly, String is an implementation of java.lang.CharSequence. And if you look inside the String class itself (up to Java 8), there is nothing more than an array of chars: private final char value[]; (since Java 9, the "compact strings" change stores the characters in a byte[] together with an encoding flag). The java.lang.CharSequence interface has a fairly simple contract:
It has a method for getting the number of characters (length()), a single character (charAt(int index)), and a range of characters (subSequence(int start, int end)), plus toString(), which for a String simply returns this. More interesting are the methods that came in Java 8: chars() and codePoints(). Recall from the Oracle tutorial "Primitive Data Types" that a char is a single 16-bit Unicode character. In other words, char is essentially a type half the size of an int (32 bits) that represents numbers from 0 to 65535 (see the decimal values in the ASCII Table for the first 128 of them). So, if we wish, we can represent a char as an int, and Java 8 takes advantage of this. Starting with Java 8, we have IntStream, a stream for working with primitive ints, so a CharSequence can produce an IntStream of either its chars or its code points. Before we move on to those, here is an example that shows how convenient this approach is. Let's use the Tutorialspoint online Java compiler and run the code:
public static void main(String[] args) {
    String line = "aaabccdddc";
    // chars() yields an IntStream of char values; distinct() drops repeats
    System.out.println(line.chars().distinct().count()); // 4
}
This prints 4 (the distinct characters a, b, c and d): counting unique characters takes a single line.
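Because String is just one implementation of CharSequence, code written against the interface also accepts a StringBuilder or a StringBuffer. The sketch below assumes nothing beyond the four contract methods and the Java 8 default method chars(); the class name CharSequenceDemo and the helpers describe and distinctChars are hypothetical, made up for illustration.

```java
public class CharSequenceDemo {

    // Hypothetical helper: uses only length(), charAt(), subSequence() and toString()
    static String describe(CharSequence seq) {
        int length = seq.length();                                    // number of chars
        char first = length > 0 ? seq.charAt(0) : '?';                // a single char by index
        CharSequence head = seq.subSequence(0, Math.min(3, length));  // a range of chars
        return length + " chars, first='" + first + "', head=\"" + head + "\", text=" + seq.toString();
    }

    // Hypothetical helper: counts distinct chars via the Java 8 default method chars()
    static long distinctChars(CharSequence seq) {
        return seq.chars().distinct().count();
    }

    public static void main(String[] args) {
        // String, StringBuilder and StringBuffer all implement CharSequence
        System.out.println(describe("aaabccdddc"));
        System.out.println(describe(new StringBuilder("hello")));
        System.out.println(distinctChars(new StringBuffer("aaabccdddc"))); // 4
    }
}
```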

CodePoints

We have covered chars, but what are code points? The concept exists because when Java was created, 16 bits (half an int) were enough to encode any character: Unicode was designed as a 16-bit code (see the "Unicode 88" proposal), so a Java char is 16 bits. Later, Unicode 2.0 extended the range beyond 16 bits; in the UTF-16 encoding that Java uses, a character outside the 16-bit range is represented as a surrogate pair (two chars). A full character value, a code point, therefore needs an int to hold it. For more details, see the Stack Overflow question "Comparing a char to a code-point?". UTF-16 is also described in the JavaDoc for Character, which says: "In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF)." It is quite difficult (perhaps even impossible) to reproduce this with the letters of common alphabets. But characters are not limited to letters and digits. In Japan, people came up with something that is hard to encode: emoji, a language of ideograms and emoticons. There is an interesting Wikipedia article about them: "Emoji". Let's pick an example, say "Emoji Ghost". Its page even lists the code point: U+1F47B. This value is hexadecimal; converted to decimal it is 128123, which is more than 16 bits allow (more than 65535). Let's copy it and try it in code.
Unfortunately, the JavaRush platform does not support such characters in text, so in the example below you need to paste the emoji into the String yourself. Here is a simple test:
public static void main(String[] args) {
    String emojiString = "Paste the emoji here with Ctrl+V";
    // With the emoji pasted: one emoji takes 2 chars (its code point does not fit into 16 bits)
    System.out.println(emojiString.codePoints().count()); // 1
    System.out.println(emojiString.chars().count()); // 2
}
As you can see, here one code point takes up two chars. That is the whole trick.
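If pasting is inconvenient, the same experiment can be run without copy-paste. This sketch (the class and constant names are illustrative) builds the ghost emoji from its code point 0x1F47B with Character.toChars, compares it with the surrogate-pair escape "\uD83D\uDC7B", and checks both halves with Character.isHighSurrogate and Character.isLowSurrogate.

```java
public class CodePointDemo {

    // U+1F47B, the "ghost" emoji from the text (128123 in decimal)
    static final int GHOST = 0x1F47B;

    // Built from the code point, so no copy-paste is needed
    static final String EMOJI = new String(Character.toChars(GHOST));

    public static void main(String[] args) {
        // The same character written as a UTF-16 surrogate pair escape
        String escaped = "\uD83D\uDC7B";

        System.out.println(EMOJI.equals(escaped));                      // true
        System.out.println(GHOST);                                      // 128123
        System.out.println(EMOJI.length());                             // 2 chars
        System.out.println(EMOJI.codePointCount(0, EMOJI.length()));    // 1 code point
        System.out.println(Character.isHighSurrogate(EMOJI.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(EMOJI.charAt(1)));  // true
        System.out.println(EMOJI.codePointAt(0) == GHOST);              // true
    }
}
```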

Character

As we saw above, strings in Java consist of chars. The primitive type only stores a value, but the wrapper class java.lang.Character provides many useful operations on that character. For example, we can capitalize the first letter of every word in a string:
public static void main(String[] args) {
    String line = "united nations organization";
    char[] chars = line.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        // Capitalize the first character and every character that follows a space
        if (i == 0 || chars[i - 1] == ' ') {
            chars[i] = Character.toUpperCase(chars[i]);
        }
    }
    System.out.println(new String(chars)); // United Nations Organization
}
Character also offers other useful checks: isAlphabetic(), isLetter(), isSpaceChar(), isDigit(), isUpperCase(), and isMirrored() (true for characters such as brackets: '(' has the mirror image ')').
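A quick way to see these checks in action is to print them for a few sample characters. In this sketch, CharacterChecks and classify are hypothetical names; the Cyrillic letter 'ж' is written as a Unicode escape so the source file does not depend on its encoding.

```java
public class CharacterChecks {

    // Hypothetical helper: collects the classification results for one char
    static String classify(char c) {
        return c
            + " letter=" + Character.isLetter(c)
            + " alphabetic=" + Character.isAlphabetic(c)
            + " digit=" + Character.isDigit(c)
            + " space=" + Character.isSpaceChar(c)
            + " upper=" + Character.isUpperCase(c)
            + " mirrored=" + Character.isMirrored(c);
    }

    public static void main(String[] args) {
        // Latin capital, Cyrillic small letter zhe, digit, space, bracket
        char[] samples = {'A', '\u0436', '7', ' ', '('};
        for (char c : samples) {
            System.out.println(classify(c));
        }
    }
}
```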

String Pool

Strings in Java are immutable, that is, constant; the JavaDoc of the java.lang.String class itself says so. Second, and also very important, strings can be written as literals:
String literalString = "Hello, World!";
Any quoted string literal is actually a String object. This raises a question: we use strings constantly, and they are often identical (for example, the text "Error" or "Success"), so is there a way to avoid creating the same string over and over? Strings are also very common as Map keys. (A HashMap finds keys with hashCode() and equals(), so two equal strings work as the same key even if they are different objects; sharing string objects is about saving memory and allocations.) The Java developers' answer is the String Pool: a place where strings are stored, which you can think of as a string cache. Not every string ends up there, only those written in the code as literals (and compile-time constants). You can also add a string to the pool yourself, but more on that later. So somewhere in memory there is this cache. A fair question: where is this pool located? The answer can be found on Stack Overflow: "Where does Java's String constant pool live, the heap or the stack?" In HotSpot since Java 7 it lives in the regular heap (earlier versions kept it in PermGen). A related structure is the per-class run-time constant pool: according to the JVM specification, it is constructed when the virtual machine creates a class or interface and is allocated from the method area, which is logically part of the heap and shared by all threads of the Java Virtual Machine. What does the String Pool give us? It has several advantages:
  • Identical strings are not created as separate objects over and over
  • Comparison by reference is faster than character-by-character comparison via equals
But what if we want to put a string we created ourselves into this cache? For that there is a special method, String.intern(). If the pool already contains an equal string, intern() returns it; otherwise it adds this string to the pool and returns it. Note that this is not just a cache in the form of an array (as for Integers). The intern method is declared native, which means it is implemented in another language (mostly C++), and the JVM can apply further optimizations to such core methods. The following Habr post about interning is an interesting read: https://habr.com/post/79913/#comment_2345814 All this sounds like a good idea, but how does it affect us? Quite noticeably:
public static void main(String[] args) {
    String test = "literal";
    String test2 = new String("literal");
    System.out.println(test == test2); // false
}
As you can see, the strings have the same content, but the result is false, because == compares references, not values. This version, however, prints true:
public static void main(String[] args) {
    String test = "literal";
    String test2 = new String("literal").intern();
    System.out.println(test == test2); // true
}
Note that a new String is still created here: intern() returns the String from the pool, while the original String we looked up becomes garbage, since nothing else refers to it. That is clearly an unnecessary use of resources. So always compare strings with equals, to avoid sudden and hard-to-find bugs as much as possible:
public static void main(String[] args) {
    String test = "literal";
    String test2 = new String("literal").intern();
    System.out.println(test.equals(test2));
}
equals() compares the strings character by character (after a quick check for the same reference).
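The comparisons above can be gathered in one sketch. It relies on two guarantees from the Java Language Specification: string literals are interned, and so are strings computed by constant expressions such as "lit" + "eral". The class name InternDemo is just an illustration.

```java
public class InternDemo {

    static final String LITERAL = "literal";

    public static void main(String[] args) {
        String fromLiteral = "literal";
        String constructed = new String("literal"); // always a new object
        String interned = constructed.intern();     // canonical copy from the pool
        String folded = "lit" + "eral";             // compile-time constant expression

        System.out.println(fromLiteral == LITERAL);          // true: same pooled literal
        System.out.println(fromLiteral == constructed);      // false: different objects
        System.out.println(fromLiteral == interned);         // true: intern() returned the pooled one
        System.out.println(fromLiteral == folded);           // true: constant expressions are interned
        System.out.println(fromLiteral.equals(constructed)); // true: same characters
    }
}
```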

Concatenation

As we remember, strings can be concatenated with +. And, as we remember, strings are immutable. So how does that work? Right: a new string is created that consists of the characters of the operands. There are many theories about how + concatenation works under the hood: some people think a new object is created at every step, others think something else happens. Only one party knows for sure: the javac compiler. Let's use the online compiler service and run:
public class HelloWorld {

    public static void main(String[] args) {
        String helloMessage = "Hello, ";
        String target = "World";
        System.out.println(helloMessage + target);
    }

}
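Now let's save this as a zip archive, extract it to a directory, compile it if needed (javac HelloWorld.java) and run: javap -c HelloWorld. The disassembled bytecode tells us everything. Compiled with JDK 8, it shows that javac turned the + into a new StringBuilder, an append() call for each operand, and a final toString(). Since JDK 9, javac emits an invokedynamic instruction bootstrapped by StringConcatFactory instead (JEP 280).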
In a loop, of course, it is better to do the concatenation with a StringBuilder yourself. Not because of any magic, but so that the StringBuilder is created once before the loop and the loop body only calls append. There is another interesting detail here. There is an excellent article, "String Processing in Java. Part I: String, StringBuffer, StringBuilder", and its comments contain a lot of useful information. For example, one comment points out that concatenation of the form new StringBuilder().append()...toString() is subject to an intrinsic optimization controlled by the -XX:+OptimizeStringConcat option, which is enabled by default. "Intrinsic" means built-in: the JVM recognizes such code and handles it specially, much like a native method but without the overhead of JNI. Read more: "Intrinsic Methods in HotSpot VM".
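Here is a minimal sketch of that advice: one StringBuilder is created before the loop, and the loop body only calls append(). For contrast, the second method uses += inside the loop, which produces a new String object on every iteration. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {

    // One StringBuilder created before the loop; the loop itself only appends
    static String joinWithBuilder(List<String> words, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(words.get(i));
        }
        return sb.toString();
    }

    // For comparison: += in a loop creates a new String on every iteration
    static String joinWithPlus(List<String> words, String separator) {
        String result = "";
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                result += separator;
            }
            result += words.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "String", "World");
        System.out.println(joinWithBuilder(words, ", ")); // Hello, String, World
        System.out.println(joinWithPlus(words, ", "));    // Hello, String, World
    }
}
```

For this particular task, the standard library already offers String.join (Java 8 and later), for example String.join(", ", words).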

StringBuilder and StringBuffer

As we saw above, StringBuilder is a very useful tool. Strings are immutable, yet we often want to concatenate them. For this, Java gives us two classes: StringBuilder and StringBuffer. The main difference between them is that StringBuffer was introduced in JDK 1.0, while StringBuilder came in Java 1.5 as a non-synchronized version of StringBuffer, to eliminate the overhead of unnecessary method synchronization. Both classes extend the abstract class AbstractStringBuilder, "a mutable sequence of characters". Inside, it stores an array of chars (a byte[] since Java 9), which grows according to the rule value.length * 2 + 2 (or to the required size, if that is larger). The default capacity of a StringBuilder is 16.
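The capacity rule can be observed through the public API. The JavaDoc of ensureCapacity describes it: the new capacity is the larger of the requested minimum and twice the old capacity plus 2. This sketch relies only on that documented behavior and on the documented initial capacities; grownCapacity is a hypothetical helper.

```java
public class CapacityDemo {

    // Hypothetical helper: capacity of a fresh StringBuilder after ensureCapacity(minimum)
    static int grownCapacity(int minimum) {
        StringBuilder sb = new StringBuilder(); // default capacity 16
        sb.ensureCapacity(minimum);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity()); // 16: the default
        System.out.println(grownCapacity(10));  // 16: already large enough, no growth
        System.out.println(grownCapacity(17));  // 34 = 16 * 2 + 2
        System.out.println(grownCapacity(100)); // 100: the minimum beats 16 * 2 + 2
        // A StringBuffer built from a string gets 16 extra chars of room
        System.out.println(new StringBuffer("abc").capacity()); // 19 = 3 + 16
    }
}
```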

Comparable

Strings are comparable: String implements Comparable<String>, i.e. the compareTo method. The comparison goes character by character. Interestingly, the shorter of the two lengths is taken and a loop runs over it. So compareTo returns either the difference between the int values of the first mismatched characters within that length, or, if all characters within it match, the difference between the string lengths. This ordering is called "lexicographic".
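To make the rule concrete, the sketch below re-implements it by hand and prints it next to String.compareTo for a few pairs. lexicographicCompare is a hypothetical helper written only for illustration; in real code, call compareTo.

```java
public class CompareDemo {

    // Hypothetical re-implementation of the rule described above
    static int lexicographicCompare(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            char ca = a.charAt(i);
            char cb = b.charAt(i);
            if (ca != cb) {
                return ca - cb; // difference of the first mismatched chars
            }
        }
        return a.length() - b.length(); // all shared chars match: compare lengths
    }

    public static void main(String[] args) {
        String[][] pairs = {{"apple", "apricot"}, {"abc", "abcde"}, {"b", "a"}, {"same", "same"}};
        for (String[] p : pairs) {
            // Prints compareTo's result and the hand-written one side by side
            System.out.println(p[0] + " vs " + p[1] + ": "
                + p[0].compareTo(p[1]) + " / " + lexicographicCompare(p[0], p[1]));
        }
    }
}
```

For "apple" and "apricot" the first mismatch is 'p' (112) against 'r' (114), so both results are -2.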

Working with Java Strings

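String has many useful methods, such as length(), charAt(), substring(), indexOf(), contains(), startsWith(), replace(), split(), trim(), toUpperCase() and toLowerCase().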
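A few of these methods in action. The sample text and the slug helper are hypothetical, chosen only to show the calls together; slug also assumes words are separated by single spaces.

```java
import java.util.Arrays;

public class StringMethodsDemo {

    // Hypothetical helper combining trim(), toLowerCase(), split() and String.join()
    static String slug(String title) {
        String[] words = title.trim().toLowerCase().split(" ");
        return String.join("-", words);
    }

    public static void main(String[] args) {
        String text = "  Strings in Java  ";
        String trimmed = text.trim();                           // removes leading/trailing spaces

        System.out.println(trimmed.length());                   // 15
        System.out.println(trimmed.indexOf("in"));              // 3 (inside "Strings")
        System.out.println(trimmed.contains("Java"));           // true
        System.out.println(trimmed.startsWith("Str"));          // true
        System.out.println(trimmed.substring(11));              // Java
        System.out.println(trimmed.replace(' ', '_'));          // Strings_in_Java
        System.out.println(trimmed.toUpperCase());              // STRINGS IN JAVA
        System.out.println(Arrays.toString(trimmed.split(" "))); // [Strings, in, Java]
        System.out.println(slug(text));                         // strings-in-java
    }
}
```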
There are many exercises for practicing work with strings, for example on Coding Bat. There is also a Coursera course, "Algorithms on Strings".

Conclusion

Even a short overview of this class takes up an impressive amount of space, and that is not all. I highly recommend watching the talk from JPoint 2015: Alexey Shipilev, "Catechism java.lang.String".