charAt() Method in Java

Published in the Random EN group
There are many basic techniques that we use regularly without thinking about them. But what if we stop and look at how some seemingly simple methods are implemented? I think this will bring us one step closer to understanding Java.

Let's imagine a situation in which we need to extract a particular character from a string. How can we do this in Java? For example, by calling the String method charAt(). That method is the subject of today's article.

Syntax

char charAt(int index) returns the char value at the specified index. The index ranges from 0 to length() - 1. That is, the first char value of the sequence is at index 0, the next is at index 1, and so on, as with array indexing.

Example

public static void main(String[] args) {
   System.out.print("JavaRush".charAt(0));
   System.out.print("JavaRush".charAt(1));
   System.out.print("JavaRush".charAt(2));
   System.out.print("JavaRush".charAt(3));
}
The first line prints the first character, the second line prints the second, and so on. Since print is used here rather than println, no line breaks are added, and we get the following output on the console:

Java
If the character at the given index is written as a Unicode escape, charAt() returns the character that the escape represents:
System.out.println("J\u0061vaRush".charAt(1));
Console output:

a
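The index range described in the Syntax section can also be tried at its edges. The sketch below is only an illustration: the class name CharAtBounds and the helper charAtOrFallback (with its '?' fallback) are made up for this example and are not part of the JDK.

```java
public class CharAtBounds {
    // Hypothetical helper: returns the character at index,
    // or '?' when the index is outside 0..length()-1.
    static char charAtOrFallback(String s, int index) {
        try {
            return s.charAt(index);
        } catch (IndexOutOfBoundsException e) {
            // charAt rejects negative indexes and indexes >= length()
            return '?';
        }
    }

    public static void main(String[] args) {
        String s = "JavaRush";
        System.out.println(s.charAt(0));                      // J
        System.out.println(s.charAt(s.length() - 1));         // h
        System.out.println(charAtOrFallback(s, s.length()));  // ?
    }
}
```

In ordinary code it is usually simpler to compare the index with length() before the call than to catch the exception; the try/catch here only makes the failure visible.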

What's "under the hood"

How does it work, you ask? The fact is that each String object (since Java 9) contains a byte array holding the bytes of the string's characters:
private final byte[] value;
And here is the charAt method itself:
public char charAt(int index) {
   if (isLatin1()) {
       return StringLatin1.charAt(value, index);
   } else {
       return StringUTF16.charAt(value, index);
   }
}
isLatin1() indicates whether the string contains only Latin-1 characters, each of which fits into a single byte. This determines which method is called next.
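The flag itself is internal to String, but the idea behind it can be approximated from outside. The sketch below assumes that a string qualifies for the one-byte form when every char value is in the range 0 to 255. The class Latin1Check and the helper fitsLatin1 are hypothetical names for this illustration, not the JDK's own check.

```java
public class Latin1Check {
    // Hypothetical helper, not JDK code. Assumption: one byte per
    // character is enough only when every char value is 0..255.
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("JavaRush"));     // true
        System.out.println(fitsLatin1("caf\u00e9"));    // true, U+00E9 is within 0..255
        System.out.println(fitsLatin1("\u0430\u0431")); // false, Cyrillic letters are above 255
    }
}
```

Note that "Latin" here covers more than the letters A to Z: digits, punctuation, and accented letters such as é also fall within the first 256 code points.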

isLatin1 = true

If the string contains only Latin-1 characters, the static charAt method of the StringLatin1 class is called:
public static char charAt(byte[] value, int index) {
   if (index < 0 || index >= value.length) {
       throw new StringIndexOutOfBoundsException(index);
   }
   return (char)(value[index] & 0xff);
}
The first step is to check that the incoming index is greater than or equal to 0 and does not go beyond the internal byte array. If the check fails, a StringIndexOutOfBoundsException is thrown. If the check passes, the element we need is taken. In the return statement we see:
  • & performs a bitwise AND; before it is applied, the byte is widened to int
  • 0xff masks the result to its low 8 bits, so a byte with a negative value (Java bytes are signed) yields a number from 128 to 255 rather than a negative one
  • (char) converts the resulting number, 0 to 255, to the char with that code; Latin-1 codes coincide with the first 256 Unicode code points
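To see why the mask matters, here is a small sketch that compares the expression from the return statement with a plain cast. The class Latin1Mask and both helper methods are hypothetical names for this illustration; only the expression (char)(b & 0xff) comes from the snippet above.

```java
public class Latin1Mask {
    // Same expression as in the StringLatin1.charAt snippet.
    static char withMask(byte b) {
        return (char) (b & 0xff);
    }

    // Without the mask the byte is sign-extended before the cast.
    static char withoutMask(byte b) {
        return (char) b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9; // Latin-1 code of 'é'; as a signed byte it is -23
        System.out.println(b);                    // -23
        System.out.println((int) withMask(b));    // 233
        System.out.println((int) withoutMask(b)); // 65513
    }
}
```

For bytes from 0 to 127 both variants give the same result; the difference only shows up for the upper half of the Latin-1 range.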

isLatin1 = false

If the string contains characters outside Latin-1, the StringUTF16 class is used and its static method is called:
public static char charAt(byte[] value, int index) {
   checkIndex(index, value);
   return getChar(value, index);
}
This in turn calls:
public static void checkIndex(int off, byte[] val) {
   String.checkIndex(off, length(val));
}
And that delegates to a static method of String:
static void checkIndex(int index, int length) {
   if (index < 0 || index >= length) {
       throw new StringIndexOutOfBoundsException("index " + index +
                                                 ", length " + length);
   }
}
Here the index is checked for validity: again, whether it is non-negative and whether it stays below the length, which in this case is the number of characters, that is, half the size of the byte array. But the second method called from charAt in the StringUTF16 class is more interesting:
static char getChar(byte[] val, int index) {
   assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
   index <<= 1;
   return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
                 ((val[index]   & 0xff) << LO_BYTE_SHIFT));
}
Let's analyze what actually happens here. The method begins with another check of the index, this time written as an assert. To understand what happens next, you need to know that in this representation every character occupies two bytes (two array cells) in the value array. If we have a string of two Cyrillic characters, "ав", then:
  • for 'а' this is the pair of bytes 48 and 4;
  • for 'в' it is 50 and 4.
That is, if we create the string "ав", its value array will be {48, 4, 50, 4}. The getChar method works with two cells of the value array at a time. Therefore, the next step is the shift index <<= 1, which doubles the index and gives the position of the first byte of the desired character in value.

Now let's say we have the string "абвг". Then the value array looks like this: {48, 4, 49, 4, 50, 4, 51, 4}. We ask for the character at index 3, the fourth character of the string; in binary, 3 is 00000000 00000011. Shifting left by 1 gives 00000000 00000110, that is, index = 6.

We also see two constants: HI_BYTE_SHIFT, which in this case is 0, and LO_BYTE_SHIFT, which in this case is 8 (these are the values for a little-endian platform). In the last line of this method:
  1. An element is taken from the value array and shifted left by HI_BYTE_SHIFT, that is, by 0 bits, and index is increased by 1.

    In the example with the string "абвг", the byte at index 6, which is 51, stays as it is, and index becomes 7.

  2. Then the next element of the array is taken and shifted in the same way, but by LO_BYTE_SHIFT, that is, by 8 bits.

    Here that is the byte 4, whose binary representation is 00000000 00000100. After shifting by 8 bits we get 00000100 00000000, which as an integer is 1024.

  3. Next, the two values are combined with the | (OR) operation.

    We have the values 51 and 1024, which in binary are 00000000 00110011 and 00000100 00000000. After the OR operation we get 00000100 00110011, which is 1075 in decimal.

    Finally, the number 1075 is converted to char. The int -> char conversion treats the number as a UTF-16 code unit, and 1075 (U+0433) is the character 'г'.

This is how we get 'г' as the result of the charAt() method in Java.
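The whole walk-through can be reproduced outside the JDK with a small sketch. It makes several assumptions: the class name Utf16GetCharSketch is made up, the shift constants are hard-coded to the values used in the article, and the byte array is obtained with getBytes(StandardCharsets.UTF_16LE), which gives the same byte order as the example array rather than reading the real internal field. The string "абвг" is written with Unicode escapes so that the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16GetCharSketch {
    // Shift values as described in the article (little-endian case).
    static final int HI_BYTE_SHIFT = 0;
    static final int LO_BYTE_SHIFT = 8;

    // Re-creation of the getChar logic for a plain byte array.
    static char getChar(byte[] val, int index) {
        index <<= 1; // character index -> index of its first byte
        return (char) (((val[index++] & 0xff) << HI_BYTE_SHIFT)
                     | ((val[index]   & 0xff) << LO_BYTE_SHIFT));
    }

    public static void main(String[] args) {
        String s = "\u0430\u0431\u0432\u0433"; // "абвг"
        byte[] value = s.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(Arrays.toString(value)); // [48, 4, 49, 4, 50, 4, 51, 4]

        char c = getChar(value, 3);
        System.out.println((int) c);          // 1075
        System.out.println(c == s.charAt(3)); // true
    }
}
```

The last line confirms that the manual reconstruction agrees with what charAt(3) returns for the same string.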