
Regular expressions in Java, part 4

Published in the Random EN group
We present a translation of a short guide to regular expressions in Java, written by Jeff Friesen for the JavaWorld website. For ease of reading, we have divided the article into several parts. Earlier parts: Regular Expressions in Java, Part 1; Regular Expressions in Java, Part 2; Regular Expressions in Java, Part 3.

Methods for working with captured groups

The source code of the RegexDemo application includes the method call m.group(). The group() method is one of several Matcher methods for working with captured groups:
  • The int groupCount() method returns the number of capturing groups in the matcher's pattern. This count does not include the special capturing group number 0, which represents the pattern as a whole.

  • The String group() method returns the characters of the previous match. If the pattern successfully matched an empty string, this method returns an empty string. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The String group(int group) method is similar to the previous method, except that it returns the characters of the previous match captured by the group whose number is given by the group parameter. Note that group(0) is equivalent to group(). If the pattern has no capturing group with the given number, the method throws an IndexOutOfBoundsException. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The String group(String name) method returns the characters of the previous match captured by the group named name. If the pattern has no capturing group with that name, an IllegalArgumentException is thrown. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

The following example demonstrates the groupCount() and group(int group) methods:
Pattern p = Pattern.compile("(.(.(.)))");
Matcher m = p.matcher("abc");
m.find();
System.out.println(m.groupCount());
for (int i = 0; i <= m.groupCount(); i++)
   System.out.println(i + ": " + m.group(i));
Execution results:
3
0: abc
1: abc
2: bc
3: c
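The example above uses only numbered groups. As a complement, here is a minimal sketch of group(String name) and of the two exceptions described in the list. The date pattern, the group names year and month, and the extract() helper are illustrative assumptions, not part of the original article; named groups require Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo
{
   // Illustrative pattern (not from the article): a year and a month.
   static final Pattern DATE = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

   // Hypothetical helper: returns the text captured by the named group,
   // or null when the input contains no match.
   static String extract(String input, String groupName)
   {
      Matcher m = DATE.matcher(input);
      return m.find() ? m.group(groupName) : null;
   }

   public static void main(String[] args)
   {
      System.out.println(extract("released 2017-03", "year"));
      System.out.println(extract("released 2017-03", "month"));

      Matcher m = DATE.matcher("released 2017-03");
      try
      {
         m.group("year"); // no match has been attempted yet
      }
      catch (IllegalStateException ise)
      {
         System.out.println("group() before find(): IllegalStateException");
      }

      m.find();
      try
      {
         m.group("day"); // the pattern has no group named "day"
      }
      catch (IllegalArgumentException iae)
      {
         System.out.println("unknown group name: IllegalArgumentException");
      }
   }
}
```

Checking the result of find() before calling group(), as extract() does, is one simple way to avoid the IllegalStateException.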

Methods for determining match positions

The Matcher class provides several methods that return the starting and ending positions of a match:
  • The int start() method returns the starting position of the previous match. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The int start(int group) method is similar to the previous method, but returns the starting position of the previous match for the group whose number is given by the group parameter. If the pattern has no capturing group with the given number, the method throws an IndexOutOfBoundsException. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The int start(String name) method is similar to the previous method, but returns the starting position of the previous match for the group named name. If the pattern has no capturing group with that name, an IllegalArgumentException is thrown. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The int end() method returns the position of the last character of the previous match plus 1. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The int end(int group) method is similar to the previous method, but returns the ending position of the previous match for the group whose number is given by the group parameter. If the pattern has no capturing group with the given number, the method throws an IndexOutOfBoundsException. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

  • The int end(String name) method is similar to the previous method, but returns the ending position of the previous match for the group named name. If the pattern has no capturing group with that name, an IllegalArgumentException is thrown. If the matcher has not yet attempted a match, or the previous match operation failed, an IllegalStateException is thrown.

The following example demonstrates two match-position methods by printing the start and end positions of capturing group number 2 for each match:
Pattern p = Pattern.compile("(.(.(.)))");
Matcher m = p.matcher("abcabcabc");
while (m.find())
{
   System.out.println("Found " + m.group(2));
   System.out.println("  starts at position " + m.start(2) +
                      " and ends at position " + (m.end(2) - 1));
   System.out.println();
}
The output of this example is the following:
Found bc
  starts at position 1 and ends at position 2

Found bc
  starts at position 4 and ends at position 5

Found bc
  starts at position 7 and ends at position 8
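Because end() returns the position one past the last matched character, the pair start()/end() can be passed directly to String.substring(). The sketch below illustrates this with the name-based variants. The key=value pattern and the valueBySlicing() helper are illustrative assumptions, not part of the original article, and start(String name) and end(String name) assume Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositionDemo
{
   // Illustrative pattern (an assumption, not from the article): key=value.
   static final Pattern PAIR = Pattern.compile("(?<key>\\w+)=(?<value>\\w+)");

   // Hypothetical helper: cuts the "value" group out of the input using
   // only start() and end(), or returns null when nothing matches.
   static String valueBySlicing(String input)
   {
      Matcher m = PAIR.matcher(input);
      if (!m.find())
         return null;
      // end() is exclusive, so it can be passed to substring() as is.
      return input.substring(m.start("value"), m.end("value"));
   }

   public static void main(String[] args)
   {
      String input = "color=red";
      Matcher m = PAIR.matcher(input);
      if (m.find())
      {
         System.out.println("value group spans [" + m.start("value") +
                            ", " + m.end("value") + ")");
         System.out.println("sliced text: " + valueBySlicing(input));
         System.out.println("same as group(): " +
                            valueBySlicing(input).equals(m.group("value")));
      }
   }
}
```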

Methods of the PatternSyntaxException class

An instance of the PatternSyntaxException class describes a syntax error in a regular expression. Such an exception is thrown by the compile() and matches() methods of the Pattern class, and is created through the following constructor:
PatternSyntaxException(String desc, String regex, int index)
This constructor stores the specified description (desc), the regular expression (regex), and the position at which the syntax error occurred (index). If the location of the syntax error is unknown, index is set to -1. Most likely, you will never need to create instances of PatternSyntaxException. However, you will need to extract the above values when creating a formatted error message. To do this, you can use the following methods:
  • The String getDescription() method returns a description of the syntax error.
  • The int getIndex() method returns either the position at which the error occurred, or -1 if the position is unknown.
  • The String getPattern() method returns the invalid regular expression.
Additionally, the inherited String getMessage() method returns a multiline string containing the values returned by the previous methods, along with a visual indication of where the syntax error occurred in the pattern.
What is a syntax error? Here's an example:
java RegexDemo (?itree Treehouse
In this case, we forgot to specify the closing parenthesis metacharacter ()) in the embedded flag expression. This error produces the following output:
regex = (?itree
input = Treehouse
Bad regex: Unknown inline modifier near index 3
(?itree
   ^
Description: Unknown inline modifier
Index: 3
Incorrect pattern: (?itree
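The three accessor methods can also be combined into a single-line message. The following is a minimal sketch; the SyntaxErrorDemo class, the report() helper, and its message format are illustrative assumptions rather than code from the article.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo
{
   // Hypothetical helper: returns "OK" if the regex compiles, otherwise a
   // one-line report assembled from the exception's accessor methods.
   static String report(String regex)
   {
      try
      {
         Pattern.compile(regex);
         return "OK";
      }
      catch (PatternSyntaxException pse)
      {
         return pse.getDescription() + " (index " + pse.getIndex() +
                ") in pattern " + pse.getPattern();
      }
   }

   public static void main(String[] args)
   {
      System.out.println(report("(?i)tree")); // well-formed flag expression
      System.out.println(report("(?itree"));  // closing parenthesis missing
   }
}
```

Unlike getMessage(), this report fits on one line, which may be more convenient for log files.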

Build Useful Regular Expression Applications Using the Regex API

Regular expressions enable you to create powerful text processing applications. In this section, we'll show you two handy applications that will hopefully encourage you to further explore the Regex API classes and methods. The second application introduces Lexan: a reusable code library for performing lexical analysis.

Regular expressions and documentation

Documentation is one of the mandatory tasks when developing professional software. Fortunately, regular expressions can help you with many aspects of creating documentation. The code in Listing 1 extracts lines containing single-line and multiline C-style comments from a source file and writes them to another file. For the code to work, each comment must fit on a single line.
Listing 1. Extracting comments
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ExtCmnt
{
   public static void main(String[] args)
   {
      if (args.length != 2)
      {
         System.err.println("Usage: java ExtCmnt infile outfile");
         return;
      }

      Pattern p;
      try
      {
         // The following pattern matches multiline comments that
         // fit on a single line (for example, /* one line */)
         // and single-line comments (for example, // some line).
         // A comment may appear anywhere on the line.

         p = Pattern.compile(".*/\\*.*\\*/|.*//.*$");
      }
      catch (PatternSyntaxException pse)
      {
         System.err.printf("Syntax error in the regular expression: %s%n", pse.getMessage());
         System.err.printf("Error description: %s%n", pse.getDescription());
         System.err.printf("Error position: %s%n", pse.getIndex());
         System.err.printf("Erroneous pattern: %s%n", pse.getPattern());
         return;
      }

      try (FileReader fr = new FileReader(args[0]);
           BufferedReader br = new BufferedReader(fr);
           FileWriter fw = new FileWriter(args[1]);
           BufferedWriter bw = new BufferedWriter(fw))
      {
         Matcher m = p.matcher("");
         String line;
         while ((line = br.readLine()) != null)
         {
            m.reset(line);
            if (m.matches()) /* The entire line must match */
            {
               bw.write(line);
               bw.newLine();
            }
         }
      }
      catch (IOException ioe)
      {
         System.err.println(ioe.getMessage());
         return;
      }
   }
}
The main() method in Listing 1 first checks the command-line syntax and then compiles a regular expression designed to detect single-line and multiline comments into a Pattern object. If no PatternSyntaxException is thrown, main() opens the source file, creates the target file, obtains a matcher for matching each line read against the pattern, and then reads the source file line by line. Each line is matched against the comment pattern. On success, main() writes the line (followed by a newline) to the target file (we'll cover file I/O logic in a future Java 101 tutorial).
Compile Listing 1 as follows:
javac ExtCmnt.java
Run the application with the ExtCmnt.java file as input:
java ExtCmnt ExtCmnt.java out
You should get the following results in the out file:
         // The following pattern matches multiline comments that
         // fit on a single line (for example, /* one line */)
         // and single-line comments (for example, // some line).
         // A comment may appear anywhere on the line.
         p = Pattern.compile(".*/\\*.*\\*/|.*//.*$");
            if (m.matches()) /* The entire line must match */
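The filtering step of Listing 1 can also be tried in isolation, without any file I/O. The sketch below applies the same pattern to a few sample strings. The CommentLineFilter class, the hasComment() helper, and the sample lines are illustrative assumptions, not part of the original article.

```java
import java.util.regex.Pattern;

class CommentLineFilter
{
   // The same pattern as in Listing 1.
   private static final Pattern COMMENT =
      Pattern.compile(".*/\\*.*\\*/|.*//.*$");

   // Hypothetical helper: true if the entire line matches the pattern.
   static boolean hasComment(String line)
   {
      return COMMENT.matcher(line).matches();
   }

   public static void main(String[] args)
   {
      String[] samples =
      {
         "int x = 1; // counter",     // accepted by the right-hand alternative
         "/* header */",              // accepted by the left-hand alternative
         "int z = 3;",                // no comment at all
         "int y = 2; /* two */ y++;"  // code after the closing */
      };
      for (String s : samples)
         System.out.println(hasComment(s) + "  <-  " + s);
   }
}
```

Because matches() requires the entire line to match and the left-hand alternative ends with \*/, a line in which code follows a /* ... */ comment is not accepted by this pattern; only a // comment may be followed by arbitrary text.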
In the pattern string .*/\\*.*\\*/|.*//.*$, the pipe metacharacter | acts as a logical OR operator. It tells the matcher to use the left operand of this regular expression construct to find a match in the matcher's text. If there is no match, the matcher uses the right operand for another match attempt (the parentheses metacharacters of a capturing group form another logical operator).
Continued in Regular Expressions in Java, Part 5