Regular Expressions in Java

A regular expression is a pattern that can be applied to text (a String, in Java). Java provides the java.util.regex package for matching regular expressions. Java's regular expression syntax is very similar to Perl's and is easy to learn. A regular expression either matches the text (or part of it) or it does not.
* If a regular expression matches a piece of text, we can find that piece.
** If the regular expression is compound, we can easily determine which part of the expression matched which part of the text.

First example

The regular expression "[a-z]+" matches any sequence of one or more lowercase letters in the text. [a-z] means any character from a to z inclusive, and + means "one or more" of the preceding element. Let's assume we supply the string "code 2 learn java tutorial".

How to do this in Java

First, you must create a pattern:

import java.util.regex.*;
Pattern p = Pattern.compile("[a-z]+");

Next, create a Matcher for the text by calling the matcher method on the pattern:

Matcher m = p.matcher("code 2 learn java tutorial");

NOTE: Neither Pattern nor Matcher has a public constructor; we create them using methods of the Pattern class.

Pattern class: A Pattern object is a compiled representation of a regular expression. The Pattern class does not provide any public constructors. To create a pattern, you call one of its public static compile methods, which return a Pattern object. These methods take a regular expression as their first argument.

Matcher class: A Matcher object is the engine that interprets the pattern and performs match operations on an input string. Like Pattern, Matcher has no public constructors. You obtain a Matcher by calling the matcher method on a Pattern object.

Once these steps are complete and we have a Matcher instance m, we can check whether the pattern was found and, if so, at what position:

m.matches() returns true if the pattern matches the entire string, false otherwise.
m.lookingAt() returns true if the pattern matches at the beginning of the string, false otherwise.
m.find() returns true if the pattern matches any part of the text, false otherwise.
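The difference between the three checks is easiest to see side by side. The following is a minimal sketch, not a definitive implementation; the class name MatchModesDemo and its helper methods are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesDemo {
    // Hypothetical constant for this sketch: the pattern from the first example.
    static final Pattern LOWERCASE_WORD = Pattern.compile("[a-z]+");

    // True only if the whole text is one run of lowercase letters.
    static boolean matchesWhole(String text) {
        return LOWERCASE_WORD.matcher(text).matches();
    }

    // True if the text starts with a run of lowercase letters.
    static boolean matchesPrefix(String text) {
        return LOWERCASE_WORD.matcher(text).lookingAt();
    }

    // True if a run of lowercase letters occurs anywhere in the text.
    static boolean matchesAnywhere(String text) {
        return LOWERCASE_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "code 2 learn java tutorial";
        System.out.println(matchesWhole(text));         // false: spaces and the digit are not in [a-z]
        System.out.println(matchesPrefix(text));        // true: "code" is at the start
        System.out.println(matchesAnywhere(text));      // true
        System.out.println(matchesPrefix("2 learn"));   // false: the text starts with a digit
        System.out.println(matchesAnywhere("2 learn")); // true: "learn" is found later in the text
    }
}
```

Each helper creates a fresh Matcher, so the three checks do not affect one another.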

Finding a match

After a successful match, m.start() returns the index of the first character matched and m.end() returns the index of the last character matched, plus one. If no match has been attempted yet, or the last attempt found no match, m.start() and m.end() throw an IllegalStateException.

This is a RuntimeException, so you don't have to catch it.

It may seem strange for m.end() to return the index of the last matched character plus one, but that is exactly what most String methods expect.

For example, if m was matched against "Now is the time", then "Now is the time".substring(m.start(), m.end()) returns exactly the matched text. Let's take a look at the code:

import java.util.regex.*;

public class RegexTest {
    public static void main(String[] args) {
        String pattern = "[a-z]+";
        String text = "code 2 learn java tutorial";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(text);
        while(m.find()) {
            System.out.print(text.substring(m.start(), m.end()) + "*");
        }
    }
}

Output: code*learn*java*tutorial*
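The index rules for m.start() and m.end(), and the exception thrown when there is no match, can be checked with a short sketch. The class name MatchPositionDemo and both helper methods are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositionDemo {
    // Returns "start..end text" for the first match, or "no match".
    static String firstMatchSpan(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return "no match";
        }
        // end() is exclusive, so it can be passed straight to substring().
        return m.start() + ".." + m.end() + " " + text.substring(m.start(), m.end());
    }

    // Returns true if start() throws IllegalStateException after a failed find().
    static boolean startThrowsAfterFailedFind(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            return false; // a match exists, so start() would succeed
        }
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "code 2 learn java tutorial";
        System.out.println(firstMatchSpan("[a-z]+", text));             // 0..4 code
        System.out.println(firstMatchSpan("[0-9]+", text));             // 5..6 2
        System.out.println(startThrowsAfterFailedFind("[A-Z]+", text)); // true
    }
}
```

Calling m.group() returns the same text as text.substring(m.start(), m.end()), so it is usually the shorter choice when only the matched text is needed.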

Additional Methods

Matcher also provides the following methods:

m.replaceFirst(replacement) returns a new string in which the first substring that matches the pattern is replaced with replacement.
m.replaceAll(replacement) returns a new string in which every substring that matches the pattern is replaced with replacement.
m.find(startIndex) resets the matcher and finds the next match starting at the specified index.
m.reset() resets the matcher.
m.reset(newText) resets the matcher and gives it new text to search (any CharSequence, such as a String, StringBuffer, or CharBuffer).
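These methods can be tried out with a small sketch like the one below. The class name MatcherMethodsDemo, the DIGITS pattern, and the helper methods are assumptions made for illustration; only the Matcher calls themselves come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical pattern for this sketch: one or more decimal digits.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Replaces only the first run of digits with "#".
    static String maskFirst(String text) {
        return DIGITS.matcher(text).replaceFirst("#");
    }

    // Replaces every run of digits with "#".
    static String maskAll(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    // Returns the first run of digits at or after startIndex, or null if there is none.
    static String firstMatchFrom(String text, int startIndex) {
        Matcher m = DIGITS.matcher(text);
        return m.find(startIndex) ? m.group() : null;
    }

    // Reuses an existing matcher on new text and counts the matches in it.
    static int countMatches(Matcher m, CharSequence newText) {
        m.reset(newText);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "room 12, floor 3, seat 456";
        System.out.println(maskFirst(text));          // room #, floor 3, seat 456
        System.out.println(maskAll(text));            // room #, floor #, seat #
        System.out.println(firstMatchFrom(text, 7));  // 3

        Matcher m = DIGITS.matcher(text);
        System.out.println(countMatches(m, "a1b22c333"));                    // 3
        System.out.println(countMatches(m, new StringBuilder("no digits"))); // 0
    }
}
```

Reusing one Matcher with reset(newText) avoids creating a new object for every input, which can matter when the same pattern is applied to many strings.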

Regular expression syntax

In the table below, re stands for any regular expression.

^           Matches the beginning of the input (of each line in MULTILINE mode).
$           Matches the end of the input (of each line in MULTILINE mode).
.           Matches any single character except a line terminator. With the DOTALL flag, (?s), it also matches line terminators.
[...]       Matches any single character inside the brackets.
[^...]      Matches any single character not inside the brackets.
\A          Matches the beginning of the entire input.
\z          Matches the end of the entire input.
\Z          Matches the end of the entire input, except for a final line terminator.
re*         Matches 0 or more occurrences of the preceding expression.
re+         Matches 1 or more occurrences of the preceding expression.
re?         Matches 0 or 1 occurrence of the preceding expression.
re{n}       Matches exactly n occurrences of the preceding expression.
re{n,}      Matches n or more occurrences of the preceding expression.
re{n,m}     Matches at least n and at most m occurrences of the preceding expression.
a|b         Matches either a or b.
(re)        Groups regular expressions and remembers the matched text.
(?:re)      Groups regular expressions without remembering the matched text.
(?>re)      Matches an independent (atomic) group without backtracking.
\w          Matches word characters: letters, digits, and underscore. Equivalent to [a-zA-Z_0-9].
\W          Matches non-word characters.
\s          Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\S          Matches non-whitespace characters.
\d          Matches digits. Equivalent to [0-9].
\D          Matches non-digits.
\G          Matches at the point where the previous match ended.
\n          Matches a newline character.
\b          Matches at a word boundary.
\B          Matches where there is no word boundary.
\n, \t, etc.  Match newline, tab, carriage return (\r), and similar characters.
\Q          Quotes (treats literally) all characters up to \E.
\E          Ends the quoting started by \Q.
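A few of these constructs are shown in the sketch below: capturing groups, counted quantifiers, word boundaries, and \Q...\E quoting. Note that inside a Java string literal every backslash must be doubled, so \d is written "\\d". The class name SyntaxDemo, its helper methods, and the sample strings are all hypothetical and serve only as illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntaxDemo {
    // Two capturing groups: the part before '@' and the part between '@' and ".com".
    static final Pattern ADDRESS = Pattern.compile("(\\w+)@(\\w+)\\.com");

    // Returns {user, domain} from the first address-like match, or an empty array.
    static String[] splitAddress(String text) {
        Matcher m = ADDRESS.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // \d{4}-\d{2}-\d{2}: exactly four digits, two digits, two digits.
    // This checks the shape only, not whether the date is valid.
    static boolean looksLikeDate(String text) {
        return Pattern.matches("\\d{4}-\\d{2}-\\d{2}", text);
    }

    // Counts how many times the regex matches in the text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String[] parts = splitAddress("contact: alice@example.com");
        System.out.println(parts[0] + " / " + parts[1]);       // alice / example

        System.out.println(looksLikeDate("2024-01-31"));       // true
        System.out.println(looksLikeDate("24-1-31"));          // false

        String words = "java javascript java";
        System.out.println(count("java", words));              // 3
        System.out.println(count("\\bjava\\b", words));        // 2: whole words only

        String sums = "1+1=2, 11+1=12";
        System.out.println(count("1+1", sums));                // 1: + is a quantifier here
        System.out.println(count("\\Q1+1\\E", sums));          // 2: + is taken literally
    }
}
```

When the text to be quoted comes from user input, Pattern.quote(text) is generally a safer choice than wrapping it in \Q and \E by hand, because it also handles input that itself contains \E.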
