A regular expression is a kind of pattern that can be applied to text (a String, in Java). Java provides the java.util.regex package for matching regular expressions. Java's regular expression syntax is very similar to that of the Perl programming language and is easy to learn. A regular expression either matches the text (or part of it) or it does not.
* If a regular expression matches a piece of text, then we can find that piece.
** If the regular expression is compound, then we can easily figure out which part of the regular expression matches which part of the text.
First example
The regular expression "[a-z]+" matches a sequence of one or more lowercase letters in the text. [a-z] means any character from a to z inclusive, and + means "one or more" of the preceding element. Let's assume we supply the string "code 2 learn java tutorial".
How to do this in Java
First, you must create a pattern:
import java.util.regex.*;
Pattern p = Pattern.compile("[a-z]+");
Next, you have to create a matcher for the text by calling the matcher method on the pattern:
Matcher m = p.matcher("code 2 learn java tutorial");
NOTE: Neither Pattern nor Matcher has a public constructor; we create them using methods of the Pattern class.
Pattern Class:
A Pattern object is a compiled representation of a regular expression. The Pattern class does not provide any public constructors. To create a pattern, you must call one of its public static compile methods, which return a Pattern object. These methods take a regular expression as their first argument.
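The factory methods described above can be sketched as follows. This is only an illustration: the class name PatternFactoryDemo, its helper methods, and the sample input "Java" are assumptions made for the example, while Pattern.compile(String) and Pattern.compile(String, int) are the standard static methods of java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

public class PatternFactoryDemo {
    // Compiles the regex with default settings (hypothetical helper).
    public static Pattern plain(String regex) {
        return Pattern.compile(regex);
    }

    // Compiles the regex so that letter case is ignored (hypothetical helper).
    public static Pattern ignoringCase(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = plain("[a-z]+");
        Pattern q = ignoringCase("[a-z]+");
        System.out.println(p.pattern());                 // prints [a-z]+
        System.out.println(p.matcher("Java").matches()); // prints false: 'J' is uppercase
        System.out.println(q.matcher("Java").matches()); // prints true: case is ignored
    }
}
```

A compiled Pattern can be reused to create any number of matchers, so it is usually worth compiling it once and keeping it around.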
Matcher Class:
A Matcher object is the engine that interprets the pattern and performs matching operations on the input string. Like the Pattern class, Matcher has no public constructors. You get a Matcher object by calling the matcher method on a Pattern object. Once we have completed these steps and have a Matcher instance m, we can check whether the pattern was found and, if so, at what position.
m.matches() returns true if the pattern matches the entire string, false otherwise.
m.lookingAt() returns true if the pattern matches the beginning of the string, false otherwise.
m.find() returns true if the pattern matches any part of the text, false otherwise. Each subsequent call looks for the next match.
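To make the difference between the three methods concrete, here is a small sketch that runs the pattern [a-z]+ against three sample strings. The class name MatchKindsDemo, its helper methods, and the sample strings are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

public class MatchKindsDemo {
    private static final Pattern LOWER = Pattern.compile("[a-z]+");

    // True only if the whole text is one run of lowercase letters.
    public static boolean matchesAll(String text) {
        return LOWER.matcher(text).matches();
    }

    // True if the text starts with a run of lowercase letters.
    public static boolean matchesStart(String text) {
        return LOWER.matcher(text).lookingAt();
    }

    // True if a run of lowercase letters occurs anywhere in the text.
    public static boolean matchesAnywhere(String text) {
        return LOWER.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String text : new String[] {"code", "code 2 learn", "2 learn"}) {
            System.out.println("\"" + text + "\": matches=" + matchesAll(text)
                    + " lookingAt=" + matchesStart(text)
                    + " find=" + matchesAnywhere(text));
        }
        // "code":         matches=true  lookingAt=true  find=true
        // "code 2 learn": matches=false lookingAt=true  find=true
        // "2 learn":      matches=false lookingAt=false find=true
    }
}
```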
Finding a match
After a successful match, m.start() will return the index of the first character matched and m.end() will return the index of the last character matched, plus one. If no match has been attempted yet, or the last attempt found no match, m.start() and m.end() will throw an IllegalStateException. This is a RuntimeException, so you don't have to catch it.
It may seem strange for m.end() to return the index of the last matched character plus one, but that's exactly what most String methods expect. For example, "Now is the time".substring(m.start(), m.end()) will return exactly the matched substring. Let's take a look at the code:
import java.util.regex.*;

public class RegexTest {
    public static void main(String[] args) {
        String pattern = "[a-z]+";
        String text = "code 2 learn java tutorial";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(text);
        // Print every run of lowercase letters, each followed by "*".
        while (m.find()) {
            System.out.print(text.substring(m.start(), m.end()) + "*");
        }
    }
}
Output: code*learn*java*tutorial*
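The behaviour of m.start() and m.end() described in this section, including the exception thrown when there is no match, can be checked with a short sketch. The class name MatchBoundsDemo, its helper methods, and the sample strings are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchBoundsDemo {
    // Returns {start, end} of the first match, or null if there is none.
    public static int[] firstMatchBounds(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    // Returns true if start() throws IllegalStateException after a failed find().
    public static boolean startThrowsWithoutMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            return false; // there was a match, so start() would succeed
        }
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] bounds = firstMatchBounds("[a-z]+", "2 learn");
        // "learn" occupies indices 2..6, so end() is 7.
        System.out.println(bounds[0] + ".." + bounds[1]);              // prints 2..7
        System.out.println(startThrowsWithoutMatch("[a-z]+", "123")); // prints true
    }
}
```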
Additional Methods
If m is a matcher, then:
m.replaceFirst(replacement) returns a new string in which the first substring that matches the pattern is replaced with replacement.
m.replaceAll(replacement) returns a new string in which every substring that matches the pattern is replaced with replacement.
m.find(startIndex) resets the matcher and finds the next match starting at the specified index.
m.reset() resets the matcher.
m.reset(newText) resets the matcher and gives it a new text (which may be a String, StringBuffer, or CharBuffer).
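The additional methods listed above can be tried out with the sketch below. The class name ReplaceResetDemo, its helper methods, and the replacement text "X" are assumptions chosen for illustration; the Matcher calls themselves are the standard ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceResetDemo {
    private static final Pattern LOWER = Pattern.compile("[a-z]+");

    // Replaces only the first run of lowercase letters.
    public static String replaceFirstWord(String text, String replacement) {
        return LOWER.matcher(text).replaceFirst(replacement);
    }

    // Replaces every run of lowercase letters.
    public static String replaceEveryWord(String text, String replacement) {
        return LOWER.matcher(text).replaceAll(replacement);
    }

    // Returns the first match found at or after fromIndex, or null if none.
    public static String firstWordFrom(String text, int fromIndex) {
        Matcher m = LOWER.matcher(text);
        return m.find(fromIndex) ? text.substring(m.start(), m.end()) : null;
    }

    // Reuses one matcher for a second text via reset(CharSequence).
    public static String firstWordAfterReset(String firstText, String secondText) {
        Matcher m = LOWER.matcher(firstText);
        m.find();             // advance the matcher on the first text
        m.reset(secondText);  // discard that state and switch to the new text
        return m.find() ? secondText.substring(m.start(), m.end()) : null;
    }

    public static void main(String[] args) {
        String text = "code 2 learn java tutorial";
        System.out.println(replaceFirstWord(text, "X")); // prints X 2 learn java tutorial
        System.out.println(replaceEveryWord(text, "X")); // prints X 2 X X X
        System.out.println(firstWordFrom(text, 4));      // prints learn
        System.out.println(firstWordAfterReset(text, "java tutorial")); // prints java
    }
}
```

Note that in the replacement string the characters $ and \ have a special meaning, so a plain literal such as "X" is used here to keep the example simple.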
Regular expression syntax
^             Matches the beginning of a line.
$             Matches the end of a line.
.             Matches any single character except a line terminator. The DOTALL flag, (?s), allows it to match a line terminator as well.
[...]         Matches any single character in the brackets.
[^...]        Matches any single character not in the brackets.
\A            Matches the beginning of the entire input.
\z            Matches the end of the entire input.
\Z            Matches the end of the entire input, except for the final line terminator, if any.
re*           Matches 0 or more occurrences of the preceding expression.
re+           Matches 1 or more occurrences of the preceding expression.
re?           Matches 0 or 1 occurrence of the preceding expression.
re{n}         Matches exactly n occurrences of the preceding expression.
re{n,}        Matches n or more occurrences of the preceding expression.
re{n,m}       Matches at least n and at most m occurrences of the preceding expression.
a|b           Matches either a or b.
(re)          Groups regular expressions and remembers the matched text.
(?:re)        Groups regular expressions without remembering the matched text.
(?>re)        Matches an independent pattern without backtracking.
\w            Matches word characters (letters, digits, and underscore). Equivalent to [a-zA-Z_0-9].
\W            Matches non-word characters.
\s            Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\S            Matches non-whitespace characters.
\d            Matches digits. Equivalent to [0-9].
\D            Matches non-digits.
\G            Matches the point where the previous match ended.
\n            Back-reference to capturing group number n (where n is a digit).
\b            Matches at a word boundary.
\B            Matches where there is no word boundary.
\n, \t, etc.  Match newline, tab, and similar control characters.
\Q            Quotes (escapes) all characters up to \E.
\E            Ends the quotation started with \Q.
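A few of the constructs from the table can be seen in action in the sketch below: capturing groups, \w and \d, the {n} quantifier with \b, and \Q...\E quoting. The class name SyntaxDemo, its helper methods, and the sample inputs are assumptions made for the example. The method m.group(int), which returns the text remembered by a capturing group, is part of the standard Matcher class but is not covered elsewhere in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxDemo {
    // (\w+)=(\d+): two capturing groups remember the key and the number.
    public static String[] keyAndValue(String text) {
        Matcher m = Pattern.compile("(\\w+)=(\\d+)").matcher(text);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    // \b\d{4}\b: a standalone run of exactly four digits.
    public static boolean hasFourDigitNumber(String text) {
        return Pattern.compile("\\b\\d{4}\\b").matcher(text).find();
    }

    // \Q1+1\E: the + is quoted, so it is a literal plus sign, not a quantifier.
    public static boolean containsOnePlusOne(String text) {
        return Pattern.compile("\\Q1+1\\E").matcher(text).find();
    }

    public static void main(String[] args) {
        String[] kv = keyAndValue("timeout=30 retries=5");
        System.out.println(kv[0] + " -> " + kv[1]);           // prints timeout -> 30
        System.out.println(hasFourDigitNumber("year 2024"));  // prints true
        System.out.println(hasFourDigitNumber("id 12345"));   // prints false
        System.out.println(containsOnePlusOne("1+1=2"));      // prints true
        System.out.println(containsOnePlusOne("11"));         // prints false
    }
}
```

Because a backslash is also the escape character in Java string literals, every backslash in the regular expression has to be doubled in the source code.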