Backreferences in Java Regular Expressions

Backreferences are a very useful feature supported by the Java regular expression engine. To understand what a backreference is, you first need to understand what a group is. A group in a regular expression treats multiple characters as a single unit. Groups are created by placing characters in parentheses, "()". One pair of parentheses is one group, and groups are numbered from 1 in the order of their opening parentheses. Backreferences are convenient because they let us require a repeat of an earlier part of the match without writing it out again. We just refer to a previously defined group using a construction like \N, where N is the group number; it matches the same text that group N captured. The following two examples give you a feel for the convenience of backreferences.
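To make the \N construction concrete before the longer examples, here is a minimal sketch that finds doubled characters. The class name DoubledLetters and the helper findDoubled are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubledLetters {
    // (\w) captures one word character as group 1;
    // \1 then requires that exact character to appear again.
    private static final Pattern DOUBLED = Pattern.compile("(\\w)\\1");

    // Hypothetical helper: collects every doubled character pair in the input.
    static List<String> findDoubled(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = DOUBLED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("hello book")); // prints [ll, oo]
    }
}
```

Note that \1 does not mean "any word character again": it only matches the text that group 1 actually captured, which is why "he" or "lo" are not reported.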

Example 1: Finding a Repeating Pattern

A pattern of the form (\d\d\d)\1 matches the string 123123, but not the string 123456, because \1 must repeat exactly the three digits captured by the group.

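    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "la123123la";
    // Three digits captured as group 1, followed by the same three digits again
    Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
    Matcher m = p.matcher(str);
    System.out.println(m.groupCount()); // number of capturing groups in the pattern
    while (m.find()) {
        String word = m.group();
        System.out.println(word + " " + m.start() + " " + m.end());
    }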

Output:

    1
    123123 2 8

Translator's note: as the translator I am taking a small liberty and adding my own comments here, because I am only just learning regular expressions myself. I hope readers will correct me, harshly if need be :), if anything below is wrong:

1) The groupCount() method returns the number of groups specified in the pattern, so even if the input string is "la123 456 la", which does not fit the pattern, the number 1 is still printed.
2) The find() method searches for the next substring of the input that matches the pattern, but returns only a boolean value: true if one was found, false if not.
3) The group() method returns the substring matched by the most recent successful find(). In this case it is 123123.
4) The start() method returns the position of the found substring in the source string (numbering, of course, starts from zero).
5) The end() method returns the position in the source string immediately following the found substring. Thus, this value does not point to the last character of the found substring, but to the character right after it.
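The points in the note above can be checked with a small sketch. The class MatcherMethodsDemo and its two helper methods are hypothetical names introduced here; the Matcher calls themselves (groupCount, find, group, start, end) are the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {
    private static final Pattern P = Pattern.compile("(\\d\\d\\d)\\1");

    // groupCount() depends only on the pattern, not on whether the input matches.
    static int groupCountFor(String input) {
        return P.matcher(input).groupCount();
    }

    // Returns "text start end" for the first match, or "no match".
    static String describeFirstMatch(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        // end() is exclusive, so substring(start, end) equals m.group().
        String viaSubstring = input.substring(m.start(), m.end());
        return viaSubstring + " " + m.start() + " " + m.end();
    }

    public static void main(String[] args) {
        System.out.println(groupCountFor("la123 456 la"));      // prints 1
        System.out.println(describeFirstMatch("la123 456 la")); // prints no match
        System.out.println(describeFirstMatch("la123123la"));   // prints 123123 2 8
    }
}
```

Because end() is exclusive, the pair (start, end) can be passed straight to String.substring, which is the usual reason for this convention.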

Example 2: Finding Duplicate Words

    // A whole word captured as group 1, any text, then the same word again
    String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
    Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
    Matcher m = p.matcher(phrase);
    while (m.find()) {
        String val = m.group();
        System.out.println("Found character sequence: \"" + val + "\"");
        System.out.println("Duplicate word: " + m.group(1) + "\n");
    }

Output:

    Found character sequence: "unique is not duplicate but unique"
    Duplicate word: unique

    Found character sequence: "Duplicate is duplicate"
    Duplicate word: Duplicate

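Please note that this way of finding duplicate words with a regular expression is not optimal. For instance, in the example above the first "duplicate" is never reported as a repeated word: it lies inside the first match ("unique is not duplicate but unique"), and find() resumes searching only after the end of that match.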
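One possible way around the skipped word, offered only as a sketch and not as the original author's method, is to move the "any text, then the same word again" part into a lookahead, so that each match consumes just the word itself. The class RepeatedWords and the method findRepeatedWords are hypothetical names; lowercasing the results and collecting them in a set are choices made for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedWords {
    // The lookahead (?=...) checks that the captured word occurs again later
    // without consuming any text, so words in between are still examined.
    private static final Pattern REPEATED = Pattern.compile(
            "\\b(\\w+)\\b(?=[\\w\\W]*\\b\\1\\b)", Pattern.CASE_INSENSITIVE);

    // Returns each word that appears again later in the text,
    // lowercased, in order of first appearance.
    static Set<String> findRepeatedWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            words.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
        System.out.println(findRepeatedWords(phrase)); // prints [unique, is, duplicate]
    }
}
```

This variant reports "duplicate" (and also "is", which appears twice in the phrase). It is still not efficient for long texts, since every lookahead rescans the rest of the input; splitting the text into words and counting them in a map would likely scale better.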
