Regular expressions in Java
Pattern/Matcher/PatternSyntaxException/Regex-Syntax
A regular expression (regex) is a sequence of characters that defines a search pattern. It describes what to look for when searching a text.
Regular expressions can be used for all kinds of text search and text replacement operations. They are also widely used to validate strings, for example to enforce password rules or to check the format of an email address. A regex can be a single character or a more complicated pattern.
Applying a regular expression to a text produces one of two kinds of result:
1. True/False, indicating whether the regular expression matched the text or not.
2. A set of matches, one for each instance of the regular expression found in the text.
Java has no dedicated regex syntax in the language itself. Regular expressions are provided by the standard library’s java.util.regex package, which we import to work with them.
The package includes the following classes:
∙ Pattern Class – Defines a pattern (which is to be used in search)
∙ Matcher Class – Used to search for the pattern
After learning about Java Regex, you will be able to check your regular expressions with the Java Regex Tester Tool.
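To make the two kinds of result described above concrete, here is a minimal sketch. The class and method names (RegexResultKinds, isAllDigits, findNumbers) are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexResultKinds {
    // Kind 1: a true/false answer for the text as a whole.
    static boolean isAllDigits(String text) {
        return Pattern.matches("\\d+", text);
    }

    // Kind 2: every substring of the text that the pattern matches.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));          // true
        System.out.println(isAllDigits("JDK 21"));        // false
        System.out.println(findNumbers("JDK 8, 11, 17")); // [8, 11, 17]
    }
}
```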
∙ Pattern Class
The following example checks whether a word is contained in the text.
import java.util.regex.Pattern;

public class Example1 {
    public static void main(String[] args) {
        String text = "Java was called Oak at the beginning.";
        // .* on both sides lets "called" appear anywhere in the text
        String regex = ".*called.*";
        boolean match = Pattern.matches(regex, text);
        System.out.println("Match " + (match ? "found" : "not found"));
    }
}
The result will be:
Match found
Here text is the String to be checked against the regular expression. The regex starts with .*, which matches any sequence of zero or more characters, followed by the literal text “called”, followed by another .* for zero or more trailing characters. The matches() method is a static method of the Pattern class. It returns true only when the expression matches the entire text, and false otherwise.
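The whole-text requirement is easy to overlook, so here is a small sketch of it. The helper name matchesWhole is hypothetical and simply wraps Pattern.matches().

```java
import java.util.regex.Pattern;

class WholeInputMatch {
    // Thin wrapper around Pattern.matches(), which tests the entire text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String text = "Java was called Oak at the beginning.";
        // false: the bare word does not cover the whole text
        System.out.println(matchesWhole("called", text));
        // true: .* absorbs everything before and after the word
        System.out.println(matchesWhole(".*called.*", text));
        // true: .* may also match zero characters
        System.out.println(matchesWhole(".*called.*", "called"));
    }
}
```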
✔ The Pattern class also provides the split() and pattern() methods. Let’s look at each of them.
The split() method splits the given input around matches of the pattern. It comes in two flavors. The first takes only the input and behaves as if the limit were zero, while the second takes two arguments, the input and the limit.
The limit parameter controls how many times the pattern is applied, and consequently affects the length of the resulting array. With a positive limit n, the pattern is applied at most n - 1 times and the array has at most n entries. With a limit of zero, the pattern is applied as many times as possible and trailing empty strings are discarded.
So the split() method is overloaded in two ways.
1. String[] split(CharSequence input)
2. String[] split(CharSequence input, int limit)
Both overloads return the array of strings computed by splitting the input around matches of this pattern.
import java.util.regex.Pattern;

public class PatternSplitExample1 {
    public static void main(String[] args) {
        // \s in the regex (written "\\s" in Java source) is a whitespace character
        Pattern pattern = Pattern.compile("\\s");
        String[] strArray = pattern.split("Java was created at Sun Microsystems");
        for (String str : strArray) {
            System.out.println(" " + str);
        }
    }
}
In this example the pattern is “\\s”, which in a Java string literal stands for the regex \s, a whitespace character. The result for the code above will be:
Java
was
created
at
Sun
Microsystems
The next example passes both the input and the limit as method parameters:
import java.util.regex.Pattern;

public class PatternSplitExample2 {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(":");
        // A limit of 8 means at most 8 array entries
        String[] strArray =
                pattern.split("Today:the:Java:compiler:is:written:in:Java:while:the:JRE:is:written:in:C.", 8);
        for (String str : strArray) {
            System.out.println(" " + str);
        }
    }
}
The pattern used here is “:”, so the input is split at each colon, and the limit is 8. The pattern is therefore applied at most seven times, and the eighth element holds the rest of the input.
The output will be:
Today
the
Java
compiler
is
written
in
Java:while:the:JRE:is:written:in:C.
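The effect of the limit is easiest to see on an input that ends with delimiters. The sketch below compares a zero, a positive and a negative limit; the class and helper names are made up for illustration, and the negative case relies on the documented behavior that a negative limit keeps trailing empty strings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Splits the input on ":" with the given limit.
    static String[] splitOnColon(String input, int limit) {
        return Pattern.compile(":").split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a:b:c::";
        // Zero: split everywhere, drop trailing empty strings
        System.out.println(Arrays.toString(splitOnColon(input, 0)));  // [a, b, c]
        // Positive n: at most n entries, the last one holds the rest
        System.out.println(Arrays.toString(splitOnColon(input, 2)));  // [a, b:c::]
        // Negative: split everywhere, keep trailing empty strings
        System.out.println(Arrays.toString(splitOnColon(input, -1))); // [a, b, c, , ]
    }
}
```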
The pattern() method of the Pattern class returns, as a string, the regular expression from which the pattern was compiled.
import java.util.regex.Pattern;

public class PatternExample1 {
    public static void main(String[] args) {
        String input = "My Id is 061U2374";
        String regex = "(.*)?(\\d+)";
        Pattern pattern = Pattern.compile(regex);
        if (pattern.matcher(input).matches()) {
            System.out.println("Found a match!");
        } else {
            System.out.println("Have not found a match!");
        }
        // pattern() gives back the regex the Pattern was compiled from
        String regularExpression = pattern.pattern();
        System.out.println("Regular expression: " + regularExpression);
    }
}
The output is:
Found a match!
Regular expression: (.*)?(\d+)
∙ Matcher Class
The next example shows another way of using regex. It counts all occurrences of the search substring in the text and prints each one with its start and end indexes.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example2 {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("is", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("Java is used as the server-side language for most back-end development projects." +
                "Java is also commonly used for desktop computing, other mobile computing, games, and numerical computing.");
        int count = 0;
        // find() advances to the next occurrence on each call
        while (matcher.find()) {
            count++;
            System.out.println("found: " + count + " : "
                    + matcher.start() + " – " + matcher.end());
        }
    }
}
The result of this code will be:
found: 1 : 5 – 7
found: 2 : 85 – 87
Here the pattern is created with the compile() method, which is a static method of the Pattern class.
The first parameter is the pattern being searched for. The second parameter is optional and passes flags that change how matching is done. Here Pattern.CASE_INSENSITIVE makes the search ignore case.
The matcher() method creates a Matcher object that searches for the pattern in the given string and holds information about the search that was performed. The find() method looks for the next occurrence and returns true if the pattern was found in the string and false otherwise.
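As a rough illustration of what the optional flag changes, the sketch below counts matches of the same regex with and without Pattern.CASE_INSENSITIVE. The class name, the countMatches helper and the sample sentence are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Counts how many times the pattern occurs in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Is this Java? Yes, it is.";
        Pattern sensitive = Pattern.compile("is");
        Pattern insensitive = Pattern.compile("is", Pattern.CASE_INSENSITIVE);
        // Without the flag the leading "Is" is skipped
        System.out.println(countMatches(sensitive, text));   // 2
        // With the flag "Is", "this" and "is" all count
        System.out.println(countMatches(insensitive, text)); // 3
    }
}
```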
✔ The Matcher class provides the matches(), lookingAt(), reset() and group() methods. We will look at each of them separately.
The matches() method attempts to match the entire input sequence against the pattern. The match must cover the input from beginning to end.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMatchesExample1 {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*sun.*", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("Java was created at Sun Microsystems");
        // matches() succeeds only if the whole input fits the pattern
        boolean matches = matcher.matches();
        System.out.println("matches = " + matches);
    }
}
The result will be:
matches = true
The lookingAt() method attempts to match the input sequence against the pattern, starting at the beginning of the text. Like matches(), it always starts at the beginning, but unlike matches() it does not require the entire text to be matched.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherLookingAtExample1 {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("In one", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("in one year Java gets downloaded one billion times.");
        // Only the start of the input has to match
        System.out.println("Looking at = " + matcher.lookingAt());
    }
}
The result will be:
Looking at = true
So this method returns true if a prefix of the input sequence matches the matcher’s pattern, otherwise it will return false.
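To put matches(), lookingAt() and find() side by side, here is a small comparison sketch. The class name and the compare helper are hypothetical, and a fresh Matcher is created for each check so that one call cannot affect the next.

```java
import java.util.regex.Pattern;

class MatchModes {
    // Reports how the three Matcher methods judge the same regex and text.
    static String compare(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        boolean whole = pattern.matcher(text).matches();     // entire text
        boolean prefix = pattern.matcher(text).lookingAt();  // start of text
        boolean anywhere = pattern.matcher(text).find();     // any position
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        // matches=true lookingAt=true find=true
        System.out.println(compare("Java", "Java"));
        // matches=false lookingAt=true find=true
        System.out.println(compare("Java", "Java was created at Sun"));
        // matches=false lookingAt=false find=true
        System.out.println(compare("Java", "Oak became Java"));
    }
}
```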
The reset() method is used to reset the matcher. It discards all of the matcher’s state information, sets its append position to zero, and sets its region back to the default, which is the entire character sequence.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherResetExample1 {
    public static void main(String[] args) {
        String regex = "Java";
        Pattern pattern = Pattern.compile(regex);
        String stringToBeMatched = "JavaJavaJavaJava";
        Matcher matcher = pattern.matcher(stringToBeMatched);
        // reset() returns the same matcher with its state cleared
        matcher = matcher.reset();
        // The matcher's toString() reports its pattern, region and last match
        System.out.println(matcher);
    }
}
Output:
java.util.regex.Matcher[pattern=Java region=0,16 lastmatch=]
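The output above only shows the cleared state. The following sketch, with a made-up class and method name, shows what that means for searching: after reset() the next find() starts again from the beginning, and reset(CharSequence) also swaps in a new input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Returns the match start seen before a reset, after reset(),
    // and after reset(newInput).
    static int[] startsAroundReset() {
        Matcher matcher = Pattern.compile("Java").matcher("JavaJavaJavaJava");
        matcher.find();
        matcher.find();
        int beforeReset = matcher.start();   // second match starts at 4

        matcher.reset();                     // forget the search position
        matcher.find();
        int afterReset = matcher.start();    // first match again, at 0

        matcher.reset("I like Java");        // same pattern, new input
        matcher.find();
        int afterNewInput = matcher.start(); // "Java" starts at 7 here

        return new int[] { beforeReset, afterReset, afterNewInput };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(startsAroundReset())); // [4, 0, 7]
    }
}
```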
The group() method returns, as a String, the input subsequence matched by the previous match. This overload takes no parameters. It throws IllegalStateException if no match has yet been attempted, or if the previous match operation failed.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherGroupExample1 {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("a(bb)");
        Matcher matcher = pattern.matcher("aabbabbabbaaa");
        // group() returns the whole match, not just the (bb) part
        while (matcher.find()) {
            System.out.println("Start: " + matcher.start() + "," +
                    " End: " + matcher.end() + ", Group " + matcher.group());
        }
    }
}
Output:
Start: 1, End: 4, Group abb
Start: 4, End: 7, Group abb
Start: 7, End: 10, Group abb
The following example shows how to extract two groups from a given String.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherGroupExample2 {
    public static void main(String[] args) {
        String stringToSearch = "Java was created at Sun Microsystems…";
        // Each pair of parentheses defines one capturing group
        Pattern pattern = Pattern.compile(" (\\S+re\\S+) .* (\\S+ros\\S+).*");
        Matcher matcher = pattern.matcher(stringToSearch);
        if (matcher.find()) {
            String group1 = matcher.group(1);
            String group2 = matcher.group(2);
            System.out.format("'%s', '%s'\n", group1, group2);
        }
    }
}
The pattern specifies two groups to search for in the String. If the pattern is found in the string, group(1) and group(2) return the text captured by each group, which we then print. So the output will be:
'created', 'Microsystems…'
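The relation between group(), the numbered groups and the IllegalStateException mentioned earlier can be sketched as follows. The class, the method names and the sample text are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the whole match followed by capturing groups 1 and 2,
    // or an empty array when nothing is found.
    static String[] parseYearMonth(String text) {
        Matcher matcher = Pattern.compile("(\\d{4})-(\\d{2})").matcher(text);
        if (!matcher.find()) {
            return new String[0];
        }
        // group() is the whole match; groups are numbered from 1
        return new String[] { matcher.group(), matcher.group(1), matcher.group(2) };
    }

    // Calling group() before any match attempt is an error.
    static boolean groupBeforeMatchThrows() {
        Matcher matcher = Pattern.compile("a(bb)").matcher("abb");
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // [2014-03, 2014, 03]
        System.out.println(Arrays.toString(parseYearMonth("Released 2014-03 as Java 8")));
        System.out.println(groupBeforeMatchThrows()); // true
    }
}
```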
⮚ PatternSyntaxException
This is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides several methods to help you determine what went wrong. The methods are:
1. public String getDescription() – Retrieves the description of the error.
2. public int getIndex() – Retrieves the error index.
3. public String getPattern() – Retrieves the erroneous regular expression pattern.
4. public String getMessage() – Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
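A minimal sketch of these methods in use is shown here. The class name, the compileError helper and the deliberately broken regex "(unclosed" are assumptions for illustration. The exact description text and index depend on the error, so no expected output is listed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns the exception raised while compiling the regex,
    // or null when the regex is valid.
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // The opening parenthesis is never closed, so compile() fails
        PatternSyntaxException error = compileError("(unclosed");
        if (error != null) {
            System.out.println("Description: " + error.getDescription());
            System.out.println("Index: " + error.getIndex());
            System.out.println("Pattern: " + error.getPattern());
            System.out.println("Message: " + error.getMessage());
        }
    }
}
```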
⮚ The syntax of Regular Expressions
The main aspect of regular expressions is their syntax. Most programming languages support regular expressions, and Java is no exception. The syntax varies somewhat from language to language, so here we cover the syntax used in Java.
▪ Metacharacters
Java uses a number of metacharacters, each of which has a special meaning in a pattern. The metacharacters are:
< - > = ( $ ) ! [ | ] ? { * } + \ . ^
Now let’s get familiar with the most commonly used metacharacters. After studying them, you can write the regular expressions you need and then check them here: https://www.regextester.com/.
| – to find a match for any one of the patterns separated by | as in: table|chair|window
. – to find just one instance of any character
^ – to find a match at the beginning of a string as in: ^Hello
$ – to find a match at the end of the string as in: World$
\d – to find a digit
\s – to find a whitespace character
\b – to find a match at the beginning of a word like: \bWORD, or at the end of the word like: WORD\b
\uxxxx – to find the Unicode character specified by the hexadecimal number xxxx
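The entries in the list above can be tried out with a short sketch like the one below. The class name and the found helper are hypothetical. The last two lines go slightly beyond the list: they assume you want to match a metacharacter literally, which is done by escaping it with a backslash.

```java
import java.util.regex.Pattern;

class MetacharDemo {
    // True when the regex occurs anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("table|chair|window", "an old chair")); // true
        System.out.println(found("^Hello", "Hello World"));   // true
        System.out.println(found("^Hello", "Say Hello"));     // false
        System.out.println(found("World$", "Hello World"));   // true
        System.out.println(found("\\d", "Java 8"));           // true
        System.out.println(found("\\s", "Java 8"));           // true
        // "cat" inside a longer word has no word boundary around it
        System.out.println(found("\\bcat\\b", "concatenate")); // false
        System.out.println(found("\\bcat\\b", "a cat sat"));   // true
        // Unicode escape for the letter A, interpreted by the regex engine
        System.out.println(found("\\u0041", "ABC"));           // true
        // Unescaped + is a quantifier; escaped it is a literal plus sign
        System.out.println(found("1+1", "1+1"));               // false
        System.out.println(found("1\\+1", "1+1"));             // true
    }
}
```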
Besides metacharacters, you also need to know the quantifiers.
▪ Quantifiers
Quantifiers allow us to specify the number of occurrences to match. Here is a table of quantifiers with their descriptions, which you can use when writing a regular expression.
Greedy | Reluctant | Possessive | Meaning
X? | X?? | X?+ | X, once or not at all
X* | X*? | X*+ | X, zero or more times
X+ | X+? | X++ | X, one or more times
X{n} | X{n}? | X{n}+ | X, exactly n times
X{n,} | X{n,}? | X{n,}+ | X, at least n times
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times
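The three columns differ in how much input a quantifier takes before the rest of the pattern is tried. The sketch below, with a made-up class and helper name, runs the greedy, reluctant and possessive forms of .* against the same sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "xfooxxxxxxfoo";
        // Greedy: takes as much as possible, then backs off to fit "foo"
        System.out.println(findAll(".*foo", text));  // [xfooxxxxxxfoo]
        // Reluctant: takes as little as possible before each "foo"
        System.out.println(findAll(".*?foo", text)); // [xfoo, xxxxxxfoo]
        // Possessive: takes everything and never backs off, so "foo" cannot match
        System.out.println(findAll(".*+foo", text)); // []
    }
}
```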
Now you can write your own regular expressions in different ways, test them, and use them in your projects.