Regular expressions let you describe patterns to search for. Here is an example: searching a file for the string "Sep" is easy, you do it with
grep "Sep" file
This prints every line containing the string "Sep". But what if you only want lines that start with "Sep", for example to read all entries in your syslog from September? For that you need a regular expression. It works like this:
grep -E "^Sep" /var/log/messages
prints all entries for September in your syslog. And there is much more you can do with regular expressions.
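As a rough illustration of the difference, the following sketch runs both searches against a small temporary file instead of the real syslog. The log lines and the variable names (log, anywhere, at_start) are invented for this example.

```shell
# Build a throwaway file with three invented log lines.
log=$(mktemp)
printf '%s\n' \
  'Sep  3 10:15:01 host cron[411]: job started' \
  'Aug 30 09:00:00 host backup: next run scheduled for Sep 1' \
  'Sep  4 11:20:45 host sshd[902]: session opened' > "$log"

# Plain search: counts every line that contains "Sep" anywhere.
anywhere=$(grep -c "Sep" "$log")

# Anchored search: counts only lines that begin with "Sep".
at_start=$(grep -c -E "^Sep" "$log")

echo "containing Sep: $anywhere, starting with Sep: $at_start"
rm -f "$log"
```

The August line mentions "Sep" in its text, so the plain search counts it, while the anchored search does not.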
- 1 Escaping
- 2 Write regular expressions
- 3 Read regular expressions
- 4 Understand regular expressions
Escaping
The characters ^ and \ are control characters in a regular expression. ^ means "at the beginning of a line". A backslash escapes a control character, so that it is treated as an ordinary, literal character again:
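grep '^hallo' file
finds all occurrences of "hallo" at the beginning of a line in file.
grep '\^hallo' file
finds all occurrences of "^hallo" in file.
grep '\\\^hallo' file
finds all occurrences of "\^hallo" in file.
grep '\\\\\^hallo' file
finds all occurrences of "\\^hallo" in file. And so on...
The patterns are written in single quotes so that the shell passes the backslashes to grep unchanged.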
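The escaping rules can be checked with a short sketch like the one below. The sample lines and variable names are made up for illustration, and the patterns are single-quoted so that the shell leaves the backslashes alone.

```shell
# Three invented lines: plain text, a literal caret, and a backslash plus caret.
f=$(mktemp)
printf '%s\n' 'hallo world' 'say ^hallo' 'say \^hallo' > "$f"

# ^ as an anchor: only the line that starts with "hallo".
anchored=$(grep '^hallo' "$f")

# \^ as a literal caret: every line containing the text "^hallo".
literal=$(grep -c '\^hallo' "$f")

# \\ is a literal backslash, \^ a literal caret: matches the text "\^hallo".
with_backslash=$(grep '\\\^hallo' "$f")

echo "anchored: $anchored"
echo "lines with a literal caret: $literal"
echo "backslash and caret: $with_backslash"
rm -f "$f"
```

The last sample line is counted by the second search as well, because it also contains the text "^hallo".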
Write regular expressions
Finding text that fits the pattern defined by a regular expression is called "matching".
The beginning of a line
grep "^hallo" file
prints all occurrences of "hallo" at the beginning of a line in file.
The end of a line
grep "hallo$" file
prints all occurrences of "hallo" at the end of a line in file.
Find string1 OR string2
grep -E "Sep|Aug" file
prints all lines from file that contain "Sep" or "Aug".
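To try the end-of-line anchor and the OR sign without touching real data, a small sketch along the following lines can help. The sample lines and variable names are invented.

```shell
f=$(mktemp)
printf '%s\n' \
  'Aug 12 backup done' \
  'Sep 01 backup done' \
  'Oct 03 backup failed' \
  'done on Sep 02' > "$f"

# "done$" matches only where "done" is the last text on the line.
ends_with_done=$(grep -c 'done$' "$f")

# With -E, the | separates two alternatives: "Sep" or "Aug".
sep_or_aug=$(grep -c -E 'Sep|Aug' "$f")

echo "ending in done: $ends_with_done, Sep or Aug: $sep_or_aug"
rm -f "$f"
```

The line "done on Sep 02" contains "done" but does not end with it, so only the alternation counts it.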
Match a group of characters
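grep -E "L[I1]NUX" file
prints all lines from file that contain "LINUX" or "L1NUX". Every character between the brackets belongs to the group, so no separator is written between I and 1; a comma there would be matched as well.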
Match a range of characters
grep -E "foo[1-9]" file
prints all lines from file that contain "foo1", "foo2" and so on up to "foo9".
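The following sketch, using invented sample lines, suggests two things worth checking for yourself: every character between the brackets is a member of the group (a comma included), and a range such as [1-9] stands for a single character position.

```shell
f=$(mktemp)
printf '%s\n' 'LINUX' 'L1NUX' 'L,NUX' 'foo0' 'foo5' 'foo10' > "$f"

# Group of two characters: I or 1.
set_only=$(grep -c -E 'L[I1]NUX' "$f")

# Adding a comma to the group makes "L,NUX" match as well.
with_comma=$(grep -c -E 'L[I,1]NUX' "$f")

# Range: "foo" followed by one character from 1 to 9.
# "foo10" is counted because it contains "foo1"; "foo0" is not.
range_hits=$(grep -c -E 'foo[1-9]' "$f")

echo "set: $set_only, set with comma: $with_comma, range: $range_hits"
rm -f "$f"
```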
NOT the following characters
To invert matching for a group of characters, put a ^ directly after the opening bracket:
grep -E "for[^ e]" file
prints all lines from file that contain "for" followed by a character other than a space or an e, so "for you" and "foresee" do not match.
grep offers an additional way to invert a match, the -v option:
grep -Ev "gettimeofday" file
prints all lines from file that do NOT contain "gettimeofday". This is a feature of grep, not of regular expressions.
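The two kinds of inversion behave quite differently, which a sketch like this one can make visible. The sample lines and variable names are assumptions made for the example.

```shell
f=$(mktemp)
printf '%s\n' 'for you' 'foresee' 'format' 'wait for' 'gettimeofday call' > "$f"

# Negated group: "for" must be followed by one character
# that is neither a space nor an e.
bracket_hits=$(grep -E 'for[^ e]' "$f")

# -v inverts the whole search: count lines WITHOUT "gettimeofday".
without=$(grep -c -v 'gettimeofday' "$f")

echo "negated group matched: $bracket_hits"
echo "lines without gettimeofday: $without"
rm -f "$f"
```

Note that "wait for" is not matched by the negated group: the bracket expression still has to match one character, and nothing follows "for" on that line.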
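Match any character
grep -E "L.nux" file
The dot matches any single character that is not a newline, so this prints lines from file containing e.g. "Linux" or "Lenux". It does not match "Lnux", because the dot requires exactly one character.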
Match one or more times
grep -E "L[i]+nux" file
prints all lines from file in which i occurs at least once between L and nux. The + here is a quantifier. It means that i occurs one or more times. To accept zero or more times instead, replace the + with a *.
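A quick way to compare the dot, the + and the * is to run all three against the same few words. The sketch below does that with invented sample lines and variable names.

```shell
f=$(mktemp)
printf '%s\n' 'Linux' 'Lenux' 'Lnux' 'Liinux' > "$f"

# Dot: exactly one arbitrary character between L and nux.
dot=$(grep -c -E 'L.nux' "$f")

# Plus: one or more i between L and nux.
plus=$(grep -c -E 'Li+nux' "$f")

# Star: zero or more i between L and nux.
star=$(grep -c -E 'Li*nux' "$f")

echo "dot: $dot, plus: $plus, star: $star"
rm -f "$f"
```

Here the dot accepts Linux and Lenux, the + accepts Linux and Liinux, and the * additionally accepts Lnux.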
Read regular expressions
An asterisk is a quantifier meaning "any number of, including zero".
grep -E "Li*nux" file
Lnux
Linux
Liinux
Liiinux
The asterisk is placed directly after the atom that may be repeated any number of times. In the example above, the atom is the character i, but it can also be a group of characters:
grep -E "ba(na)*" file
ba
bana
banana
bananana
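Because grep prints a line as soon as any part of it matches, a pattern like ba(na)* also accepts every line that merely contains "ba". The sketch below, with invented sample lines, uses grep's -x option (the whole line must match) to make the repetition of the group visible.

```shell
f=$(mktemp)
printf '%s\n' 'ba' 'bana' 'banana' 'bananana' 'banan' 'bar' > "$f"

# Without -x: any line containing "ba" matches, since (na)* may be empty.
anywhere=$(grep -c -E 'ba(na)*' "$f")

# With -x: the line must consist of "ba" plus zero or more "na".
whole=$(grep -c -x -E 'ba(na)*' "$f")

echo "matching somewhere: $anywhere, matching the whole line: $whole"
rm -f "$f"
```

With -x, "banan" and "bar" drop out because their last letter is not part of a complete "na".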
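The ^ character has three meanings. In each session below, grep reads the typed lines from standard input and prints a line again only if it matches. The ^ character stands for
- the beginning of a line if it stands at the beginning of a branch
# grep '^foo'
barfoo
foo
foo
- "not" if it stands directly after an opening bracket
# grep 'for[^e]'
foresee
for each
for each
- the ^ character itself if it is escaped
# grep '\^'
adsf
as^df
as^df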
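The phrase "at the beginning of a branch" matters once an expression has several alternatives: the anchor applies only to the alternative it is written in. A minimal sketch with invented sample lines:

```shell
f=$(mktemp)
printf '%s\n' 'start here' 'restart' 'the end' > "$f"

# ^ belongs to the second alternative only:
# "end" anywhere on the line, or "start" at the beginning.
one_branch=$(grep -c -E 'end|^start' "$f")

# Parentheses put both alternatives behind the anchor:
# the line must begin with "end" or with "start".
both_branches=$(grep -c -E '^(end|start)' "$f")

echo "anchor on one branch: $one_branch, anchor on both: $both_branches"
rm -f "$f"
```

"restart" matches neither pattern, and "the end" matches only the first one.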
Understand regular expressions
Branches, Pieces and Atoms
A regular expression consists of one or more branches, separated by "|", the "OR" sign. If one of the branches matches, the expression matches:
grep -E "Tom|Harry"
Here, the expression is Tom|Harry, and Tom and Harry are both branches.
A branch consists of one or more pieces, matched in the order in which they are written. A piece is an atom optionally followed by a quantifier:
grep -E "To*m"
Here, T, o* and m are the three pieces.
An atom is a character, a bracket expression or a subexpression. Each of the following lines is an atom:
a
b
[^e]
(this is a subexpression)
A quantifier states how many times in a row the atom before it may occur:
grep -E "To*m"
finds all lines containing Tm, Tom, Toom, Tooom and so on.
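The terms above can be tried out with a sketch like the following, which compares a quantifier on a single-character atom with a quantifier on a subexpression. The sample lines and variable names are invented, and -x is used so that only whole-line matches are listed.

```shell
f=$(mktemp)
printf '%s\n' 'Tm' 'Tom' 'Toom' 'ToTom' 'm' 'Harry' > "$f"

# Pieces T, o*, m: the quantifier repeats only the atom o.
char_atom=$(grep -x -E 'To*m' "$f" | paste -s -d ' ' -)

# Pieces (To)*, m: the quantifier repeats the subexpression (To).
group_atom=$(grep -x -E '(To)*m' "$f" | paste -s -d ' ' -)

# Two branches: the whole line is either "Tom" or "Harry".
branches=$(grep -c -x -E 'Tom|Harry' "$f")

echo "To*m matches: $char_atom"
echo "(To)*m matches: $group_atom"
echo "lines matching Tom|Harry: $branches"
rm -f "$f"
```

The first pattern accepts Tm, Tom and Toom, while the second accepts Tom, ToTom and m, which shows that a quantifier always applies to the single atom directly before it.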