Difference between revisions of "Regular expressions"

From Linuxintro
imported>WikiSysop

Revision as of 12:04, 20 September 2008

Regular expressions allow you to formulate patterns to search for. For example, searching for the plain string "Sep" in a file is easy:

grep "Sep" file

This prints all lines containing the string "Sep". But what if you only want lines starting with "Sep", for example to read all syslog entries from September? Then you need regular expressions:

grep -E "^Sep" /var/log/messages

prints all September entries in your syslog. Regular expressions can do much more than this.
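To see what the anchor changes, here is a minimal sketch that runs both searches against a throwaway file. The log lines are invented for illustration and are not real syslog output.

```shell
# Build a temporary sample file (contents are made up for this example).
tmp=$(mktemp)
printf '%s\n' \
  'Sep 20 12:04:01 host kernel: boot' \
  'Aug 31 23:59:59 host cron: logs kept until Sep' \
  'Sep 21 08:00:00 host sshd: session opened' > "$tmp"

# Unanchored: counts every line that contains "Sep" anywhere.
unanchored=$(grep -c "Sep" "$tmp")

# Anchored: counts only lines that start with "Sep".
anchored=$(grep -Ec "^Sep" "$tmp")

echo "unanchored=$unanchored anchored=$anchored"
rm -f "$tmp"
```

The August line mentions "Sep" in its message text, so only the anchored search leaves it out.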

Escaping

The characters ^ and \ are metacharacters: they have a special meaning inside a pattern. ^ means "at the beginning of a line". Putting a backslash in front of a metacharacter escapes it, so that it is matched literally again:

grep "^hallo" file

prints all lines in file that begin with "hallo".

grep '\^hallo' file

prints all lines in file that contain the literal string "^hallo". The single quotes keep the shell from processing the backslashes before grep sees them.

grep '\\^hallo' file

prints all lines in file that contain "\^hallo".

grep '\\\\^hallo' file

prints all lines in file that contain "\\^hallo", and so on.
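Quoting matters when testing escaped patterns, because inside double quotes the shell itself turns \\ into \ before grep runs. The following sketch, with an invented three-line sample file, compares single and double quotes.

```shell
# Sample file invented for illustration.
tmp=$(mktemp)
printf '%s\n' 'hallo world' 'say ^hallo' 'path \^hallo' > "$tmp"

# Single quotes: grep receives the pattern exactly as typed.
caret=$(grep -c '\^hallo' "$tmp")        # lines containing ^hallo
backslash=$(grep -c '\\^hallo' "$tmp")   # lines containing \^hallo

# Double quotes: the shell reduces \\ to \ first,
# so grep sees the same pattern as in the first search.
double=$(grep -c "\\^hallo" "$tmp")

echo "caret=$caret backslash=$backslash double=$double"
rm -f "$tmp"
```

The line "path \^hallo" also contains "^hallo", which is why the first search counts two lines.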

Write regular expressions

Matching

Finding text that fits a pattern defined by a regular expression is called "matching".

Match at the beginning of a line

grep "^hallo" file

prints all lines in file that begin with "hallo".

Match at the end of a line

grep "hallo$" file

prints all lines in file that end with "hallo".
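Both anchors can be combined to match a line that consists of the pattern and nothing else. A minimal sketch with made-up sample lines:

```shell
# Sample lines invented for illustration.
tmp=$(mktemp)
printf '%s\n' 'hallo' 'hallo there' 'oh hallo' > "$tmp"

starts=$(grep -c '^hallo' "$tmp")    # lines beginning with "hallo"
ends=$(grep -c 'hallo$' "$tmp")      # lines ending with "hallo"
whole=$(grep -c '^hallo$' "$tmp")    # lines that are exactly "hallo"

echo "starts=$starts ends=$ends whole=$whole"
rm -f "$tmp"
```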

Match string1 OR string2

grep -E "Sep|Aug" file

prints all lines from file that contain "Sep" or "Aug".
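When alternation is combined with an anchor, parentheses decide what the anchor applies to. The sketch below uses invented sample lines to contrast the two forms.

```shell
# Sample lines invented for illustration.
tmp=$(mktemp)
printf '%s\n' 'Sep 01 start' 'Aug 02 stop' 'note for Aug' > "$tmp"

# Without a group: "Sep at line start" OR "Aug anywhere".
loose=$(grep -Ec '^Sep|Aug' "$tmp")

# With a group: the anchor applies to both alternatives.
grouped=$(grep -Ec '^(Sep|Aug)' "$tmp")

echo "loose=$loose grouped=$grouped"
rm -f "$tmp"
```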

Match a group of characters

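grep -E "L[I1]NUX" file

prints all lines from file that contain "LINUX" or "L1NUX". The characters inside the brackets are listed without separators; a comma would itself be matched as a character.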

Match a range of characters

grep -E "foo[1-9]" file

prints all lines from file that contain "foo1", "foo2", and so on up to "foo9".
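Inside brackets every listed character stands for itself, so separators such as commas are not needed and would be matched literally. A small sketch with invented sample lines:

```shell
# Sample lines invented for illustration.
tmp=$(mktemp)
printf '%s\n' 'LINUX' 'L1NUX' 'L,NUX' 'foo0' 'foo7' > "$tmp"

set_hits=$(grep -Ec 'L[I1]NUX' "$tmp")      # LINUX, L1NUX
comma_hits=$(grep -Ec 'L[I,1]NUX' "$tmp")   # also matches L,NUX
range_hits=$(grep -Ec 'foo[1-9]' "$tmp")    # foo7 only; 0 is outside the range

echo "set=$set_hits comma=$comma_hits range=$range_hits"
rm -f "$tmp"
```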

Invert a group of characters

grep -E "for[^ e]" file

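prints all lines from file that contain "for" followed by a character that is neither a space nor an e. "forget" matches; "for you" and "foresee" do not. Because the bracket expression must match some character, "for" at the very end of a line does not match either.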
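An inverted group still has to match one character. The following sketch, using made-up sample lines, shows which lines survive.

```shell
# Sample lines invented for illustration.
tmp=$(mktemp)
printf '%s\n' 'for you' 'foresee' 'forget it' 'what for' > "$tmp"

# Only "forget it" has "for" followed by something other than space or e.
# "what for" fails because nothing follows "for" on that line.
hits=$(grep -E 'for[^ e]' "$tmp")

echo "$hits"
rm -f "$tmp"
```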

Invert matches

grep -Ev "gettimeofday" file

prints all lines from file that do NOT contain "gettimeofday". The -v option is a feature of grep, not part of the regular expression itself.
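A typical use is filtering noise out of a trace. In this sketch the trace lines are invented for illustration; they are not real tool output.

```shell
# Sample trace invented for illustration.
tmp=$(mktemp)
printf '%s\n' \
  'gettimeofday = 0' \
  'open = 3' \
  'gettimeofday = 0' \
  'close = 0' > "$tmp"

matching=$(grep -Ec 'gettimeofday' "$tmp")    # lines that match
inverted=$(grep -Evc 'gettimeofday' "$tmp")   # lines that do not match

echo "matching=$matching inverted=$inverted"
rm -f "$tmp"
```

Every line is counted by exactly one of the two searches, so the two numbers add up to the number of lines in the file.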

Match any character

grep -E "L.nux" file

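The dot matches exactly one character that is not a newline, so this prints lines containing e.g. Linux, Lenux or L1nux. It does not match Lnux, because there is no character between the L and the n.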
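The dot stands for exactly one character, neither zero nor several. A quick check with made-up words:

```shell
# Count how many of the sample words contain L, one character, then "nux".
# The words are invented for illustration.
dot_hits=$(printf '%s\n' 'Linux' 'Lenux' 'Lnux' 'Liinux' | grep -Ec 'L.nux')

echo "dot_hits=$dot_hits"
```

Only Linux and Lenux are counted; Lnux has no character in the dot's position and Liinux has one too many.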

Match one or more times

grep -E "L[i]+nux" file

prints all lines from file in which "L" is followed by at least one "i" and then "nux", e.g. Linux or Liinux, but not Lnux. The + is a quantifier: it means that the preceding item occurs one or more times. Replacing the + with a * accepts zero or more occurrences instead.
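The difference between the two quantifiers shows up on input where the repeated character is missing. A minimal sketch with invented words:

```shell
# Sample words invented for illustration.
words=$(printf '%s\n' 'Lnux' 'Linux' 'Liinux')

plus=$(printf '%s\n' "$words" | grep -Ec 'Li+nux')   # needs at least one i
star=$(printf '%s\n' "$words" | grep -Ec 'Li*nux')   # also accepts no i at all

echo "plus=$plus star=$star"
```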

Read regular expressions

*

An asterisk is a quantifier meaning "any number of, including zero".

grep -E "Li*nux" file
Lnux
Linux
Liinux
Liiinux

An asterisk is placed directly after the atom it repeats. In the above example the atom is the character i, but it can also be a group of characters:

grep -E "ba(na)*" file
ba
bana
banana
bananana
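Because grep looks for a match anywhere in a line, a pattern like this also selects every line that merely contains "ba". The sketch below uses grep's -x option, which requires the whole line to match, to show the difference; the sample words are invented for illustration.

```shell
# Sample words invented for illustration.
words=$(printf '%s\n' 'ba' 'bana' 'banana' 'ban' 'bandana')

# Match anywhere in the line: every word contains "ba".
anywhere=$(printf '%s\n' "$words" | grep -Ec 'ba(na)*')

# Match the whole line: only ba, bana and banana fit the pattern.
whole=$(printf '%s\n' "$words" | grep -Exc 'ba(na)*')

echo "anywhere=$anywhere whole=$whole"
```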