Difference between revisions of "Regular expressions"

From Linuxintro
(4 intermediate revisions by 4 users not shown)
Line 20: Line 20:
 
For "finding a pattern defined by a regular expression", we speak of "matching".
 
== Beginning of a line ==
grep "^hallo" ''file''
 
prints all lines in ''file'' that begin with "hallo".
 
  
 
== The end of a line ==
 
Line 32: Line 30:
 
prints all lines from ''file'' that contain "Sep" ''or'' "Aug".
 
  
== Match a group of characters ==
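grep -E "L[I1]NUX" ''file''
 
prints all lines from ''file'' that contain "LINUX" or "L1NUX". The characters inside the brackets are listed without separators: a comma there would itself be matched, so "L[I,1]NUX" also matches "L,NUX".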
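
The effect of a comma inside a bracket expression can be checked with a small experiment. This is only a sketch: the sample lines and the temporary file are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'LINUX' 'L1NUX' 'L,NUX' 'LaNUX' > "$f"

# With a comma inside the brackets, "L,NUX" is counted as well.
with_comma=$(grep -c -E 'L[I,1]NUX' "$f")

# Listing only the wanted characters leaves it out.
without_comma=$(grep -c -E 'L[I1]NUX' "$f")

echo "with comma: $with_comma, without comma: $without_comma"
rm -f "$f"
```

On this sample the first pattern counts three lines and the second counts two.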
 
  
 
== Match a range of characters ==
 
Line 42: Line 38:
 
  
== Any character ==
grep -E "L.nux" ''file''
 
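prints all lines from ''file'' that contain "L", then any single character that is not a newline, then "nux", e.g. Linux and Lenux. The dot stands for exactly one character, so Lnux does not match.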
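
A quick way to see what the dot accepts is to count matches on a few test lines. The sample lines below are invented for this sketch, and the second pattern shows the escaped dot for comparison.

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'Linux' 'Lenux' 'Lnux' 'L.nux' > "$f"

# The dot needs exactly one character, so "Lnux" is not counted.
any_char=$(grep -c -E 'L.nux' "$f")

# An escaped dot matches only a literal "." character.
literal_dot=$(grep -c -E 'L\.nux' "$f")

echo "any character: $any_char, literal dot: $literal_dot"
rm -f "$f"
```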
 
  
 
== Match one or more times ==
 
Line 85: Line 79:
 
  as^df
 
  
= Understand regular expressions =
 
 
== Branches, Pieces and Atoms ==
 
A regular expression consists of one or more ''branches'', separated by "|", the "OR" sign. If one of the branches ''matches'', the expression matches:
 
grep -E "Tom|Harry"
 
Here, the expression is ''Tom''|''Harry'', and ''Tom'' and ''Harry'' are both branches.
 
 
 
A branch consists of one or more ''pieces'', concatenated in a fixed order. A piece is an atom optionally followed by a [[Regular_expressions#quantifiers|quantifier]]:
 
grep -E "To*m"
 
Here, T, o* and m are the three pieces.
 
 
 
An atom is a character, a bracket expression or a subexpression. Each of the following lines is an atom:
 
a
 
b
 
[^e]
 
(this is a subexpression)
 
 
 
== quantifiers ==
 
A quantifier states how many times the atom in front of it may occur. The * quantifier means that the atom can occur zero, one or several times:
 
grep -E "To*m"
 
finds all lines containing Tm, Tom, Toom, Tooom and so on.
 
 
 
 

Revision as of 14:32, 23 May 2011

Regular expressions allow you to formulate patterns to search for. Here's an example: it is easy to search for the string "Sep" in a file. You do it with

grep "Sep" file

This gives you all lines containing the string "Sep". But what do you do if you only want lines starting with "Sep", for example, to read all lines in your syslog regarding September? Then you need regular expressions. It works like this:

grep -E "^Sep" /var/log/messages

gives you all entries for September in your syslog. And there is much more you can do with regular expressions.
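
The difference between the two searches can be tried without touching the real syslog. The following is a minimal sketch that assumes a made-up three-line log in a temporary file instead of /var/log/messages.

```shell
# Hypothetical sample log; the real file would be /var/log/messages.
log=$(mktemp)
printf '%s\n' \
  'Sep  1 10:00:01 host sshd: session opened' \
  'Aug 31 23:59:59 host cron: rotated logs up to Sep' \
  'Sep  2 08:15:00 host kernel: eth0 link up' > "$log"

# Plain search: counts every line containing "Sep" anywhere.
anywhere=$(grep -c "Sep" "$log")

# Anchored search: counts only lines that start with "Sep".
at_start=$(grep -c -E "^Sep" "$log")

echo "containing Sep: $anywhere, starting with Sep: $at_start"
rm -f "$log"
```

Here the plain search counts three lines, while the anchored one skips the August entry and counts two.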

Escaping

The characters ^ and \ are control characters. ^ means "at the beginning of a line", and \ takes away the special meaning of the character that follows it. With a backslash, you can therefore escape a control character so that it acts as an ordinary character again:

grep "^hallo" file

prints all lines in file that begin with "hallo".

grep '\^hallo' file

prints all lines in file that contain "^hallo".

grep '\\^hallo' file

prints all lines in file that contain "\^hallo".

grep '\\\\^hallo' file

prints all lines in file that contain "\\^hallo". And so on. Use single quotes for these patterns: inside double quotes, the shell itself reduces \\ to a single \ before grep sees the pattern.
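
The escaping levels are easy to get wrong because the shell processes backslashes before grep does. The sketch below uses an invented three-line sample file and compares single-quoted and double-quoted patterns; it assumes a POSIX shell and grep's default basic regular expressions.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'hallo world' 'say ^hallo' 'say \^hallo' > "$sample"

# Unescaped ^ anchors the match to the start of the line.
begins=$(grep '^hallo' "$sample")

# \^ is a literal caret; two of the sample lines contain "^hallo".
caret=$(grep -c '\^hallo' "$sample")

# \\ is a literal backslash, so this finds only the line with "\^hallo".
backslash=$(grep '\\^hallo' "$sample")

# In double quotes the shell turns \\ into \ first, so grep receives
# \^hallo and the result is the same as for the literal-caret search.
double_quoted=$(grep -c "\\^hallo" "$sample")

echo "$begins | $caret | $backslash | $double_quoted"
rm -f "$sample"
```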

Write regular expressions

For "finding a pattern defined by a regular expression", we speak of "matching".


The end of a line

grep "hallo$" file

prints all lines in file that end with "hallo".
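
As a small illustration, the end-of-line anchor can be combined with the beginning-of-line anchor to match a whole line. The sample lines are made up for this sketch, and single quotes are used so that the shell leaves the $ alone.

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'hallo' 'say hallo' 'hallo world' > "$f"

# Lines that end with "hallo".
ends=$(grep -c 'hallo$' "$f")

# Lines that consist of "hallo" and nothing else.
whole=$(grep -c '^hallo$' "$f")

echo "ending with hallo: $ends, exactly hallo: $whole"
rm -f "$f"
```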

Find string1 OR string2

grep -E "Sep|Aug" file

prints all lines from file that contain "Sep" or "Aug".
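
The -E option matters for this pattern. Without it, grep uses basic regular expressions, in which an unescaped | is an ordinary character. The sketch below shows the difference on an invented sample file.

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'Sep 1' 'Aug 2' 'Jul 3' 'Sep|Aug' > "$f"

# Extended syntax: | separates two branches.
extended=$(grep -c -E 'Sep|Aug' "$f")

# Basic syntax: | is literal, so only the line containing "Sep|Aug" counts.
basic=$(grep -c 'Sep|Aug' "$f")

echo "with -E: $extended, without -E: $basic"
rm -f "$f"
```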


Match a range of characters

grep -E "foo[1-9]" file

prints all lines from file that contain "foo1", "foo2" and so on up to "foo9".
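
Because grep looks for the pattern anywhere in a line, a range can match more lines than expected. The following sketch, using made-up sample lines, shows that "foo10" is found as well, since it contains "foo1".

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'foo0' 'foo1' 'foo5' 'foo9' 'foo10' 'food' > "$f"

# Digits 1 to 9 after "foo": foo1, foo5, foo9 and also foo10.
one_to_nine=$(grep -c -E 'foo[1-9]' "$f")

# Widening the range to 0-9 additionally picks up foo0.
zero_to_nine=$(grep -c -E 'foo[0-9]' "$f")

echo "1-9: $one_to_nine, 0-9: $zero_to_nine"
rm -f "$f"
```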


Match one or more times

grep -E "L[i]+nux" file

prints all lines from file in which "L" is followed by at least one "i" and then "nux", e.g. Linux or Liinux. The + here is a quantifier: it means that the preceding atom, here i, occurs one or more times. To accept zero or more times instead, replace the + with a *.
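
The two quantifiers can be compared side by side. This sketch assumes an invented sample file; the only difference between the two patterns is whether "Lnux", with no i at all, is accepted.

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'Lnux' 'Linux' 'Liinux' 'Lanux' > "$f"

# + requires at least one "i": Linux and Liinux.
plus=$(grep -c -E 'Li+nux' "$f")

# * also accepts no "i" at all: Lnux, Linux and Liinux.
star=$(grep -c -E 'Li*nux' "$f")

echo "with +: $plus, with *: $star"
rm -f "$f"
```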

Read regular expressions

*

An asterisk is a quantifier meaning "any number of, including none".

grep -E "Li*nux" file
Lnux
Linux
Liinux
Liiinux

An asterisk is placed directly after the atom that may be repeated any number of times. In the above example, the atom is the i character, but it can also be a group of characters:

grep -E "ba(na)*" file
ba
bana
banana
bananana
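
One thing to keep in mind is that grep prints every line that contains a match, and "ba(na)*" is already satisfied by a plain "ba". To test whole lines against the pattern, the standard -x option can be used. The sample lines in this sketch are made up.

```shell
# Hypothetical sample data, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'ba' 'bana' 'banana' 'band' 'bananas' > "$f"

# Every sample line contains "ba", so all of them are counted.
contains=$(grep -c -E 'ba(na)*' "$f")

# With -x the whole line must match: ba, bana and banana.
whole_line=$(grep -c -E -x 'ba(na)*' "$f")

echo "containing a match: $contains, matching as a whole: $whole_line"
rm -f "$f"
```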

^

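The ^ character stands for one of three things, depending on where it appears. In each example, grep reads the lines typed on standard input and repeats the ones that match:

  • the beginning of a line if it stands at the beginning of a branch
# grep "^foo"
barfoo
foo
foo
  • "not" if it stands directly after an opening bracket
# grep "for[^e]"
foresee
for each
for each
  • the ^ character itself if it is escaped
# grep "\^"
adsf
as^df
as^df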
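
The three meanings of ^ can also be checked non-interactively by piping the same test lines into grep. This is a sketch with the input lines taken from the examples above; the variable names are chosen for illustration only.

```shell
# ^ at the start of the pattern: only the line beginning with "foo".
at_start=$(printf '%s\n' 'barfoo' 'foo' | grep '^foo')

# ^ right after "[": any character except "e" must follow "for".
negated=$(printf '%s\n' 'foresee' 'for each' | grep 'for[^e]')

# Escaped ^: a literal caret somewhere in the line.
literal=$(printf '%s\n' 'adsf' 'as^df' | grep '\^')

printf '%s\n' "$at_start" "$negated" "$literal"
```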
