Difference between pages "Regular expressions" and "Dpkg"

From Linuxintro
(Difference between pages)
imported>ThorstenStaerk
 
imported>WikiSysop
(New page: If you understand rpm or dpkg and want to learn the other {| border=1 ! Issue ! rpm ! dpkg |- | how files are called || *.rpm || *.deb |- | list all installed packages || rpm -qa || dp...)
 
Line 1: Line 1:
<metadesc>How to write (and read!) Regular Expressions by examples. How to filter for lines containing string1 OR string2, how to filter for lines NOT containing string1, backreferences and the good stuff.</metadesc>
+
If you understand rpm or dpkg and want to learn the other
Regular expressions allow you to formulate string patterns to search or replace. For example, to show all lines that begin with the string ''Sep 13'' in a file ''myfile.txt'' issue:
 
[[grep]] -E "^''Sep 13''" ''myfile.txt''
 
In this case ''^Sep 13'' is your regular expression. You used it to search for a string. And there is much more you can do with regular expressions.
 
  
<pic src=http://linuxintro.org/images/Regular_Expressions.png link=http://linuxintro.org/images/Regular_Expressions.png width=70% caption="RegEx cheat sheet" align=right border=1 />
+
{| border=1
 
+
! Issue
= Escaping =
+
! rpm
The characters ^ and \ are seen as control-characters. ^ means "at the beginning of a line". With a backslash, you can ''escape'' these control-characters, meaning they act as body-characters again:
+
! dpkg
grep "^hallo" file
+
|-
finds all occurrences of "hallo" at the beginning of a line in ''file''.
+
| how files are called || *.rpm || *.deb
grep "\^hallo"
+
|-
finds all occurrences of "^hallo" in a file
+
| list all installed packages || rpm -qa || dpkg -l
grep "\\^hallo"
+
|-
finds all occurrences of "\^hallo" in a file
+
| list all installed packages by order of installation date || rpm -qa --last || ||
grep "\\\\^hallo"
+
|-
finds all occurrences of "\\^hallo" in a file
+
| install a package from a file || rpm -i ''file.rpm'' || dpkg -i ''file.deb''
And so on...
+
|-
 
+
| list package content        || rpm -ql ''package'' || dpkg -L ''package''  
= Write regular expressions =
+
|-
For "finding a pattern defined by a regular expression", we speak of "matching".
+
| find what package provides the installed file ''/bin/ls'' || rpm -qf ''/bin/ls'' || dpkg --search ''/bin/ls''  
 
+
|-
== Beginning of a line is ==
+
| look at a package file description || || dpkg -I ''file.deb''
grep "^hallo" ''file''
+
|-
prints all occurrences of "hallo" at the beginning of a line in ''file''.
+
| look at a package description || rpm -qi ''package'' ||
 
+
|-
== The end of a line ==
+
| find which installed package provides ''/bin/bash'' || rpm -qf ''/bin/bash'' || dpkg -S ''/bin/bash'' (to also search in not installed packages you can use apt-file)
grep "hallo$" ''file''
+
|-
prints all occurrences of "hallo" at the end of a line in ''file''.
+
| find what program provides the file ''Xlib.h'' || || auto-apt search ''Xlib.h''
 
+
|}
== Find string1 OR string2 ==
 
grep -E "Sep|Aug" ''file''
 
prints all lines from ''file'' that contain "Sep" ''or'' "Aug".
 
 
 
== Match a group of characters ==
 
grep -E "L[I,1]NUX" ''file''
 
prints all lines from ''file'' that contain "LINUX" or "L1NUX"
 
 
 
== Match a range of characters ==
 
grep -E "foo[1-9]" ''file''
 
prints all lines from ''file'' that contain "foo1" or "foo2" till "foo9"
 
 
 
== NOT the following characters ==
 
To invert matching for a group of characters
 
grep -E "for[^ e]" ''file''
 
prints all lines from ''file'' that contain "for", but not followed by a space or an e, so not "for you" or "foresee"
 
 
 
Also
 
[^\n]*
 
means "all characters till the next newline". This can be useful when writing parsers.
 
 
 
With grep you have an additional possibility to invert matches:
 
grep -Ev "gettimeofday" ''file''
 
prints all lines from ''file'' that do NOT contain "gettimeofday". This is a grep feature.
 
 
 
== Any character ==
 
grep -E "L.nux" ''file''
 
matches any character that is not a newline, e.g. Linux, Lenux and L7nux in ''file''.
 
 
 
== Match one or more times ==
 
grep -E "L[i]+nux" ''file''
 
Match if i is there at least once in ''file''
 
The + here is a quantifier. It means, that i occurs 1 or more times. It is also possible to accept 0 or more times if you replace the + by a *.
 
 
 
== Match ''n'' times ==
 
/etc/services is a table for protocols (services) and their port numbers. The protocols are filled up with blanks to have 16 characters. If you want to replace all protocols for port 3200 with sapdp00 you do it like this:
 
[[sed]] -ri "s/.{16}3200/sapdp00 3200/" /etc/services
 
 
 
== Backreferences ==
 
Backreferences allows you to reuse matches. For example consider the following line from /var/log/[[apache]]2/access_log:
 
<source>
 
84.163.99.149 - - [21/Jan/2012:15:23:40 +0100] "GET /wiki/Special:RecentChanges HTTP/1.1" 200 66493 "http://www.linuxintro.org/index.php?title=Configuring_and_securing_sshd&action=history" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"
 
</source>
 
If you want to "extract the string containing GET between the quotes" you best use backreferences like this:
 
[[cat]] /var/log/apache2/access_log | [[sed]] "s;.*\(GET [^\"]*\).*;\1;"
 
 
 
= Read regular expressions =
 
 
 
== * ==
 
An asterisk is a quantifier saying "whatever number of".
 
grep -E "Li*nux" file
 
Lnux
 
Linux
 
Liinux
 
Liiinux
 
An asterisk is placed next to an atom that can be repeated in whatever number. In the above example, the atom is the ''i'' character, but it can also be a group of characters:
 
grep -E "ba(na)*" file
 
ba
 
bana
 
banana
 
bananana
 
 
 
== ^ ==
 
The ^ character stands for
 
* the beginning of a line if it stands at the beginning of a branch
 
# grep ^foo
 
barfoo
 
foo
 
foo
 
* "not" if it stands behind a bracket
 
# grep for[^e]
 
foresee
 
for each
 
for each
 
* the ^ character if it is escaped
 
# grep "\^"
 
adsf
 
as^df
 
as^df
 
 
 
== ? ==
 
The ? character stands for
 
* non-greedy matching:
 
http://.*?/
 
 
 
= Understand regular expressions =
 
 
 
== Branches, Pieces and Atoms ==
 
A regular expression consists of one or more ''branches'', separated by "|", the "OR" sign. If one of the branches ''matches'', the expression matches:
 
grep -E "Tom|Harry"
 
Here, the expression is ''Tom''|''Harry'', and ''Tom'' and ''Harry'' are both branches.
 
 
 
A branch consists of one or more pieces, seen in its particular order. A piece is an atom optionally followed by a [[Regular_expressions#quantifiers|quantifier]]:
 
grep -E "To*m"
 
Here, T is a piece as well as o* and m.
 
 
 
An atom is a character, a bracket expression or a subexpression. Each line can be an atom:
 
a
 
b
 
[^e]
 
(this is a subexpression)
 
 
 
== quantifiers ==
 
A quantifier is used to define that an atom can exist several times. The * quantifier defines the atom in front of it can occur 0, 1 or several times:
 
grep -E "To*m"
 
Will find all lines containing Tom, Toom, Tooom and Tm.
 
 
 
= See also =
 
* [[scripting tutorial]]
 
* [http://www.linuxintro.org/regex RegEx ComPoser]
 
* [http://www.gskinner.com/RegExr/ RegEx training]
 
 
 
<stumbleuponbutton />
 
 
 
[[Category:Learning]]
 
[[Category:Concept]]
 
[[Category:Mindmap]]
 

Revision as of 08:59, 22 September 2008

If you understand rpm or dpkg and want to learn the other

Issue rpm dpkg
how files are called *.rpm *.deb
list all installed packages rpm -qa dpkg -l
list all installed packages by order of installation date rpm -qa --last
install a package from a file rpm -i file.rpm dpkg -i file.deb
list package content rpm -ql package dpkg -L package
find what package provides the installed file /bin/ls rpm -qf /bin/ls dpkg --search /bin/ls
look at a package file description dpkg -I file.deb
look at a package description rpm -qi package
find which installed package provides /bin/bash rpm -qf /bin/bash dpkg -S /bin/bash (to also search in not installed packages you can use apt-file)
find what program provides the file Xlib.h auto-apt search Xlib.h