Difference between revisions of "Strace: what a process does"

From Linuxintro
imported>ThorstenStaerk
imported>ThorstenStaerk
Line 16: Line 16:
 
  <abbr title="Syscall. Find out more with the command man 2 recv.">recv</abbr>(<abbr title="As we can tell from man 2 recv, the following is a file descriptor. Find out more with the command cat /prov/''PID''/fd/12">12</abbr>, "\37\36&\364T<\264\6Og\20\2\30%-\31C\3757\350{\260\0203\351\23&O\365(y\212"..., 16384, 0) = 1448
 
  <abbr title="Syscall. Find out more with the command man 2 recv.">recv</abbr>(<abbr title="As we can tell from man 2 recv, the following is a file descriptor. Find out more with the command cat /prov/''PID''/fd/12">12</abbr>, "\37\36&\364T<\264\6Og\20\2\30%-\31C\3757\350{\260\0203\351\23&O\365(y\212"..., 16384, 0) = 1448
 
The good news is that every line follows the same scheme:
 
The good news is that every line follows the same scheme:
  syscall(parameters) = return code
+
  syscall(parameters) = return value
 
Whatever you see for syscall you can get help on using the man(2) pages. So for the above example
 
Whatever you see for syscall you can get help on using the man(2) pages. So for the above example
 
  man 2 clock_gettime
 
  man 2 clock_gettime
 
  man 2 poll
 
  man 2 poll
Will give you the information what is going on.
+
man 2 recv
 +
Will give you the information what is going on. For example, clock_gettime retrieves the current time.
  
== analysis ==
 
Every line follows the syntax
 
syscall(arguments) = return value
 
  
The meaning of the syscall, its expected arguments and its return value can be determined from its man page. For example to understand the first line call [http://linux.die.net/man/2/clock_gettime man 2 clock_gettime]. It retrieves the current time.
+
Wow, that was easy. Now to the next line, poll. According to [http://linux.die.net/man/2/poll man 2 poll] it waits for an event on a file descriptor, like "data to be read". The file descriptor is the first argument, 12.  
 
 
Wow, that was easy. Now to the next line, poll. According to [http://linux.die.net/man/2/poll man 2 poll] it waits for an event on a file descriptor, like "data to be read". The file descriptor is the first argument, 12. What's this? Let's assume the process ID we are tracing is 7179. Then we can find out its file descriptors like this:
 
 
  /proc/7179/fd # ll
 
  /proc/7179/fd # ll
 
  total 0
 
  total 0

Revision as of 17:37, 24 November 2014

So you have a process that is crashing or misbehaving and you found the tool strace to tell you what the process does. However you don't understand its output? Then this article is for you.

starting strace

You can start a new process with strace or you can attach strace to an already running process. To start a new process just put strace in front of its command:

strace -s 99 -ff ls

To attach strace to an already running process, in this example firefox, find out the process' id:

# ps -A | grep firefox
2728 pts/1    00:00:02 firefox

Then call strace with the id as parameter:

strace -s 99 -ffp 2728

Understanding strace's output

The bad news is that strace's output looks tedious like this:

clock_gettime(CLOCK_MONOTONIC, {4433, 764347636}) = 0
poll([{fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=12, revents=POLLIN|POLLRDNORM}])
recv(12, "\37\36&\364T<\264\6Og\20\2\30%-\31C\3757\350{\260\0203\351\23&O\365(y\212"..., 16384, 0) = 1448

The good news is that every line follows the same scheme:

syscall(parameters) = return value

Whatever you see for syscall you can get help on using the man(2) pages. So for the above example

man 2 clock_gettime
man 2 poll
man 2 recv

Will give you the information what is going on. For example, clock_gettime retrieves the current time.


Wow, that was easy. Now to the next line, poll. According to man 2 poll it waits for an event on a file descriptor, like "data to be read". The file descriptor is the first argument, 12.

/proc/7179/fd # ll
total 0
lr-x------ 1 root root 64 Apr 19 08:02 0 -> /dev/null
lrwx------ 1 root root 64 Apr 19 08:02 1 -> socket:[156171]
l-wx------ 1 root root 64 Apr 19 08:02 2 -> /var/log/apache2/error_log
lr-x------ 1 root root 64 Apr 19 08:02 4 -> pipe:[156208]
l-wx------ 1 root root 64 Apr 19 08:02 5 -> pipe:[156208]
l-wx------ 1 root root 64 Apr 19 08:02 6 -> /var/log/apache2/access_log
lrwx------ 1 root root 64 Apr 19 08:02 7 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Apr 19 08:02 12 -> socket:[155097]

Ok, to find out what socket:[155097] is we use the command

  1. lsof | grep 155097
httpd2-pr 7158            root    3u     IPv4     155097      0t0        TCP *:http (LISTEN)
httpd2-pr 7179          wwwrun    3u     IPv4     155097      0t0        TCP *:http (LISTEN)

So this is a socket listening to the http port.

Search for a syscall

With strace, you can search for a special syscall. For example,

strace -e open command

will show you all files that have been touched (opened) by command. This can be used e.g. to find out where configuration changes are stored.

Performance analysis

With strace, you can find out the biggest time-consuming syscalls during a program run:

strace -c ls -R
Entries  Repository  Root
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.09    0.412153          14     29664           getdents64
 27.70    0.170168          11     14849        14 open
  4.24    0.026043           0    123740           write
  0.72    0.004443           0     14837           close
  0.20    0.001204           0     14836           fstat
  0.05    0.000285         285         1           execve
  0.00    0.000000           0        12           read
  0.00    0.000000           0         4         3 stat
  0.00    0.000000           0        33           mmap
  0.00    0.000000           0        18           mprotect
  0.00    0.000000           0         4           munmap
  0.00    0.000000           0        12           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         3           mremap
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         3         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         8           fadvise64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.614296                198036        19 total

Now you go

man 2 getdents64

to find out what that syscall is about.

See also