Difference between pages "Strace: what a process does" and "File:Snapshot-gnuplot-weight.png"

From Linuxintro
(Difference between pages)
imported>ThorstenStaerk
 
(Maintenance script uploaded File:Snapshot-gnuplot-weight.png)
 
Line 1: Line 1:
<metadesc>How to strace a process. How to understand strace's output. How to follow sub-processes, how to format the string length so you get a sensible meaning, how to find the file descriptors and how to see where the process spends most of its time.</metadesc>
 
  
strace is a [[command]] that shows a process' calls to the kernel (aka syscalls). This helps you to do excellent troubleshooting and gain good performance tuning insights.
 
 
<pic border=1 src="http://www.linuxintro.org/strace-mindmap.jpg" width=50% align=right caption="strace cheat sheet" link="http://www.linuxintro.org/strace-mindmap.jpg"  />
 
 
= Understand strace's output =
 
Let's start with a real-world example: You want to [[troubleShoot sound]] on your computer. You know that speaker-test does some test noise. To find out what exactly happens, issue:
 
<abbr title="This is a prompt. The follow is a command you must enter.">#</abbr> strace <abbr title="String length should be 99 characters. It's no fun if the actual messages in strace's output are cut short.">-s 99</abbr> <abbr title="strace should follow sub-processes and show their behavior as well">-ff</abbr> speaker-test
 
 
You will find a lot of lines like
 
<abbr title="Process id. You get this field because you started strace with -ff">[pid 15287]</abbr> <abbr title="Syscall. Find out more with the command man 2 write.">write(6, "W", 1)            = 1</abbr>
 
<abbr title="Process id. You get this field because you started strace with -ff">[pid 15287]</abbr> <abbr title="Syscall. Find out more with the command man 2 recvmsg.">recvmsg</abbr>(<abbr title="As we can tell from man 2 recv, the following is a file descriptor. Find out more with the command cat /proc/PID/fd/7">7</abbr>, {msg_name(0)=NULL, msg_iov(1)=[{"L\0\0\0\2L\0\0\2+U\0\0\0\0\0\31vqU\0\0\0\0\0\0\0\0001TU \240\326\0\4\252\354TU \240\326\0\4\262%r\0\0\0\0\0\20\0\0r\0\0\0\0\0\2\260\220R\0\0\0\0\0\0\0\0R\0\0\0\0\0\2\260\220", 83}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=2289, uid=0, gid=0}}, msg_flags=0}, 0) = 83
 
<abbr title="Process id. You get this field because you started strace with -ff">[pid 15287]</abbr> <abbr title="Syscall. Find out more with the command man 2 write.">write(6, "W", 1)            = 1</abbr>
 
 
'''Looks complicated, right?''' But fear not, every line follows the same pattern:
 
[pid ''process id''] syscall(parameters) = return code
 
 
You will only find the first column with the '''process id''' if you use the -ff option (follow sub-processes). For the '''syscall''' (write or recvmsg in the above example) it is easy to obtain information - just type
 
man 2 ''syscall''
 
in the above example
 
man 2 write
 
and
 
man 2 recvmsg
 
And you will find abundant information what the syscall does. If you understand 50% you are doing well :)
 
The '''return code''' will also be documented in the syscall's man page.
 
 
'''So what happens here?''' Obviously speaker-test calls the kernel's ''recvmsg'' method. The interesting thing we get from <tt>man 2 recvmsg</tt> is that the first parameter is sockfd, the file descriptor of a socket. So it hands over a socket as a parameter. Let's look at process' file descriptors:
 
<abbr title="This is a prompt. The follow is a command you must enter.">#</abbr> cd /proc/15287/fd
 
ls -l
 
total 0
 
lrwx------ 1 root root 64 Apr  4 22:55 0 -> /dev/pts/4
 
lrwx------ 1 root root 64 Apr  4 22:55 1 -> /dev/pts/4
 
lrwx------ 1 root root 64 Apr  4 22:55 2 -> /dev/pts/4
 
lr-x------ 1 root root 64 Apr  4 22:55 3 -> pipe:[297487]
 
l-wx------ 1 root root 64 Apr  4 22:55 4 -> pipe:[297487]
 
lr-x------ 1 root root 64 Apr  4 22:55 5 -> pipe:[297488]
 
l-wx------ 1 root root 64 Apr  4 22:55 6 -> pipe:[297488]
 
lrwx------ 1 root root 64 Apr  4 22:55 7 -> socket:[297495]
 
 
aha! file descriptor 7 is the unix socket 297495. Let's look what this is connected to:
 
 
<abbr title="This is a prompt. The follow is a command you must enter.">#</abbr> ss -p [[piping||]] [[grep]] 297495
 
u_str  ESTAB      0      0      /run/user/0/pulse/native '''297496'''                * '''297495'''  users:(("'''pulseaudio'''",2289,28))
 
u_str  ESTAB      0      0                    * '''297495'''                * '''297496'''  users:(("'''speaker-test'''",15286,7))
 
 
We see - the socket 297495 is connected to the socket 297496. What is written into 297495 will be readable from 297496 and the other way round. This couple of sockets connects the speaker-test process with the pulseaudio process. 2289 is pulseaudio's process id, and 15286 is the root process id of speaker-test. This system is using pulseaudio for sound which is an important information for troubleshooting sound.
 
 
= strace start options =
 
You can start a new process with strace or you can attach strace to an already running process. To start a new process just put strace in front of its command:
 
strace <abbr title="print the first 99 characters of every output string">-s 99</abbr> <abbr title="follow child processes">-ff</abbr> <abbr title="command that you want to execute and trace">ls</abbr>
 
To attach strace to an already running process, in this example firefox, find out the process' id:
 
# ps -A | grep firefox
 
2728 pts/1    00:00:02 firefox
 
Then call strace with the id as parameter:
 
strace <abbr title="print the first 99 characters of every output string">-s 99</abbr> <abbr title="follow child processes">-ff</abbr><abbr title="process id">p</abbr> 2728
 
 
= limitations =
 
strace will show you all syscalls from a process, and that's it. This also mean that if your process hangs in an endless loop that does not do any syscall, strace will not show anything.
 
 
= interprocess communication =
 
Processes will typically communicate:
 
* via network
 
* via sockets
 
* via pipes
 
* via shared memory
 
* by writing into files
 
 
For example when using strace to [[troubleshoot sound]] you may see that speaker-test receives messages from file descriptor 7:
 
# strace -s 99 -ffp 32681
 
[pid 32682] recvmsg(7, {msg_name(0)=NULL, msg_iov(1)=[{"L\0\0\0\2L\0\0\0}U\0\0\0\0\0\22\10\277U\0\0\0\0\0\0\0\0001TU b|\0\17(\347TU b|\0\0175\342r\0\0\0\0\0\20\0\0r\0\0\0\0\0\2\260\220R\0\0\0\0\0\0\0\0R\0\0\0\0\0\2\260\220", 83}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=2289, uid=0, gid=0}}, msg_flags=0}, 0) = 83
 
Now let's find out what file descriptor 7 of process 32681 is:
 
# ll /proc/32681/fd/7
 
lrwx------ 1 root root 64 Apr  4 18:17 /proc/32681/fd/7 -> socket:[198064]
 
So it is unix socket 198064. Let's find out what process it is:
 
# ss -p | grep 198064
 
u_str  ESTAB      0      0                    * 198064                * 198065  users:(("speaker-test",32681,7))
 
u_str  ESTAB      0      0      /run/user/0/pulse/native 198065                * 198064  users:(("pulseaudio",2289,28))
 
 
= Search for a syscall =
 
With strace, you can search for a special syscall. For example,
 
strace -e open ''command''
 
will show you all files that have been touched (opened) by ''command''. This can be used e.g. to [[find out where configuration changes are stored]].
 
 
= Performance analysis =
 
With strace, you can find out the biggest time-consuming syscalls during a program run:
 
strace -c ls -R
 
Entries  Repository  Root
 
% time    seconds  usecs/call    calls    errors syscall
 
------ ----------- ----------- --------- --------- ----------------
 
  67.09    0.412153          14    29664          getdents64
 
  27.70    0.170168          11    14849        14 open
 
  4.24    0.026043          0    123740          write
 
  0.72    0.004443          0    14837          close
 
  0.20    0.001204          0    14836          fstat
 
  0.05    0.000285        285        1          execve
 
  0.00    0.000000          0        12          read
 
  0.00    0.000000          0        4        3 stat
 
  0.00    0.000000          0        33          mmap
 
  0.00    0.000000          0        18          mprotect
 
  0.00    0.000000          0        4          munmap
 
  0.00    0.000000          0        12          brk
 
  0.00    0.000000          0        2          rt_sigaction
 
  0.00    0.000000          0        1          rt_sigprocmask
 
  0.00    0.000000          0        2          ioctl
 
  0.00    0.000000          0        1        1 access
 
  0.00    0.000000          0        3          mremap
 
  0.00    0.000000          0        1          fcntl
 
  0.00    0.000000          0        1          getrlimit
 
  0.00    0.000000          0        1          statfs
 
  0.00    0.000000          0        1          arch_prctl
 
  0.00    0.000000          0        3        1 futex
 
  0.00    0.000000          0        1          set_tid_address
 
  0.00    0.000000          0        8          fadvise64
 
  0.00    0.000000          0        1          set_robust_list
 
------ ----------- ----------- --------- --------- ----------------
 
100.00    0.614296                198036        19 total
 
 
Now you go
 
[http://linux.die.net/man/2/getdents64 man 2 getdents64]
 
to find out what that syscall is about.
 
 
= See also =
 
* [[gdb|gdb, allowing you to monitor every action of a process]]
 
* [http://linux.die.net/man/1/strace man strace]
 
* http://try-linux.blogspot.de/2013/12/how-to-strace-process.html
 
* http://scn.sap.com/community/linux/blog/2014/04/11/dispwork-running-but-not-connected-to-message-server#
 
* [[TCPDump]]
 
 
[[Category:low-level]]
 
[[Category:command]]
 
[[Category:mindmap]]
 

Latest revision as of 14:34, 13 January 2021