Преглед на файлове

"Debugging beyond the debugger"

master
Chris Smith преди 5 години
родител
ревизия
bdc610a5ea
променени са 2 файла, в които са добавени 281 реда и са изтрити 0 реда
  1. 281
    0
      site/content/post/2019-05-08-debugging-beyond-the-debugger.md
  2. Двоични данни
      site/static/res/images/debugging/strace.png

+ 281
- 0
site/content/post/2019-05-08-debugging-beyond-the-debugger.md Целия файл

@@ -0,0 +1,281 @@
1
+---
2
+date: 2019-05-08
3
+title: Debugging beyond the debugger
4
+url: /2019/05/08/debugging-beyond-the-debugger/
5
+image: /res/images/debugging/strace.png
6
+description: What happens when your usual approach fails you?
7
+---
8
+
9
+Most programming -- and sysadmin -- problems can be debugged in a
10
+fairly straight forward manner using logs, print statements,
11
+educated guesses, or an actual debugger. Sometimes, though, the
12
+problem is more elusive. There's a wider box of tricks that can
13
+be employed in these cases but I've not managed to find a nice
14
+overview of them, so here's mine. I'm mainly focusing on Linux
15
+and similar systems, but there tend to be alternatives available
16
+for other Operating Systems or VMs if you seek them out.
17
+
18
+## Networking
19
+
20
+### tcpdump
21
+
22
+`tcpdump` prints out descriptions of packets on a network interface. You can
23
+apply filters to limit which packets are displayed, chose to dump the entire
24
+content of the packet, and so forth.
25
+
26
+<!--more-->
27
+
28
+Typical usage might look something like:
29
+
30
+{{< highlight console >}}
31
+# tcpdump -nSi eth0 port 80
32
+tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
33
+listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
34
+16:03:35.577781 IP6 2001:db8::1.54742 > 2001:db8::2.80: Flags [S], seq 2815779044, win 64800, options [mss 1440,sackOK,TS val 2378811665 ecr 0,nop,wscale 7], length 0
35
+16:03:35.586853 IP6 2001:db8::2.80 > 2001:db8::1.54742: Flags [S.], seq 1522609102, ack 2815779045, win 28560, options [mss 1440,sackOK,TS val 3063610173 ecr 2378811665,nop,wscale 7], length 0
36
+16:03:35.586877 IP6 2001:db8::1.54742 > 2001:db8::2.80: Flags [.], ack 1522609103, win 507, options [nop,nop,TS val 2378811674 ecr 3063610173], length 0
37
+16:03:35.620678 IP6 2001:db8::1.54742 > 2001:db8::2.80: Flags [P.], seq 2815779045:2815779399, ack 1522609103, win 507, options [nop,nop,TS val 2378811708 ecr 3063610173], length 354: HTTP: GET / HTTP/1.1
38
+{{< / highlight >}}
39
+
40
+Here you can see the start of a plain text HTTP request: the three-way
41
+handshake as the TCP connection is established followed by a GET request.
42
+Even if the data is encrypted as it will be in most cases, it's often useful
43
+to see the "shape" of the transmissions -- did the client start sending data
44
+when it connected, did the server ever respond, etc.
45
+
46
+[Daniel Miessler has a good tutorial on tcpdump](https://danielmiessler.com/study/tcpdump/)
47
+if you're not familiar with it and don't want to jump straight into the man
48
+page.
49
+
50
+#### ... with Docker
51
+
52
+Docker sets up separate network namespaces for each container. To see the
53
+traffic across the interfaces of a single container you can `nsenter` the
54
+container's network namespace:
55
+
56
+{{< highlight console >}}
57
+# nsenter -t $(docker inspect --format '{{.State.Pid}}' my_container) -n tcpdump -nS port 80
58
+{{< / highlight >}}
59
+
60
+This retrieves the PID for the container, and tells `nsenter` to enter the
61
+network (`-n`) namespace from the given target (`-t`) PID, and then run the
62
+given command (in this case `tcpdump ...`).
63
+
64
+### openssl s_client / s_server
65
+
66
+When a connection is using TLS it's often useful to try connecting to the
67
+server and see what certificate it presents, algorithms it negoitates, and
68
+so forth. OpenSSL offers two useful subcommands which can help with this:
69
+`s_client` for connecting as a client, and `s_server` for listening to
70
+connections.
71
+
72
+For example using `s_client` to connect to `google.com` on the standard
73
+HTTPS port shows us details about the server cert and its verification
74
+status:
75
+
76
+{{< highlight console >}}
77
+$ openssl s_client -connect google.com:443
78
+CONNECTED(00000003)
79
+depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
80
+verify return:1
81
+depth=1 C = US, O = Google Trust Services, CN = Google Internet Authority G3
82
+verify return:1
83
+depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
84
+verify return:1
85
+---
86
+Certificate chain
87
+ 0 s:C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
88
+   i:C = US, O = Google Trust Services, CN = Google Internet Authority G3
89
+ 1 s:C = US, O = Google Trust Services, CN = Google Internet Authority G3
90
+   i:OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
91
+---
92
+# ...
93
+{{< / highlight >}}
94
+
95
+Whereas connecting to my webserver and providing an unknown host in the SNI
96
+field results in an SSL alert 112 ("The server name sent was not recognized")
97
+and no server certificate is sent:
98
+
99
+{{< highlight console >}}
100
+$ openssl s_client -connect chameth.com:443 -servername example.com
101
+CONNECTED(00000003)
102
+140384831313024:error:14094458:SSL routines:ssl3_read_bytes:tlsv1 unrecognized name:../ssl/record/rec_layer_s3.c:1536:SSL alert number 112
103
+---
104
+no peer certificate available
105
+---
106
+# ...
107
+{{< / highlight >}}
108
+
109
+Often if you hit this kind of alert in an application the exact error will be
110
+lost somewhere in the many layers between the SSL library and the logs, so
111
+being able to directly connect and test can help diagnose a lot of issues.
112
+
113
+Once a connection is established you can read and write plain text and it
114
+will be encrypted and decrypted automatically.
115
+
116
+### Java apps
117
+
118
+If a Java app is involved in the connection, you can enable a lot of built-in
119
+debugging with a simple JVM property: `javax.net.debug`. You can tweak
120
+what exactly gets logged, but the easiest thing to do is just set the property
121
+to `all` and you'll see information about certificate chains, verification,
122
+and packet dumps:
123
+
124
+{{< highlight console >}}
125
+$ java -Djavax.net.debug=all -jar ....
126
+# ...
127
+found key for : duke
128
+chain [0] = [
129
+[
130
+  Version: V1
131
+  Subject: CN=Duke, OU=Java Software, O="Sun Microsystems, Inc.",
132
+  L=Cupertino, ST=CA, C=US
133
+# ...
134
+{{< / highlight >}}
135
+
136
+More information about Java's debugging options is available on
137
+[docs.oracle.com](https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html).
138
+
139
+## Thread and core dumps
140
+
141
+Higher-level languages frequently provide an interactive way to dump the
142
+current executation state of all of their threads (a "thread dump"). This
143
+is useful to spot deadlocks, some types of race conditions, and as a
144
+quick and dirty method of investigating hangs or excessive CPU usage.
145
+
146
+With both Java and Go applications you can send a QUIT signal to have a
147
+thread dump printed out; Go applications will quit after doing so, Java
148
+ones will carry on running. At most terminals you can hit `Ctrl` and `\` to
149
+send a QUIT signal.
150
+
151
+For Java you can also use the `jstack` tool from the JDK to dump threads
152
+by PID; this can be useful if the application is running in the background
153
+or has redirected sysout:
154
+
155
+{{< highlight console >}}
156
+$ jstack 8321
157
+Attaching to process ID 8321, please wait...
158
+Debugger attached successfully.
159
+Client compiler detected.
160
+
161
+Thread t@5: (state = BLOCKED)
162
+ - java.lang.Object.wait(long) @bci=-1107318896 (Interpreted frame)
163
+ - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
164
+ - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116 (Interpreted frame)
165
+ - java.lang.ref.ReferenceQueue.remove() @bci=2, line=132 (Interpreted frame)
166
+ - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 (Interpreted frame)
167
+
168
+# ...
169
+{{< / highlight >}}
170
+
171
+A core dump provides more complete information about the state of a process,
172
+but is often more complex to interpret. The `gcore` utility from GDB will
173
+create a core dump of a process with a given PID. You can then generally
174
+load the core file using your normal debugger, depending on the language
175
+in question.
176
+
177
+## System calls
178
+
179
+`strace` is the swiss army knife for seeing what a process is doing. It
180
+details each system call made by a program (you can filter them down, of
181
+course). For example:
182
+
183
+{{< highlight console >}}
184
+$ strace -e read curl https://google.com/
185
+read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \236\0\0\0\0\0\0"..., 832) = 832
186
+read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P!\0\0\0\0\0\0"..., 832) = 832
187
+read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200l\2\0\0\0\0\0"..., 832) = 832
188
+read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20Q\0\0\0\0\0\0"..., 832) = 832
189
+# ...
190
+read(3, "\0\0\0\0\0\0\0\4\25\345\366\302\273sE6\365wI\225\321|\3435Z\362\216\372\215\251aO"..., 253) = 253
191
+<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
192
+<TITLE>301 Moved</TITLE></HEAD><BODY>
193
+<H1>301 Moved</H1>
194
+The document has moved
195
+<A HREF="https://www.google.com/">here</A>.
196
+</BODY></HTML>
197
+read(3, "\27\3\3\0!", 5)                = 5
198
+# ...
199
+{{< / highlight >}}
200
+
201
+[Brendan Gregg](http://www.brendangregg.com/blog/2014-05-12/strace-wow-much-syscall.html)
202
+has a nice guide on `strace` and alternatives.
203
+
204
+### ... with docker
205
+
206
+When the application is running in docker you can usually just `strace` it
207
+from the host with the correct PID
208
+(from e.g. `docker inspect --format '{{.State.Pid}}' my_container`).
209
+Sometimes you may need to trace the startup of an application though, which is
210
+a bit trickier. Instead you can run a new container using the same PID
211
+namespace as your target, and the permissions needed to `strace`:
212
+
213
+{{< highlight console >}}
214
+$ docker run --rm -it --pid=container:my_container \
215
+  --net=container:my_container \
216
+  --cap-add sys_admin \
217
+  --cap-add sys_ptrace \
218
+  alpine
219
+{{< / highlight >}}
220
+
221
+From within the new container you can install strace, and trace any running
222
+program within the target container using `strace -p` as normal. To start a
223
+new program you need access to the target container's file system, which you
224
+can get to via `/proc/1/root` (PID `1` being the main process that docker
225
+started in the target container).
226
+
227
+## Files
228
+
229
+Sometimes the problem might relate to file access. There are a couple of
230
+straight forward - but nonetheless useful - tools which might help here.
231
+`inotifywait` uses the Linux `inotify` subsystem to watch files or directories
232
+for operations. For example:
233
+
234
+{{< highlight console >}}
235
+$ inotifywait -mr site/content
236
+Setting up watches.  Beware: since -r was given, this may take a while!
237
+Watches established.
238
+site/content/post/ MODIFY 2019-05-08-debugging-beyond-the-debugger.md
239
+site/content/post/ OPEN 2019-05-08-debugging-beyond-the-debugger.md
240
+site/content/post/ MODIFY 2019-05-08-debugging-beyond-the-debugger.md
241
+site/content/post/ MODIFY 2019-05-08-debugging-beyond-the-debugger.md
242
+site/content/post/ CLOSE_WRITE,CLOSE 2019-05-08-debugging-beyond-the-debugger.md
243
+# ...
244
+{{< / highlight >}}
245
+
246
+Here the `-m` switch makes `inotifywait` monitor the files forever (instead
247
+of exiting on the first modification, which is the normal behaviour) and `r`
248
+makes it recurse into the directory and monitor each file and subdirectory in
249
+there.
250
+
251
+If you want to see what processes currently have a file open, `fuser` is the
252
+go-to tool. For example:
253
+
254
+{{< highlight console >}}
255
+$ fuser -v /
256
+                     USER PID ACCESS COMMAND
257
+/:                   root     kernel mount /
258
+                     chris      2961 .rc.. systemd
259
+                     chris      2986 .r... gdm-x-session
260
+                     chris      2994 .r... dbus-daemon
261
+                     chris      3001 .r... gnome-session-b
262
+# ...
263
+{{< / highlight >}}
264
+
265
+## Honourable mentions
266
+
267
+These aren't really debugging tools, but I feel it's worth mentioning as
268
+they often feature somewhere along the debugging-of-weird-problems journey.
269
+
270
+I've seen some weird and wonderful problems happen
271
+because a disk is full, so a quick `df` early on in the debugging process
272
+never hurts. Some apps may hang, some may corrupt their config, some may
273
+fall over and die; sometimes the manner in which they fail doesn't obviously
274
+point to a disk space issue.
275
+
276
+Another issue that comes up now and then -- especially inside VMs or
277
+other environment that don't have a decent amount of "noise" happening --
278
+is entropy exhaustion. A quick look at `/proc/sys/kernel/random/entropy_avail`
279
+should be enough to confirm that everything is ticking along nicely. If it's
280
+exceedingly low then you may find that anything involving random number
281
+generation stalls (TLS connections for example).

Двоични данни
site/static/res/images/debugging/strace.png Целия файл


Loading…
Отказ
Запис