遇到了一个使用 glance 命令行工具下载镜像卡住的问题,镜像下载一部分之后就莫名其妙的卡住了,通过分析之后发现了两个问题。
问题一
一个问题和 glance image-download 的选项 --progress
有关,不知为何,客户端机器的两个终端 SecureCRT 和 MobaXterm 往终端写时都存在问题,由于 glance 是一个 Python 单线程程序,导致写终端卡住:
# ps aux | grep image-download
root 10481 6.3 0.0 258404 29304 pts/8 S+ 19:02 0:11 /usr/bin/python2 /usr/bin/glance image-download --file i1 5d02d508-44cd-429f-8c63-3e44f64f5c75 --progress
root 22708 0.0 0.0 112664 968 pts/7 S+ 19:05 0:00 grep --color=auto image-download
# pstack 10481
#0 0x00007f3906e30840 in __write_nocancel () from /lib64/libc.so.6
#1 0x00007f3906dbcfb3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00007f3906dbe41c in __GI__IO_do_write () from /lib64/libc.so.6
#3 0x00007f3906dbc810 in __GI__IO_file_sync () from /lib64/libc.so.6
#4 0x00007f3906db20a2 in fflush () from /lib64/libc.so.6
#5 0x00007f3907a984c1 in file_flush () from /lib64/libpython2.7.so.1.0
#6 0x00007f3907b1493c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007f3907b1457d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007f3907b16efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#9 0x00007f3907aa0858 in function_call () from /lib64/libpython2.7.so.1.0
#10 0x00007f3907a7b9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#11 0x00007f3907a8a995 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#12 0x00007f3907a7b9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007f3907ad1e03 in call_method () from /lib64/libpython2.7.so.1.0
#14 0x00007f3907b0fb01 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#15 0x00007f3907b16efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#16 0x00007f3907b143fc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x00007f3907b1457d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#18 0x00007f3907b16efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#19 0x00007f3907b143fc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#20 0x00007f3907b16efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#21 0x00007f3907b143fc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x00007f3907b16efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#23 0x00007f3907b17002 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#24 0x00007f3907b3043f in run_mod () from /lib64/libpython2.7.so.1.0
#25 0x00007f3907b315fe in PyRun_FileExFlags () from /lib64/libpython2.7.so.1.0
#26 0x00007f3907b32889 in PyRun_SimpleFileExFlags () from /lib64/libpython2.7.so.1.0
#27 0x00007f3907b43a3f in Py_Main () from /lib64/libpython2.7.so.1.0
#28 0x00007f3906d68c05 in __libc_start_main () from /lib64/libc.so.6
#29 0x000000000040071e in _start ()
直接导致 socket 数据接收卡住,直接表现就是发送端的发送报文积压:
# ss -ntp | sort -nk 3 | tail
...
ESTAB 0 3647520 ::ffff:192.168.63.245:8262 ::ffff:192.168.63.250:64035 users:(("glance-api",pid=66095,fd=5))
# ss -ntpioem 'src ::ffff:192.168.63.245:8262 && dst ::ffff:192.168.63.250:64035'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 3647520 ::ffff:192.168.63.245:8262 ::ffff:192.168.63.250:64035 users:(("glance-api",pid=66095,fd=5)) timer:(persist,32sec,0) uid:161 ino:3892928328 sk:ffff880b6aa1e300 <->
skmem:(r0,rb1061488,t0,tb4194304,f3248,w3695440,o0,bl0) ts sack cubic wscale:7,7 rto:218 backoff:8 rtt:17.091/21.364 ato:40 mss:65468 cwnd:25 ssthresh:58 send 766.1Mbps lastsnd:78721 lastrcv:110933 lastack:23017 pacing_rate 1532.2Mbps rcv_space:43690
# ss -ntpioem 'src ::ffff:192.168.63.245:8262 && dst ::ffff:192.168.63.250:64035'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 3647520 ::ffff:192.168.63.245:8262 ::ffff:192.168.63.250:64035 users:(("glance-api",pid=66095,fd=5)) timer:(persist,6.431ms,0) uid:161 ino:3892928328 sk:ffff880b6aa1e300 <->
skmem:(r0,rb1061488,t0,tb4194304,f3248,w3695440,o0,bl0) ts sack cubic wscale:7,7 rto:218 backoff:8 rtt:17.091/21.364 ato:40 mss:65468 cwnd:25 ssthresh:58 send 766.1Mbps lastsnd:105081 lastrcv:137293 lastack:49377 pacing_rate 1532.2Mbps rcv_space:43690
# ss -ntpioem 'src ::ffff:192.168.63.245:8262 && dst ::ffff:192.168.63.250:64035'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 3647520 ::ffff:192.168.63.245:8262 ::ffff:192.168.63.250:64035 users:(("glance-api",pid=66095,fd=5)) timer:(persist,1min44sec,0) uid:161 ino:3892928328 sk:ffff880b6aa1e300 <->
skmem:(r0,rb1061488,t0,tb4194304,f3248,w3695440,o0,bl0) ts sack cubic wscale:7,7 rto:218 backoff:9 rtt:17.091/21.364 ato:40 mss:65468 cwnd:25 ssthresh:58 send 766.1Mbps lastsnd:118697 lastrcv:150909 lastack:6993 pacing_rate 1532.2Mbps rcv_space:43690
由于 TCP 没有接收窗口,导致发送端持续处于零窗口探测阶段:
当然,同时通过 ss 查看客户端的接受队列,也能发现存在大量的报文积压。
显然,简单的解决办法就是去掉 --progress
选项。
问题二
还一个问题看上去与网络有关,通过 tcpdump 进行抓包可以发现 glance-api 发往 Ceph OSD 的重传报文无法顺利送达 OSD:
注意:如果客户端和服务端在同一台机器上,即使 IP 地址绑定在不同的网口上,但使用 tcpdump 时需要使用 -i
选项指定 lo
网口进行抓包。
这里最大的疑问在于,客户端的重传报文在服务端无法得到应答,但是最后服务端那个主动断链的 FIN
报文却在客户端能得到正确的响应,通过多次抓包得到的信息都是这样:
通过 ss 查看最后一个抓包所涉及的 socket 信息:
客户端(glance-api)
# ss -ntpioem 'src 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 6022 192.168.66.246:52826 192.168.87.4:6800 users:(("glance-api",pid=66079,fd=29)) timer:(on,1min1sec,11) uid:161 ino:612952361 sk:ffff88140c232d00 <->
skmem:(r0,rb6291456,t0,tb87040,f3511,w8777,o0,bl0) ts sack cubic wscale:7,7 rto:120000 backoff:11 rtt:0.112/0.028 ato:40 mss:1448 cwnd:1 ssthresh:7 send 103.4Mbps lastsnd:58688 lastrcv:385026 lastack:385026 pacing_rate 2057.1Mbps unacked:1 retrans:1/11 lost:1 rcv_rtt:1.875 rcv_space:3071208
# ss -ntpioem 'src 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 6922 192.168.66.246:52826 192.168.87.4:6800 users:(("glance-api",pid=66079,fd=29)) timer:(on,1.863ms,11) uid:161 ino:612952361 sk:ffff88140c232d00 <->
skmem:(r0,rb6291456,t0,tb87040,f2611,w9677,o0,bl0) ts sack cubic wscale:7,7 rto:120000 backoff:11 rtt:0.112/0.028 ato:40 mss:1448 cwnd:1 ssthresh:7 send 103.4Mbps lastsnd:118137 lastrcv:444475 lastack:444475 pacing_rate 2057.1Mbps unacked:1 retrans:1/11 lost:1 rcv_rtt:1.875 rcv_space:3071208
# ss -ntpioem 'src 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 7147 192.168.66.246:52826 192.168.87.4:6800 users:(("glance-api",pid=66079,fd=29)) timer:(on,1min46sec,12) uid:161 ino:612952361 sk:ffff88140c232d00 <->
skmem:(r0,rb6291456,t0,tb87040,f2386,w9902,o0,bl0) ts sack cubic wscale:7,7 rto:120000 backoff:12 rtt:0.112/0.028 ato:40 mss:1448 cwnd:1 ssthresh:7 send 103.4Mbps lastsnd:13925 lastrcv:460583 lastack:460583 pacing_rate 2057.1Mbps unacked:1 retrans:1/12 lost:1 rcv_rtt:1.875 rcv_space:3071208
# ss -ntpioem 'src 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 10372 192.168.66.246:52826 192.168.87.4:6800 users:(("glance-api",pid=66079,fd=29)) timer:(on,13sec,13) uid:161 ino:612952361 sk:ffff88140c232d00 <->
skmem:(r0,rb6291456,t0,tb87040,f3257,w13127,o0,bl0) ts sack cubic wscale:7,7 rto:120000 backoff:13 rtt:0.112/0.028 ato:40 mss:1448 cwnd:1 ssthresh:7 send 103.4Mbps lastsnd:106140 lastrcv:673118 lastack:673118 pacing_rate 2057.1Mbps unacked:1 retrans:1/13 lost:1 rcv_rtt:1.875 rcv_space:3071208
# ss -ntpioem 'src 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 11047 192.168.66.246:52826 192.168.87.4:6800 users:(("glance-api",pid=66079,fd=29)) timer:(on,1min24sec,14) uid:161 ino:612952361 sk:ffff88140c232d00 <->
skmem:(r0,rb6291456,t0,tb87040,f2582,w13802,o0,bl0) ts sack cubic wscale:7,7 rto:120000 backoff:14 rtt:0.112/0.028 ato:40 mss:1448 cwnd:1 ssthresh:7 send 103.4Mbps lastsnd:35333 lastrcv:722631 lastack:722631 pacing_rate 2057.1Mbps unacked:1 retrans:1/14 lost:1 rcv_rtt:1.875 rcv_space:3071208
# ss -ntpioem 'src 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 13897 192.168.66.246:52826 192.168.87.4:6800 users:(("glance-api",pid=66079,fd=29)) timer:(on,15sec,15) uid:161 ino:612952361 sk:ffff88140c232d00 <->
skmem:(r0,rb6291456,t0,tb87040,f3828,w16652,o0,bl0) ts sack cubic wscale:7,7 rto:120000 backoff:15 rtt:0.112/0.028 ato:40 mss:1448 cwnd:1 ssthresh:7 send 103.4Mbps lastsnd:104570 lastrcv:912188 lastack:912188 pacing_rate 2057.1Mbps unacked:1 retrans:1/15 lost:1 rcv_rtt:1.875 rcv_space:3071208
服务端(Ceph OSD)
# ss -ntpioem 'dst 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.87.4:6800 192.168.66.246:52826 users:(("ceph-osd",pid=7501,fd=153)) uid:167 ino:3025637930 sk:ffff880fd1450000 <->
skmem:(r0,rb369280,t0,tb4194304,f0,w0,o0,bl0) ts sack cubic wscale:7,7 rto:201 rtt:0.881/0.176 ato:40 mss:1448 cwnd:1549 ssthresh:1536 send 20367.3Mbps lastsnd:423659 lastrcv:423742 lastack:423658 retrans:0/723 rcv_rtt:35472 rcv_space:29341
# ss -ntpioem 'dst 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.87.4:6800 192.168.66.246:52826 users:(("ceph-osd",pid=7501,fd=153)) uid:167 ino:3025637930 sk:ffff880fd1450000 <->
skmem:(r0,rb369280,t0,tb4194304,f0,w0,o0,bl0) ts sack cubic wscale:7,7 rto:201 rtt:0.881/0.176 ato:40 mss:1448 cwnd:1549 ssthresh:1536 send 20367.3Mbps lastsnd:489181 lastrcv:489264 lastack:489180 retrans:0/723 rcv_rtt:35472 rcv_space:29341
# ss -ntpioem 'dst 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.87.4:6800 192.168.66.246:52826 users:(("ceph-osd",pid=7501,fd=153)) uid:167 ino:3025637930 sk:ffff880fd1450000 <->
skmem:(r0,rb369280,t0,tb4194304,f0,w0,o0,bl0) ts sack cubic wscale:7,7 rto:201 rtt:0.881/0.176 ato:40 mss:1448 cwnd:1549 ssthresh:1536 send 20367.3Mbps lastsnd:677224 lastrcv:677307 lastack:677223 retrans:0/723 rcv_rtt:35472 rcv_space:29341
# ss -ntpioem 'dst 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.87.4:6800 192.168.66.246:52826 users:(("ceph-osd",pid=7501,fd=153)) uid:167 ino:3025637930 sk:ffff880fd1450000 <->
skmem:(r0,rb369280,t0,tb4194304,f0,w0,o0,bl0) ts sack cubic wscale:7,7 rto:201 rtt:0.881/0.176 ato:40 mss:1448 cwnd:1549 ssthresh:1536 send 20367.3Mbps lastsnd:727560 lastrcv:727643 lastack:727559 retrans:0/723 rcv_rtt:35472 rcv_space:29341
# ss -ntpioem 'dst 192.168.66.246:52826'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.87.4:6800 192.168.66.246:52826 users:(("ceph-osd",pid=7501,fd=153)) uid:167 ino:3025637930 sk:ffff880fd1450000 <->
skmem:(r0,rb369280,t0,tb4194304,f0,w0,o0,bl0) ts sack cubic wscale:7,7 rto:201 rtt:0.881/0.176 ato:40 mss:1448 cwnd:1549 ssthresh:1536 send 20367.3Mbps lastsnd:918597 lastrcv:918680 lastack:918596 retrans:0/723 rcv_rtt:35472 rcv_space:29341
显然,客户端一直在尝试重传,但客户端始终未收到任何 TCP 报文。
这里仍是一个悬案,临时的规避手段就是设置 /proc/sys/net/ipv4/tcp_retries2
系统参数,或者 TCP_USER_TIMEOUT
socket 选项。
参考资料
Manpage of PCAP-FILTER
https://www.tcpdump.org/manpages/pcap-filter.7.html
Linux TCP_RTO_MIN, TCP_RTO_MAX and the tcp_retries2 sysctl
https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
The “Out of socket memory” error
https://blog.tsunanet.net/2011/03/out-of-socket-memory.html
How to drop 10 million packets per second
https://blog.cloudflare.com/how-to-drop-10-million-packets/
Problem with target “NOTRACK”
https://github.com/ldx/python-iptables/issues/204
nf_conntrack: table full, dropping packet
Caveats about Linux connection tracking and high traffic servers
https://i-admin.blogspot.com/2014/02/caveats-about-linux-connection-tracking.html
最后修改于 2019-12-29