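At work today I ran into a problem that required UDP packet-loss figures. Frankly, I don't believe exact UDP loss numbers can be obtained easily. When the NIC is overloaded, some packets are never received by the NIC at all, so there is no way they get reported to the kernel. And if a router along the path drops a UDP packet, the destination machine has even less chance of noticing the loss.
A colleague then said that netstat -us (--statistics) shows UDP packet loss. The u option restricts the output to UDP-related statistics, and s of course stands for statistics. Without the u option, statistics for all protocols are displayed. Below is the output on my machine (all protocols, that is, without the u option).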
Ip:
203440255187 total packets received
0 forwarded
0 incoming packets discarded
201612429535 incoming packets delivered
1064529177 requests sent out
15 fragments dropped after timeout
3058122492 reassemblies required
1230296840 packets reassembled ok
15 packet reassembles failed
Icmp:
14869220 ICMP messages received
3965512 input ICMP message failed.
ICMP input histogram:
destination unreachable: 6054246
timeout in transit: 687
echo requests: 8570532
echo replies: 243755
12913011 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 4097869
time exceeded: 5
echo request: 244605
echo replies: 8570532
IcmpMsg:
InType0: 243755
InType3: 6054246
InType8: 8570532
InType11: 687
OutType0: 8570532
OutType3: 4097869
OutType8: 244605
OutType11: 5
Tcp:
111681768 active connections openings
4186820 passive connection openings
24951865 failed connection attempts
55064041 connection resets received
275 connections established
1033901799 segments received
1776166765 segments send out
12156205 segments retransmited
6705 bad segments received.
106348033 resets sent
Udp:
198894689917 packets received
472986510 packets to unknown port received.
1146976531 packet receive errors
116750744 packets sent
110301286 receive buffer errors
0 send buffer errors
UdpLite:
TcpExt:
423 invalid SYN cookies received
693 packets pruned from receive queue because of socket buffer overrun
19 packets pruned from receive queue
11309370 TCP sockets finished time wait in fast timer
106 packets rejects in established connections because of timestamp
10210477 delayed acks sent
20811 delayed acks further delayed because of locked socket
Quick ack mode was activated 8856 times
17118697 packets directly queued to recvmsg prequeue.
301717551 bytes directly in process context from backlog
152118951904 bytes directly received in process context from prequeue
104771733 packet headers predicted
15179703 packets header predicted and directly queued to user
218747377 acknowledgments not containing data payload received
102637644 predicted acknowledgments
7293 times recovered from packet loss by selective acknowledgements
Detected reordering 40 times using FACK
Detected reordering 27 times using SACK
Detected reordering 1088 times using time stamp
476 congestion windows fully recovered without slow start
5287 congestion windows partially recovered using Hoe heuristic
236 congestion windows recovered without slow start by DSACK
151673 congestion windows recovered without slow start after partial ack
1 timeouts after reno fast retransmit
4 timeouts after SACK recovery
10540 timeouts in loss state
7232 fast retransmits
649 forward retransmits
1871 retransmits in slow start
11612658 other TCP timeouts
TCPLossProbes: 93185
TCPLossProbeRecovery: 14667
2431 packets collapsed in receive queue due to low socket buffer
8814 DSACKs sent for old packets
3350 DSACKs received
1 DSACKs for out of order packets received
90851 connections reset due to unexpected data
214 connections reset due to early user close
352 connections aborted due to timeout
TCPDSACKIgnoredNoUndo: 1571
TCPSpuriousRTOs: 7
TCPSackShifted: 94
TCPSackMerged: 131
TCPSackShiftFallback: 21183
TCPTimeWaitOverflow: 1876775
TCPRcvCoalesce: 15711184
TCPOFOQueue: 3194
TCPChallengeACK: 2337394
TCPSYNChallenge: 13608
TCPSpuriousRtxHostQueues: 1982796
IpExt:
InBcastPkts: 46443933
InOctets: 44312451521655
OutOctets: 1915626725817
InBcastOctets: 6827280595
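Hey, if you repost this article, please credit the source: ykyi.net. Scrapers, take the link along too.
There are indeed two counters here that look like UDP packet-loss counts: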
Udp:
1146976531 packet receive errors
110301286 receive buffer errors
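To pull just these counters out of a saved netstat -s dump, something like the following could work. It is a minimal Python sketch, not part of netstat itself: the function name udp_section is made up for illustration, and it assumes the layout shown above, where a section header such as "Udp:" is followed by lines of the form "<count> <description>".

```python
import re

def udp_section(netstat_output):
    """Return {description: count} for the 'Udp:' section of netstat -s text."""
    counters = {}
    in_udp = False
    for raw in netstat_output.splitlines():
        line = raw.strip()
        # A bare "Word:" line is a section header such as "Udp:" or "Tcp:".
        if re.fullmatch(r"\w+:", line):
            in_udp = (line == "Udp:")
            continue
        if in_udp:
            # Counter lines look like "<count> <description>", optional final dot.
            m = re.fullmatch(r"(\d+) (.+?)\.?", line)
            if m:
                counters[m.group(2)] = int(m.group(1))
    return counters

# The Udp section copied from the output in this post.
sample = """\
Tcp:
    275 connections established
Udp:
    198894689917 packets received
    472986510 packets to unknown port received.
    1146976531 packet receive errors
    116750744 packets sent
    110301286 receive buffer errors
    0 send buffer errors
UdpLite:
"""

stats = udp_section(sample)
print(stats["packet receive errors"], stats["receive buffer errors"])
```

Keying on the description text is fragile, since the wording is whatever the installed netstat prints; it is only meant for quick inspection.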
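So naturally the first step was the Linux man page. It turns out the netstat man page does not describe these fields at all.
Next, I asked Google. Unexpectedly, the answer is that the output of netstat -s has no accurate documentation (it is poorly documented).
Here is a thread asking the same question: https://www.reddit.com/r/linux/comments/706wsa/detailed_explanation_of_all_netstat_statistics/
In short, the replies told the poster: "Don't use netstat; use nstat and the ip tools instead" and "This is an impossible task unless you read through tons of source code."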
blablabla …
In fact, after reading some of the posts Google turned up, I did get a rough idea of the truth.
1146976531 packet receive errors
This line corresponds to the udpInErrors field defined in an RFC on UDP (RFC 4113 [1]):
“The number of received UDP datagrams that could not be
delivered for reasons other than the lack of an application
at the destination port.”
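In other words, udpInErrors counts UDP datagrams that the operating system received but could not deliver, for any reason other than no application listening on the destination port.
And this line: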
110301286 receive buffer errors
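This line corresponds to the UdpRcvbufErrors field in the output of nstat -a -z (nstat comes up again below). I could not find an RFC definition of UdpRcvbufErrors.
A page on IBM's website [2] briefly describes UdpRcvbufErrors: "Number of UDP buffer receive errors."
Combined with the article Why do UDP packets get dropped, I am quite confident that UdpRcvbufErrors counts the number of times the receive buffer that the kernel's TCP/IP stack allocates to a UDP socket failed, that is, was full. The NIC's own buffers are a separate matter from the operating system's buffers, and errors in the NIC's buffers are not included in this counter. As for packets dropped by routers along the path, those can of course only be found in the statistics of the routers themselves.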
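On Linux, the UDP counters that netstat -s and nstat print are read from /proc/net/snmp, where a line of field names is followed by a line of values. Here is a minimal Python sketch for reading them directly, assuming that two-line layout. The name parse_snmp and the sample text are illustrative only; the sample reuses the numbers above and lists just six fields, while a real kernel may expose more.

```python
import os

def parse_snmp(text, proto="Udp"):
    """Parse one protocol's counters from /proc/net/snmp-style text.

    Each protocol has two lines: "Udp: <names...>" then "Udp: <values...>".
    """
    rows = [line.split() for line in text.splitlines()
            if line.startswith(proto + ":")]
    if len(rows) < 2:
        raise ValueError("no %s section found" % proto)
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}

# Sample built from the netstat output in this post (field subset is assumed).
SAMPLE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 198894689917 472986510 1146976531 116750744 110301286 0\n"
)

udp = parse_snmp(SAMPLE)
print("InErrors:", udp["InErrors"])
print("RcvbufErrors:", udp["RcvbufErrors"])

# On a Linux host the live counters can be read the same way.
if os.path.exists("/proc/net/snmp"):
    try:
        with open("/proc/net/snmp") as f:
            live = parse_snmp(f.read())
        print("live RcvbufErrors:", live.get("RcvbufErrors"))
    except (OSError, ValueError) as exc:
        print("could not read live counters:", exc)
```

Reading the file avoids depending on the wording of netstat's descriptions, because the field names here are the same ones nstat prints with a Udp prefix.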
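Also, netstat is deprecated and no longer recommended; the newer commands nstat and ss should be used instead.
The output of nstat is the equivalent of netstat -s, but nstat prints more fields than netstat -s, and the vast majority of its field names match the names used in the RFC standards.
Feel free to repost this article, but please credit the source. Thanks!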
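These counters are cumulative, so absolute values like the ones above say little about whether drops are happening right now. By default nstat reports increments since its previous run, and -a switches to absolute values. The same idea can be sketched in Python by comparing two snapshots. This is only an illustration: parse_nstat and increase are made-up names, the "name value rate" text layout is assumed from typical nstat output, the first snapshot reuses the numbers in this post, and the second snapshot is invented.

```python
def parse_nstat(text):
    """Parse nstat -a -z style text into {counter_name: value}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the "#kernel" header and anything without a name and a value.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        counters[parts[0]] = int(parts[1])
    return counters

def increase(before, after):
    """Return the counters that grew between two snapshots, with the growth."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] > before.get(name, 0)}

# First snapshot: numbers from this post. Second snapshot: hypothetical.
T0 = """#kernel
UdpInDatagrams                  198894689917       0.0
UdpInErrors                     1146976531         0.0
UdpRcvbufErrors                 110301286          0.0
UdpSndbufErrors                 0                  0.0
"""
T1 = """#kernel
UdpInDatagrams                  198894699917       0.0
UdpInErrors                     1146976731         0.0
UdpRcvbufErrors                 110301486          0.0
UdpSndbufErrors                 0                  0.0
"""

delta = increase(parse_nstat(T0), parse_nstat(T1))
print(delta)
```

In this made-up interval UdpInErrors and UdpRcvbufErrors grow by the same amount, which is the pattern one would look for when full socket receive buffers are suspected as the cause of the drops.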
Why do UDP packets get dropped: https://jvns.ca/blog/2016/08/24/find-out-where-youre-dropping-packets/
1: https://tools.ietf.org/html/rfc4113
2: https://www.ibm.com/support/knowledgecenter/STXNRM_3.13.4/coss.doc/deviceapi_response_2.html