Why Ethereal shows "TCP CHECKSUM INCORRECT" on all the packets it captures

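Today I received a mail saying that someone at the site had sent over an Ethereal capture file in which a large number of the packets going from vbsu2 bge0 to vscsmp1 bge0 were marked "TCP CHECKSUM INCORRECT", and asking whether something was wrong with our network or our machines...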

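I looked into it, and there are two possible causes. The first, of course, is that the checksums really are going bad on the machines. But I checked the network configuration on both machines and it looks normal, and "netstat -i" shows no input or output errors on either bge0 interface: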
vbsu2# netstat -i
Name  Mtu   Net/Dest    Address     Ipkts       Ierrs  Opkts       Oerrs  Collis  Queue
lo0   8232  loopback    localhost   185648811   0      185648811   0      0       0
bge0  1500  vbsu2       vbsu2       3852441355  0      2300074072  0      0       0
bge1  1500  vbsu2       vbsu2       1308849274  0      654286221   0      0       0
bge2  1500  vbsu2_bge2  vbsu2_bge2  90997818    0      70916223    0      1       0
bge3  1500  vbsu2_bge3  vbsu2_bge3  66905686    0      68586413    0      1       0
vscsmp1# netstat -i
Name  Mtu   Net/Dest      Address       Ipkts       Ierrs  Opkts       Oerrs  Collis  Queue
lo0   8232  loopback      localhost     3009726492  0      3009726492  0      0       0
bge0  1500  vscsmp1       vscsmp1       2974853663  0      1088339819  0      0       0
bge3  1500  vscsmp1_qfe3  vscsmp1_qfe3  407128947   0      3519356     0      0       0
qfe0  1500  vscsmp1       vscsmp1       1227705888  0      831020139   0      0       0
qfe2  1500  172.16.0.128  172.16.0.129  3305973879  0      3289022674  0      0       0
bge2  1500  172.16.1.0    172.16.1.1    3279046154  7      3280696443  0      0       0
qfe1  1500  vscsmp1_qfe1  vscsmp1_qfe1  456708510   0      11583742    0      0       0
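Reading these counters by eye gets tedious once there are more than a couple of machines. Below is a minimal Python sketch of the same check, assuming the ten-column "netstat -i" layout shown above; the function name interfaces_with_errors and the SAMPLE text are made up for illustration and are not part of any tool.

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: (Ierrs, Oerrs)} for rows with a non-zero error counter.

    Assumes the 10-column layout shown above:
    Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
    """
    flagged = {}
    for line in netstat_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and any line that does not have the expected columns.
        if len(fields) != 10 or fields[0] == "Name":
            continue
        name, ierrs, oerrs = fields[0], int(fields[5]), int(fields[7])
        if ierrs or oerrs:
            flagged[name] = (ierrs, oerrs)
    return flagged


# A few rows copied from the vscsmp1 output above.
SAMPLE = """\
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 3009726492 0 3009726492 0 0 0
bge0 1500 vscsmp1 vscsmp1 2974853663 0 1088339819 0 0 0
bge2 1500 172.16.1.0 172.16.1.1 3279046154 7 3280696443 0 0 0
"""

if __name__ == "__main__":
    print(interfaces_with_errors(SAMPLE))
```

On this sample only bge2 is reported (7 input errors); bge0, the interface that carries the traffic in question, is clean.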
Next I checked the status of the switch ports that these two machines are connected to:
VSW01# sh int Gi2/0/15
GigabitEthernet2/0/15 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 0014.1c2e.a60f (bia 0014.1c2e.a60f)
MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:29, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 435000 bits/sec, 66 packets/sec
5 minute output rate 442000 bits/sec, 120 packets/sec
2299440955 packets input, 3364519803 bytes, 0 no buffer
Received 130985710 broadcasts (0 multicast)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
1888829558 packets output, 1756426108 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
VSW01# sh int Gi2/0/1
GigabitEthernet2/0/1 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 0014.1c2e.a601 (bia 0014.1c2e.a601)
MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 3/255, rxload 6/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:19, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 2603000 bits/sec, 703 packets/sec
5 minute output rate 1218000 bits/sec, 626 packets/sec
1082693515 packets input, 2233319119 bytes, 0 no buffer
Received 554691 broadcasts (0 multicast)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
1859379099 packets output, 1999426888 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
VSW01#
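From the switch side nothing looks wrong either: neither port shows any packet errors. So for now I ruled out the possibility that the checksums are really going bad on the machines themselves.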

The next likely cause is checksum offload on the NICs in these two SunFire v240 machines. I looked it up, and the ce and bge NICs used in our machines both support TCP checksum offload, and it is even enabled by default.

So there are two ways to stop seeing "TCP CHECKSUM INCORRECT". The first is to add the following line to /etc/system on the machine itself:
set ip:dohwcksum = 0
and then reboot the machine.

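The second way is more of a head-in-the-sand approach: turn off checksum validation in Ethereal itself:
Open Ethereal,
click Edit
> Preferences
then expand Protocols
Select IP, then uncheck "Validate the IP checksum if possible" in the right-hand pane. Because the warning here concerns the TCP checksum, turn off the equivalent checksum validation option under TCP as well.
And then all is well.
Reference: The Ethereal Wiki page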

Here is the Wiki's explanation of TCP checksum offloading, pasted as-is for reference:

Many Gigabit network adapters have the "Checksum offload" feature enabled by default. When this is enabled, the adapter performs the time-consuming process of calculating the checksum which appears in both the IP header and in the TCP header of a packet.

For some network drivers, if the checksum calculations are offloaded then the checksum value(s) are set to zero. Ethereal captures each outgoing packet before it goes to the adapter, thus the checksum for the packet was not calculated.
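To see what Ethereal is comparing against, here is a minimal Python sketch of the standard TCP checksum: the 16-bit ones' complement sum over an IPv4 pseudo-header plus the TCP segment. The function names and the 192.0.2.x addresses are made up for illustration, and this is not Ethereal's own code. A segment captured before the adapter has filled in the checksum fails this check even though the packet on the wire is fine.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of a TCP segment whose checksum field (bytes 16-17) is zero."""
    # IPv4 pseudo-header: source, destination, zero byte, protocol 6, TCP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))
    return internet_checksum(pseudo + segment)


def checksum_is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Compare the checksum stored in the segment with a freshly computed one."""
    (stored,) = struct.unpack("!H", segment[16:18])
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return tcp_checksum(src_ip, dst_ip, zeroed) == stored


if __name__ == "__main__":
    # Hypothetical SYN segment: ports 1234 -> 80, checksum field left at zero,
    # which is how an offloaded packet may look when captured on the sender.
    header = struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 0x50, 0x02, 8192, 0, 0)
    print(checksum_is_valid("192.0.2.1", "192.0.2.2", header))
    # Fill in the checksum the way the adapter would before transmission.
    filled = header[:16] + struct.pack("!H", tcp_checksum("192.0.2.1", "192.0.2.2", header)) + header[18:]
    print(checksum_is_valid("192.0.2.1", "192.0.2.2", filled))
```

The same capture taken on the receiving machine, or on a mirror port of the switch, sees the checksum that the adapter actually put on the wire, so comparing the two captures is one way to tell offload apart from real corruption.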