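While checking our two AAA servers today, I discovered that / had already reached 94% usage, and the SNMP server was full of Alarm Traps sent from these two machines. Not good at all, so let's quickly find out what is eating so much disk space...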
[root@KHXAAAS2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 4.0G 3.6G 237M 94% /
/dev/sda1 1012M 40M 921M 5% /boot
none 4.0G 0 4.0G 0% /dev/shm
/dev/sda3 4.0G 41M 3.7G 2% /inactive_root
/dev/sda6 21G 78M 19G 1% /others
/NFSDB/radacct 537G 384M 509G 1% /opt/AAA-6.1.8-20081202/run/radacct
KHXDB:/DB/accounting 537G 384M 509G 1% /NFSDB/radacct
First, go to / and run du -sh * to see how the space under / is currently being used:
[root@KHXAAAS2 ~]# cd /
[root@KHXAAAS2 /]# du -sh *
5.6M bin
6.4M boot
216K dev
44M etc
32K home
20K inactive_root
8.0K initrd
97M lib
16K lost+found
16K media
8.0K misc
8.0K mnt
247M NFSDB
1.3G opt
2.0M others
du: cannot read directory `proc/857/task': No such file or directory
du: cannot read directory `proc/857/fd': No such file or directory
916M proc
2.0M root
176K rpm
18M sbin
0 selinux
8.0K srv
0 sys
336K tmp
2.0G usr
1.5G var
Clearly, /opt and /var take up a lot of space. I installed several large AAA services under /opt, so that one can wait; /var taking 1.5G, however, is rather odd, so let's keep digging...
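The drill-down in this post repeats one step at every level: list the size of each subdirectory, pick the largest, and descend into it. Below is a minimal sketch of that step, assuming GNU du and sort are available. `largest_dirs` is a hypothetical helper name, not something from the original session. Unlike `du -sh *`, it starts du at the directory itself with `-x`, so other filesystems mounted below it (for example /proc or the NFS mount) should not be counted against /.

```shell
#!/bin/sh
# largest_dirs DIR [N]: print the total for DIR followed by its N largest
# subdirectories, sizes in KB. Hypothetical helper, assumes GNU du/sort.
largest_dirs() {
    dir=${1:-/}
    top=${2:-5}
    # -x            stay on the filesystem that DIR lives on
    # -k            report sizes in kilobytes so sort -n can compare them
    # --max-depth=1 report DIR and its immediate subdirectories only
    # The total for DIR is the largest number, so it sorts to the first line.
    du -x -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$((top + 1))"
}
```

Running `largest_dirs / 5`, then `largest_dirs /var 5`, and so on would follow the same path as the manual `cd` and `du -sh *` steps.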
[root@KHXAAAS2 /]# cd /var
[root@KHXAAAS2 var]# du -sh *
12K account
2.2M cache
16K crash
28K db
32K empty
16K ftp
300K gdm
41M lib
8.0K local
132K lock
42M log
4.0K mail
24K net-snmp
8.0K nis
8.0K opt
8.0K preserve
276K run
1.5G spool
8.0K tmp
8.0K tux
2.1M www
24K yp
It looks like the problem is under the spool directory, so let's continue...
[root@KHXAAAS2 var]# cd spool/
[root@KHXAAAS2 spool]# ls -al
total 6404
drwxr-xr-x 13 root root 4096 Jun 12 2008 .
drwxr-xr-x 23 root root 4096 Jun 12 2008 ..
drwxr-xr-x 2 root root 4096 Jun 12 2008 anacron
drwx------ 3 daemon daemon 4096 Jun 12 2008 at
drwxrwx--- 2 smmsp smmsp 6443008 Dec 24 12:53 clientmqueue
drwx------ 2 root root 4096 Dec 24 11:49 cron
drwx--x--- 3 root sys 4096 Jun 12 2008 cups
drwxr-xr-x 2 root root 4096 Aug 13 2004 lpd
drwxrwxr-x 2 root mail 4096 Aug 13 2004 mail
drwx------ 2 root mail 4096 Jan 23 2007 mqueue
drwxr-xr-x 2 rpm rpm 4096 Aug 2 2007 repackage
drwxr-xr-x 2 root root 4096 Sep 5 2007 up2date
drwxrwxrwt 2 root root 4096 Oct 5 2004 vbox
[root@KHXAAAS2 spool]# du -sh *
32K anacron
20K at
1.5G clientmqueue
16K cron
16K cups
8.0K lpd
8.0K mail
8.0K mqueue
8.0K repackage
8.0K up2date
8.0K vbox
[root@KHXAAAS2 spool]# cd clientmqueue/
[root@KHXAAAS2 clientmqueue]# du -sh *
-bash: /usr/bin/du: Argument list too long
It seems there are so many files under /var/spool/clientmqueue that even the du command cannot run, so let's take a look with ls instead...
[root@KHXAAAS2 clientmqueue]# ls
dfm5C422X2007728 dfmA8CE2sU014445 dfmAQIX3He008540 dfmBF1O3tM024344 qfm9U9325M025071 qfmAHFp3o7008591 qfmB5MG2ik015455
dfm5D421x8008729 dfmA8CF2ex014725 dfmAQIY3JM008766 dfmBF1P2CK024575 qfm9U942p8025326 qfmAHFq22Z008848 qfmB5MH2dU015683
dfm5D4222i021334 dfmA8Cf2Oe022040 dfmAQIZ2lS008993 dfmBF1p2T6030541 qfm9U953gs025583 qfmAHFQ2RH002131 qfmB5Mh2Zw021688
dfm5DK23Jv003790 dfmA8CG2TR015000 dfmAQJ02LM014708 dfmBF1q22Z030769 qfm9U962PF025835 qfmAHFR22P002445 qfmB5Mi3jw021934
dfm5EK22jL009067 dfmA8Cg3uP022313 dfmAQJ13b6014942 dfmBF1Q2Tk024802 qfm9U972GP026084 qfmAHFr2PM009102 qfmB5MI3sc015914
dfm5OK314c006681 dfmA8Ch291022592 dfmAQJ23VT015171 dfmBF1r3jO030995 qfm9U983lb026334 qfmAHFs2XF009366 qfmB5MJ2hf016139
dfm5PK323O007690 dfmA8CH2so015312 dfmAQJ32Tu015402 dfmBF1R3xM025033 qfm9U992mX026621 qfmAHFS3sD002719 qfmB5Mj3Ia022161
dfm5QK32Ur008747 dfmA8Ci25T022861 dfmAQJ420C015631 dfmBF1s2LO031226 qfm9U9A20p026912 qfmAHFT2AP002970 qfmB5MK2P7016366
dfm5RK335g022157 dfmA8CI2UK015587 dfmAQJ52GN015858 dfmBF1S3st025260 qfm9U9a31w001040 qfmAHFt2ha009618 qfmB5Mk2SN022392
dfm5SK32xd023117 dfmA8Cj23T023137 dfmAQJ62Dm016089 dfmBF1T2Qt025491 qfm9U9B2aF027174 qfmAHFU2RT003229 qfmB5Ml2Ab022619
dfm5UK31gY007333 dfmA8CJ34Q015865 dfmAQJ72B4016316 dfmBF1t2RL031453 qfm9U9b2dR001296 qfmAHFu3dG009875 qfmB5ML2kM016594
dfm61K328o008143 dfmA8Ck20E023413 dfmAQJ82fO016547 dfmBF1U250025718 qfm9U9c22o001541 qfmAHFv30Z010126 qfmB5Mm2Gk022851
dfm62K32BT010023 dfmA8CK3r4016134 dfmAQJ93WG016775 dfmBF1u39e031684 qfm9U9C29A027419 qfmAHFV3pd003482 qfmB5MM2ZI016827
dfm63K32ia011151 dfmA8CL2dQ016417 dfmAQJa2On023080 dfmBF1v2Gh031911 qfm9U9d2Wa001793 qfmAHFW2J4003739 qfmB5Mn2fg023079
dfm64K32E5011955 dfmA8Cl3SM023691 dfmAQJA3DY017002 dfmBF1V2s6025962 qfm9U9D3p9027670 qfmAHFw2XB010386 qfmB5MN3Wi017052
dfm65K31uZ012771 dfmA8CM27o016689 dfmAQJB2YJ017245 dfmBF1w2pd032138 qfm9U9E2uC027923 qfmAHFX2Ep003993 qfmB5Mo3oH023305
dfm67K31mg029851 dfmA8Cm39g023966 dfmAQJb3pV023307 dfmBF1W2tF026190 qfm9U9e2Xh002049 qfmAHFx2kM010637 qfmB5MO3oS017280
dfm68K310Y030656 dfmA8CN2BY016971 dfmAQJC2qs017472 dfmBF1X3iF026417 qfm9U9F2kZ028177 qfmAHFY2Zm004250 qfmB5MP2jh017506
^C
There really are too many, so I had to interrupt the listing with Ctrl+C. Fortunately, these files all come from the same cause: some user (usually root) added a cron job whose program produces output, and cron mails that output to the user who created the job. If sendmail is not running at that point, these queue files pile up. They are basically not important files; in short, just delete them without mercy~
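The "Argument list too long" error appears because the shell expands `*` into one argument per file before the command starts, and the combined argument list exceeds what the kernel accepts for a single program. The sketch below lets find walk the directory itself, so no glob is ever expanded. The function names are hypothetical, and `queue_purge` is an alternative I am swapping in for illustration, not the command used in this post (which is `ls | xargs rm -f`).

```shell
#!/bin/sh
# Hypothetical helpers; in this post DIR would be /var/spool/clientmqueue.

# queue_count DIR: number of regular files directly inside DIR.
# (Counts lines, so a file name containing a newline would be counted twice.)
queue_count() {
    find "$1" -maxdepth 1 -type f | wc -l
}

# queue_size DIR: total size of DIR, human-readable.
# du is given the directory itself rather than a glob of its contents.
queue_size() {
    du -sh "$1" | cut -f1
}

# queue_purge DIR: delete the regular files directly inside DIR.
# "-exec ... {} +" makes find batch the names into rm invocations that fit
# the argument limit; DIR itself and any subdirectories are left alone.
queue_purge() {
    find "$1" -maxdepth 1 -type f -exec rm -f {} +
}
```

For example, `queue_count /var/spool/clientmqueue` and `queue_size /var/spool/clientmqueue` give a picture of the backlog before anything is removed. Because find passes each name as its own argument, `queue_purge` should also cope with file names containing spaces, which a plain `ls | xargs` pipeline would split.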
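However, because there are so many files in this directory, rm usually fails with the same kind of error that du just gave:
-bash: /bin/rm: Argument list too long
So rm alone cannot remove them. No problem, use the following commands instead: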
[root@KHXAAAS2 clientmqueue]# pwd
/var/spool/clientmqueue
[root@KHXAAAS2 clientmqueue]# ls | xargs rm -f
[root@KHXAAAS2 clientmqueue]# ls -al
total 6316
drwxrwx--- 2 smmsp smmsp 6443008 Dec 24 13:18 .
drwxr-xr-x 13 root root 4096 Jun 12 2008 ..
See, ls | xargs rm -f cleared out that pile of files with no trouble. Now check the disk usage again:
[root@KHXAAAS2 clientmqueue]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 4.0G 2.9G 956M 76% /
/dev/sda1 1012M 40M 921M 5% /boot
none 4.0G 0 4.0G 0% /dev/shm
/dev/sda3 4.0G 41M 3.7G 2% /inactive_root
/dev/sda6 21G 78M 19G 1% /others
/NFSDB/radacct 537G 384M 509G 1% /opt/AAA-6.1.8-20081202/run/radacct
KHXDB:/DB/accounting 537G 384M 509G 1% /NFSDB/radacct
[root@KHXAAAS2 clientmqueue]#
See, after clearing the queue there is a lot more free space: / went from 94% down to 76%. But this only treats the symptom, not the cause. The important part is to remember to fix the culprit, that is, to append > /dev/null 2>&1 to the end of that cron job, for example:
[root@KHXAAAS2 ~]# crontab -l
* * * * * /etc/init.d/snmp_cron.sh > /dev/null 2>&1
This keeps the problem from happening again... OK, end of report~
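To find which cron jobs still need that redirection, a rough filter over the crontab text can help. This is only a heuristic sketch: `noisy_cron_jobs` is a hypothetical name, and it simply treats any job line containing no `>` at all as "not redirected", so it can misjudge jobs that pipe their output elsewhere or use `>` for something other than silencing output.

```shell
#!/bin/sh
# noisy_cron_jobs: read crontab text on stdin and print the job lines
# that contain no output redirection. Hypothetical, heuristic helper.
noisy_cron_jobs() {
    # 1. drop comment lines and blank lines
    # 2. drop environment settings such as MAILTO=root
    # 3. keep only the lines that never use ">"
    grep -v -E '^[[:space:]]*(#|$)' |
    grep -v -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' |
    grep -v '>' || true   # no matches is not an error here
}
```

A typical call would be `crontab -l | noisy_cron_jobs`. Every line it prints is a candidate for the `> /dev/null 2>&1` suffix shown above. The crontab(5) manual for Vixie-style cron also describes setting `MAILTO=""` to stop cron from mailing job output; whether that applies depends on the cron implementation installed.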