Last modified: 2013-04-25 08:07:01 UTC
On instances we have a fuse.glusterfs mount which sends informational logs to /var/log/glusterfs. For example:

hashar@deployment-jobrunner06:/var/log/glusterfs$ ll
total 3920
-rw------- 1 root root   31039 Sep 20 22:38 data-home.log
-rw-r--r-- 1 root root 3972261 Oct 17 11:08 data-project.log

The data-project.log file eventually filled all the disk space. To prevent this, any log file should probably be rotated at least weekly and purged after some time.
Raising priority, this killed the beta Apache boxes. /var/log is on the / partition, so a full disk causes a lot of issues.
Raising priority again. This has again killed several beta boxes over the last two weeks.
There were several broken links within /data/project/apache which seemed to be making gluster lose its mind. Obviously that's a gluster bug, but I don't have much insight into why it couldn't cope. I removed the broken files that gluster was complaining about, and replaced them via 'git reset --hard'. This appears to have quelled gluster's fears, and I'm pretty sure the actual files are still the way I found them. I have a test box set up doing a rotation test with gluster log files. Presuming that test goes well I'll commit that change in a couple of days.
https://gerrit.wikimedia.org/r/#/c/42796/
Seems to rotate fine now :-] Thanks Andrew!
I think they are rotating improperly on the servers now, though. The log files show 0 as their file size. I'd imagine it's still writing to the old inodes there.
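To illustrate the suspicion above, here is a minimal sketch of what happens when a log file is renamed while the writing process keeps its old file descriptor. It assumes POSIX rename semantics; the function name is hypothetical and the file name is only borrowed from this thread. It is not a reproduction of what gluster actually does internally.

```python
import os
import tempfile


def demo_rotation_without_reopen():
    """Return (size of the new log, size of the rotated log) in bytes."""
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "data-project.log")
    rotated_path = log_path + ".1"

    # The "daemon" opens its log once and keeps the descriptor open.
    writer = open(log_path, "a")
    writer.write("before rotation\n")
    writer.flush()

    # The "rotation": rename the file, then create an empty replacement.
    os.rename(log_path, rotated_path)
    open(log_path, "a").close()

    # The daemon was never told to reopen its log, so this write follows
    # the old inode and ends up in the renamed file.
    writer.write("after rotation\n")
    writer.flush()
    writer.close()

    return os.path.getsize(log_path), os.path.getsize(rotated_path)


if __name__ == "__main__":
    new_size, rotated_size = demo_rotation_without_reopen()
    print("new log:", new_size, "bytes; rotated log:", rotated_size, "bytes")
```

The new file stays empty while the rotated one keeps growing, which would match a 0-byte log sitting next to a large .1 file. Signalling the process to reopen its log after rotation is the usual remedy.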
Logs on the server should be fixed by https://gerrit.wikimedia.org/r/#/c/43962/1/files/logrotate/glusterlogs
Looks like logrotate does not work anymore:

root@deployment-bastion:/var/log/glusterfs# ll -rt
total 213956
-rw------- 1 root root         0 Feb 19 06:27 home.log
-rw------- 1 root root         0 Feb 19 06:27 data-project.log
-rw------- 1 root root 109185374 Mar  1 18:12 data-project.log.1
-rw------- 1 root root 109894887 Mar  1 18:17 home.log.1

I noticed the instance has two logrotate configuration files which are most probably conflicting:

/etc/logrotate.d/glusterfs-common

/var/log/glusterfs/*.log {
    daily
    rotate 7
    delaycompress
    compress
    notifempty
    missingok
}

And the puppet provided one:

cat glusterlogs
#####################################################################
### THIS FILE IS MANAGED BY PUPPET
### puppet:///files/logrotate/glusterlogs
#####################################################################

# Rotate client logs
/var/log/glusterfs/*.log {
    missingok
    rotate 3
    weekly
    compress
    postrotate
        /usr/bin/killall -HUP glusterfs > /dev/null 2>&1 || true
        /usr/bin/killall -HUP glusterd > /dev/null 2>&1 || true
    endscript
}

# Rotate server brick logs
/var/log/glusterfs/bricks/*.log {
    missingok
    rotate 3
    weekly
    compress
    postrotate
        /usr/bin/killall -HUP glusterfsd > /dev/null 2>&1 || true
    endscript
}

The glusterfs-common package does provide /etc/logrotate.d/glusterfs-common and it is missing the HUP signaling :(
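A rough way to spot this kind of overlap is to collect the path patterns each logrotate file declares and report any pattern claimed by more than one file. The sketch below is a simplification: the function name is hypothetical, it only recognises patterns written on the same line as the opening brace, and it compares patterns as literal strings rather than expanding globs.

```python
from collections import defaultdict


def find_duplicate_patterns(configs):
    """configs maps a config file name to its text.

    Returns {pattern: [file names]} for every pattern that appears
    in more than one config file.
    """
    owners = defaultdict(list)
    for name, text in configs.items():
        for line in text.splitlines():
            line = line.strip()
            # Skip comments and anything that does not open a stanza.
            if line.startswith("#") or not line.endswith("{"):
                continue
            # A stanza header may list several patterns before the brace.
            for pattern in line[:-1].split():
                if name not in owners[pattern]:
                    owners[pattern].append(name)
    return {p: names for p, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    configs = {
        "glusterfs-common": "/var/log/glusterfs/*.log {\n    daily\n    rotate 7\n}\n",
        "glusterlogs": (
            "# Rotate client logs\n"
            "/var/log/glusterfs/*.log {\n    weekly\n    rotate 3\n}\n"
            "/var/log/glusterfs/bricks/*.log {\n    weekly\n    rotate 3\n}\n"
        ),
    }
    print(find_duplicate_patterns(configs))
```

With the two files quoted in this comment, only /var/log/glusterfs/*.log would be reported, since the bricks pattern appears in just one of them.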
I have emailed Andrew to find out the status of this bug.
Note to self: I removed the gluster-installed log rotate on testlabs-abogott-dev; now waiting a few days to see if it shapes up
Yep, removing gluster's file helps. So... https://gerrit.wikimedia.org/r/#/c/57426/
Seems to work now, and the rotated files get compressed. Thank you Andrew!