Last modified: 2014-07-28 16:08:39 UTC
https://metrics.wmflabs.org/ is currently (2014-07-28 15:29) very unresponsive (and may appear down). Some pages (like uploading a new cohort) temporarily gave me "Wikimetrics is experiencing problems" errors in the browser. Load is somewhere around 25-35. Of the processes, it stands out that there are ~100 queue processes and ~130 mysqld processes.
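For anyone wanting to reproduce the process counts above, here is a minimal sketch that tallies processes by command name. It assumes the names come from something like `ps -e -o comm=`; the sample list and the `count_processes` helper are hypothetical, and the queue workers may well show up under a different command name on the actual host.

```python
from collections import Counter


def count_processes(command_names):
    """Count how many processes share each command name, skipping blank entries."""
    return Counter(name.strip() for name in command_names if name.strip())


# Hypothetical sample data; on the host this list could be built from
# the lines printed by `ps -e -o comm=`.
sample = ["mysqld"] * 3 + ["queue"] * 2 + ["bash"]
counts = count_processes(sample)
print(counts.most_common(2))
```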
Assigning to milimetric, as he is about to kill the relevant jobs.
This is due to recurring reports I ran to test wikimetrics and see if it could handle back-filling lots of data. It back-filled 2 large wikis at a time all the way to 2007. However, when running 5 wikis at a time, the system became unstable and basically everything that could have possibly gone wrong went wrong. Further optimization work is clearly needed. For now, cleaning up after the mess:
* killed queue and scheduler
* delete from report where user_id = 461; -- this is the WikimetricsBot user
* copy relevant queue logs to: /data/project/wikimetrics/backup/bug-68743-logs/
* restart whole system
* purge any messages from celery that needed to be purged
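Until the optimization work happens, one way to stay within the load that held up (2 wikis at a time) is to split the back-fill into fixed-size batches and run them one after another. This is only a sketch of that idea, not how wikimetrics schedules reports; the `batches` helper and the wiki names are hypothetical.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical wiki names; 2 at a time was the load that stayed stable.
wikis = ["wiki_a", "wiki_b", "wiki_c", "wiki_d", "wiki_e"]
for group in batches(wikis, 2):
    # Each group would be back-filled to completion before starting the next.
    print(group)
```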
Also, I deleted the symlinks from the /var/lib/wikimetrics/public/datafiles folder. This leaves the system in a fairly clean state. I left the old report results there, as they may be interesting to compare to the manually generated data or useful for troubleshooting.