Last modified: 2014-07-28 16:08:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T70743, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 68743 - metrics.wikimedia.org (Wikimetrics) unresponsive
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Wikimetrics
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Dan Andreescu
Whiteboard: u=Community c=Wikimetrics p=0 s=2014-...
Depends on:
Blocks:
Reported: 2014-07-28 15:38 UTC by christian
Modified: 2014-07-28 16:08 UTC (History)

Description christian 2014-07-28 15:38:12 UTC
https://metrics.wmflabs.org/

is currently (2014-07-28 15:29) very unresponsive (and may appear down).
Some pages (like uploading a new cohort) temporarily gave me

  Wikimetrics is experiencing problems

errors in the browser.

Load is somewhere 25-35-ish.

Of the processes, it stands out that there are ~100 queue processes and
~130 mysqld processes.
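
For anyone wanting to reproduce this kind of check, here is a minimal sketch in Python, assuming a Linux host where `ps -eo comm=` is available. The function names `count_by_command` and `snapshot` are made up for illustration, and the command names reported by `ps` on the actual host may differ from "queue" and "mysqld".

```python
import os
import subprocess
from collections import Counter


def count_by_command(ps_output: str) -> Counter:
    """Count processes per command name from `ps -eo comm=` style output."""
    names = (line.strip() for line in ps_output.splitlines())
    return Counter(name for name in names if name)


def snapshot(top: int = 5):
    """Return the 1-minute load average and the most common command names."""
    load_1min = os.getloadavg()[0]
    ps_output = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return load_1min, count_by_command(ps_output).most_common(top)


if __name__ == "__main__":
    load, busiest = snapshot()
    print(f"load (1 min): {load:.2f}")
    for name, count in busiest:
        print(f"{count:5d}  {name}")
```

Run as a script, it prints the current load and the five most frequent command names, which is enough to spot a pile-up of one kind of process.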
Comment 1 christian 2014-07-28 15:53:19 UTC
Assigning to milimetric, as he is about to kill the relevant jobs.
Comment 2 Dan Andreescu 2014-07-28 16:03:03 UTC
This is due to recurring reports I ran to test wikimetrics and see if it could handle back-filling lots of data.  It back-filled 2 large wikis at a time all the way to 2007.  However, when running 5 wikis at a time, the system became unstable and basically everything that could have possibly gone wrong went wrong.  Further optimization work is clearly needed.  For now, cleaning up after the mess:

* killed queue and scheduler
* delete from report where user_id = 461; -- this is the WikimetricsBot user
* copy relevant queue logs to: /data/project/wikimetrics/backup/bug-68743-logs/
* restart whole system
* purge any messages from celery that needed to be purged
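
The delete step above can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database instead of the real MySQL database, and the `report` table layout and the sample rows are hypothetical. Only the `user_id = 461` condition comes from the list above.

```python
import sqlite3

BOT_USER_ID = 461  # the WikimetricsBot user, per the cleanup list


def delete_reports_for_user(conn: sqlite3.Connection, user_id: int) -> int:
    """Delete all reports owned by user_id; return the number of rows removed."""
    with conn:  # commits on success, rolls back if the statement fails
        cursor = conn.execute("DELETE FROM report WHERE user_id = ?", (user_id,))
    return cursor.rowcount


# Hypothetical minimal schema and made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO report (user_id, name) VALUES (?, ?)",
    [(461, "bot report 1"), (461, "bot report 2"), (7, "manual report")],
)
conn.commit()

removed = delete_reports_for_user(conn, BOT_USER_ID)
remaining = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
print(f"removed {removed} report(s), {remaining} left")
```

Wrapping the delete in a transaction and checking the returned row count gives a quick sanity check that only the bot's reports were removed.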
Comment 3 Dan Andreescu 2014-07-28 16:08:39 UTC
Also, I deleted the symlinks from the /var/lib/wikimetrics/public/datafiles folder. This leaves the system in a fairly clean state. I left the old report results there, as they may be interesting to compare with the manually generated data or useful for troubleshooting.
