
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T54584, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52584 - Dump files not being updated on /public
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Ariel T. Glenn
Depends on:
Blocks:

Reported: 2013-08-06 20:00 UTC by bgwhite
Modified: 2013-10-24 11:09 UTC
CC: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Attachments: none

Description bgwhite 2013-08-06 20:00:31 UTC
The dump files on /public/datasets/public/ haven't been updated for a while.
Comment 1 Tim Landscheidt 2013-08-06 20:10:03 UTC
Spoke with Ariel on IRC:

| <apergos> scfc_de: that's on my side, I'm busy moving stuff around
| <apergos> when the move looks stable I'll restart that
Comment 2 Ariel T. Glenn 2013-08-07 16:51:13 UTC
I've re-enabled the cron job, so keep an eye out for new files.
Comment 3 Ariel T. Glenn 2013-08-08 06:40:20 UTC
I see last night's rsync is going; it will be a while before it catches up but it will eventually. Closing.
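
For illustration, a minimal Python sketch of the kind of cron-driven rsync pass described in comments 2 and 3; the source host, destination layout, and rsync flags here are assumptions, not the actual Wikimedia configuration:

    #!/usr/bin/env python3
    # Hypothetical mirror pass: cron would invoke this on a schedule.
    import subprocess
    import sys

    SOURCE = "rsync://dumps.example.org/dumps/"  # hypothetical dump host
    DEST = "/public/datasets/public/"            # mount named in this bug

    def mirror() -> int:
        """Run one rsync pass and return its exit status."""
        cmd = [
            "rsync",
            "--archive",   # preserve timestamps and permissions
            "--delete",    # drop files removed upstream
            "--partial",   # keep partial files so a long catch-up can resume
            SOURCE,
            DEST,
        ]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        sys.exit(mirror())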
Comment 4 bgwhite 2013-08-13 23:36:28 UTC
It's not exactly going right. It appears rsync is only syncing directories in which the entire dump has been completed. Therefore enwiki's 2013-08-05 dump, which is still running, is not showing up even though the main dump files were available over a week ago. The same goes for dewiki's 2013-07-27 dump, whose main dump files were available two weeks ago. The majority of dumps run twice a month.

*pages-articles.xml.bz2 is the main dump file that includes the current revision for everything in article, template, file and WP space.
*pages-meta-current.xml.bz2 is the main dump file that includes the current revision for everything.
Comment 5 Ariel T. Glenn 2013-08-14 04:38:17 UTC
This is how the script has always run; we make sure the dump has completed and is known to be good before copying files over.
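
A minimal sketch of the gating behavior this comment describes, assuming a hypothetical per-run completion marker (dump-complete.txt) and directory layout; the real dump scripts track run status in their own way:

    #!/usr/bin/env python3
    # Hypothetical "copy only finished runs" gate.
    import shutil
    from pathlib import Path

    SRC_ROOT = Path("/data/dumps")              # assumed dump output tree
    DST_ROOT = Path("/public/datasets/public")  # mount from this bug

    def is_complete(run_dir: Path) -> bool:
        # Assumed marker, written only after a dump finishes and verifies.
        return (run_dir / "dump-complete.txt").exists()

    def sync_finished_runs() -> None:
        for wiki in SRC_ROOT.iterdir():         # e.g. enwiki, dewiki
            if not wiki.is_dir():
                continue
            for run in wiki.iterdir():          # e.g. 20130805
                if run.is_dir() and is_complete(run):
                    dest = DST_ROOT / wiki.name / run.name
                    if not dest.exists():
                        shutil.copytree(run, dest)

    if __name__ == "__main__":
        sync_finished_runs()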
Comment 6 bgwhite 2013-08-14 05:47:37 UTC
Arggh. So the dump files on /public are now worthless.

If I run dewiki and enwiki on labs, it will take up to 30 days after *pages-articles.xml.bz2 has been posted for checkwiki results to be known. It takes checkwiki up to 7 days to finish an enwiki run on labs; the last one took 6 days. However, if I download the dump file myself and run it on my old laptop, I can have the results out in 15 hours.

This is frustrating. THE major complaint about checkwiki on toolserver has been the untimeliness of results from dump files. I was hoping to get away from running different language runs on my laptop to hand out to people.

For some reason, people running bots like to get their work done before the next dump has started. A bot owner fixes an article, but the article still shows up in the next dump run as not fixed. The article changes slightly in the meantime. The bot runs on the article again; the problem was already fixed, but the bot does a "cleanup" because the article changed. Too many of those and the bot gets banned. So, with the current setup of labs, checkwiki is now dead for larger languages.
Comment 7 Ariel T. Glenn 2013-08-14 05:59:13 UTC
enwiki runs once a month.  The run itself takes about 11 days.  Even if it took two whole days of copying for the new files to arrive, this would still be plenty of time for post-processing and bot runs before the next dumps get produced.

I don't know how long dewiki will take to run now that we have it going out of the new datacenter; we haven't had a new complete run over there yet but it will be less than it used to be, since we've parallelized the run.

If processing the xml files is that much quicker on your laptop, that's a different issue from the dumps no longer being copied to the shared gluster mount. Please open a new bug for that.
Comment 8 bgwhite 2013-08-14 07:43:40 UTC
9 days does not give enough time when it takes longer than that to get the bot runs done. You don't just hit enter and the bots do the rest. And that assumes nobody is on vacation, sick, away for the weekend, etc. The bots don't catch everything, so what is left has to be done manually, and if those don't get done, bots run on them the next go-around. I HAVE NO CHOICE but to give up on labs for larger languageS. Notice the plural: this also affects other languages, as I said in my last message.

Taking 11 days, sitting and doing nothing, IS THE MAIN PROBLEM.
Saying "This is how the script has always run" is a cop-out answer.
Why wait to copy files over for larger languages? I obviously haven't seen every case where bad dumps appear. But in the only cases I've seen, the file won't be written. Are there any other common scenarios of bad dumps?


Oh, I brought up why the queue is so slow. The response I got back was that my program was horribly written and I should switch away from Perl. Great &#($? response. Yeah, that explains why it takes up to 7 days to run vs. 15 hours on a three-year-old laptop under a one-CPU VM. It was also brought up at Wikimania, and the response was that labs might be adding one or two computers to the queue. Ahh, not enough funding. The #2 problem of sysadmins, behind users... as my t-shirt says, "SELECT * FROM users WHERE clue > 0; 0 rows returned".
Comment 9 Ariel T. Glenn 2013-08-14 09:12:48 UTC
We don't remove a good past dump from the gluster mount (in order to copy in a new one) unless we know the new one is good. This means we wait until completion. That's a deliberate design decision. If you want access to the most recent files whether or not they are good (perhaps in another directory), please open a bug for that. This bug was about the copy not happening at all, which has been fixed. Thank you.
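
The keep-the-last-good-copy rule described above amounts to a copy-then-swap pattern; a hedged sketch of one way to write it, with the staging names and completion marker again being assumptions rather than the actual implementation:

    #!/usr/bin/env python3
    # Hypothetical replace step: the published dump is never removed until
    # its replacement has been copied in full and looks complete.
    import os
    import shutil
    from pathlib import Path

    def replace_dump(new_run: Path, published: Path) -> None:
        staging = published.with_name(published.name + ".staging")
        old = published.with_name(published.name + ".old")
        shutil.copytree(new_run, staging)        # copy alongside, never over
        if not (staging / "dump-complete.txt").exists():
            shutil.rmtree(staging)               # incomplete: keep the old dump
            return
        if old.exists():
            shutil.rmtree(old)                   # clear leftovers from a prior swap
        if published.exists():
            os.rename(published, old)            # two cheap renames, so readers
        os.rename(staging, published)            # never see a missing dump
        if old.exists():
            shutil.rmtree(old)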
