Last modified: 2013-10-10 17:16:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T55475, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 53475 - Dump pl.wikimedia daily
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
URL: http://pl.wikimedia.org
Depends on:
Blocks:

Reported: 2013-08-28 14:30 UTC by Marcin Cieślak
Modified: 2013-10-10 17:16 UTC (History)

Description Marcin Cieślak 2013-08-28 14:30:32 UTC
Because Wikimedia Polska (the Polish Wikimedia chapter) receives the majority of its funds through public donations made via the Polish tax system, certain reporting requirements apply. One of them is to run a public-facing website that publishes the relevant "public sector information" for the general public.

These "public sector information" websites are ordinary websites that must meet certain requirements. One requirement is that the "public sector information database needs to be backed up on separate medium within 24 hours after the last change; if the information is changed more often than once per 24 hours, it is enough to provide a backup once per 24 hours".

To avoid duplicated effort, we are currently exploring the possibility of using the chapter's existing wiki, http://pl.wikimedia.org/, as the "public sector information" website (we currently meet 90%+ of the requirements).

I currently have two ideas for fulfilling this "backup requirement":

1) generate our own dumps using Special:Export or toolserver.org (or its labs equivalent, once databases are available there),

2) use the Wikimedia XML dump service for this purpose.

I do not know whether Foundation-hosted wikis have some other form of backup available.

Would it be possible to increase the frequency of the plwikimedia XML dumps to once per 24 hours? I know there are currently some space issues, but maybe they will be resolved later.

Currently http://dumps.wikimedia.org/plwikimedia/20130821/plwikimedia-20130821-pages-meta-history.xml.7z is only 25 megabytes.

Comment 1 Ariel T. Glenn 2013-08-28 18:05:19 UTC
Our dump system is designed for dumps on a rolling basis, with the wiki that has waited the longest being run next; there's no efficient way to provision for daily dumps, unfortunately.
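
The rolling policy described above can be illustrated with a small sketch. This is not the actual dump scheduler; the function name and the wiki dates below are hypothetical and only show the "longest-waiting wiki runs next" rule, which is why no single wiki can be pinned to a fixed daily slot.

```python
from datetime import date


def next_wiki_to_dump(last_dump_dates):
    """Return the wiki whose most recent dump is the oldest.

    last_dump_dates maps a wiki database name to the date of its last
    completed dump (hypothetical data, for illustration only).
    """
    if not last_dump_dates:
        raise ValueError("no wikis to schedule")
    # The wiki with the earliest last-dump date has waited the longest.
    return min(last_dump_dates, key=last_dump_dates.get)


# Hypothetical example data, not real dump dates.
example = {
    "plwikimedia": date(2013, 8, 21),
    "examplewiki_a": date(2013, 8, 10),
    "examplewiki_b": date(2013, 8, 25),
}
chosen = next_wiki_to_dump(example)
```

Under this rule a wiki is only dumped again once every other wiki has had its turn, so the interval between dumps depends on the whole queue rather than on a per-wiki schedule.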

If you only need to back up a certain subset of pages, then Special:Export or even the API is your best bet. That could be done via a cron job with a minimum of fuss.
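
A minimal sketch of such a job, assuming the standard MediaWiki API `action=query` with the `export` and `exportnowrap` parameters. The `/w/api.php` endpoint path, the page titles, the output file naming, and the User-Agent string are assumptions made for illustration, not details from this report. As far as I know, this kind of export returns only the current revision of each page, so it is not a full-history backup.

```python
import datetime
import os
import urllib.parse
import urllib.request

# Assumed endpoint path; check the wiki's actual API location.
API_URL = "http://pl.wikimedia.org/w/api.php"


def build_export_url(titles, api_url=API_URL):
    """Build an API URL that returns the given pages as export XML."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # the API separates titles with "|"
        "export": "1",
        "exportnowrap": "1",  # return the bare export XML
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def backup_filename(day):
    """Hypothetical naming scheme: one file per calendar day."""
    return "plwikimedia-export-%s.xml" % day.strftime("%Y%m%d")


def run_backup(titles, out_dir="."):
    """Fetch the export XML and write it to a dated file (needs network)."""
    request = urllib.request.Request(
        build_export_url(titles),
        headers={"User-Agent": "plwikimedia-backup-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = response.read()
    path = os.path.join(out_dir, backup_filename(datetime.date.today()))
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

Calling `run_backup([...])` with the list of pages to preserve, from a script started once a day by cron, would leave one dated XML file per day; copying those files to separate storage is left out of the sketch.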

Comment 2 Sam Reed (reedy) 2013-08-29 01:49:45 UTC
Note that the database is replicated to multiple database slaves (yes, I know, this isn't a different medium), and we also have LVS snapshots.

Do these backups need to be made public?

reedy@fenari:~$ time ./sqldump plwikimedia > plwikimedia.sql

real    0m33.417s
user    0m0.484s
sys     0m0.104s
reedy@fenari:~$ du --si plwikimedia.sql
26M     plwikimedia.sql
reedy@fenari:~$

Comment 3 Marcin Cieślak 2013-10-10 17:16:57 UTC
No, I don't think so. For public access we still have the dumps, which is way better than most government sites. Do you think we could run those unofficial SQL dumps somewhere?
