Last modified: 2013-10-10 17:16:57 UTC
Because Wikimedia Polska (the Polish Wikimedia chapter) receives the majority of its funds through public donations made via the Polish tax system, certain reporting requirements apply. One of them is to run a public-facing website that makes the relevant "public sector information" available to the general public. These "public sector information" websites are ordinary websites that must meet a few specific requirements. One requirement is that the "public sector information database needs to be backed up on separate medium within 24 hours after the last change; if the information is changed more often than once per 24 hours, it is enough to provide a backup once per 24 hours". To avoid duplicated effort, we are exploring the possibility of using the chapter's existing wiki, http://pl.wikimedia.org/, as the "public sector information" website (we currently meet 90%+ of the requirements). I currently have two ideas for fulfilling this "backup requirement": 1) generate our own dumps using Special:Export or toolserver.org (or its Labs equivalent, once databases are available there), or 2) use the Wikimedia XML dump service for this purpose. I am not aware whether Foundation-hosted wikis have some other form of backup available. Would it be possible to increase the frequency of plwikimedia XML dumps to once per 24 hours? I know there are currently some space issues, but maybe they will be resolved later. Currently http://dumps.wikimedia.org/plwikimedia/20130821/plwikimedia-20130821-pages-meta-history.xml.7z is only 25 megabytes.
Our dump system is designed to run dumps on a rolling basis, with the wiki that has waited the longest being run next; unfortunately, there's no efficient way to provision for daily dumps. If you only need to back up a certain subset of pages, then Special:Export or even the API is your best bet. That could be done via a cron job with a minimum of fuss.
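To illustrate the cron-job suggestion, here is a minimal sketch of a daily export script that uses the MediaWiki API's `export` option. The endpoint path `/w/api.php`, the output file naming, the User-Agent string, and the helper names (`build_export_url`, `backup_path`, `run_backup`) are assumptions made for this example, not something taken from the thread. Note that this kind of export returns only the current revision of each listed page, not the full history.

```python
import datetime
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed API endpoint for the chapter wiki (standard Wikimedia layout).
API_URL = "https://pl.wikimedia.org/w/api.php"


def build_export_url(titles, api_url=API_URL):
    """Build an API URL that returns the given pages as raw export XML."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # the API separates titles with "|"
        "export": 1,                 # include an XML export of the pages
        "exportnowrap": 1,           # return the bare export XML, unwrapped
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def backup_path(backup_dir, day):
    """Return a dated file path (hypothetical naming scheme) for one backup."""
    return Path(backup_dir) / f"plwikimedia-{day:%Y%m%d}-export.xml"


def run_backup(titles, backup_dir):
    """Download the export XML and store it under today's date."""
    request = urllib.request.Request(
        build_export_url(titles),
        headers={"User-Agent": "psi-backup-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = response.read()
    target = backup_path(backup_dir, datetime.date.today())
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A small wrapper script that calls `run_backup` with the list of relevant page titles and a directory on a separate medium could then be scheduled once a day from cron. The API limits how many titles one request may carry, so a longer page list would need to be split into batches.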
Note that the database is replicated to multiple database slaves (yes, I know, this isn't a different medium), and we also have LVM snapshots. Do these backups need to be made public?
reedy@fenari:~$ time ./sqldump plwikimedia > plwikimedia.sql
real 0m33.417s
user 0m0.484s
sys 0m0.104s
reedy@fenari:~$ du --si plwikimedia.sql
26M plwikimedia.sql
reedy@fenari:~$
No, don't think so. For public access we still have dumps, which is way better than most government sites. Do you think we could run those unofficial SQL dumps somewhere?