Without citing stats, these huge files demand multisourcing, either over HTTP using mirrors or, even better, over BitTorrent. I hear this can dramatically reduce bandwidth demand on the servers. BitTorrent is particularly nice because files can be selectively downloaded from within the bundle: you could provide a single torrent containing all outputs from a particular wiki snapshot date.
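For illustration, here is a rough sketch of how such a multi-file torrent could be built with libtorrent's Python bindings; the tracker URL, directory layout, and file names are placeholders I made up, not anything WMF actually runs:

    import libtorrent as lt

    # Add every file under the snapshot directory to one multi-file torrent,
    # so downloaders can pick individual dump files out of the bundle.
    fs = lt.file_storage()
    lt.add_files(fs, "dumps/enwiki-20111101")

    t = lt.create_torrent(fs)
    t.add_tracker("http://tracker.example.org/announce")  # placeholder tracker

    # Hash the pieces; the second argument is the parent directory of the
    # files added above.
    lt.set_piece_hashes(t, "dumps")

    with open("enwiki-20111101.torrent", "wb") as f:
        f.write(lt.bencode(t.generate()))

Clients can then deselect the files they do not want, which is the selective-download property mentioned above.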
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps

As for the BitTorrent part, that would be somewhat feasible, with the tracker hosted on WMF, but seeding from WMF might be more of an issue.
This is not an area I know much about, but what is the objection to seeding? I imagine you would get the maximum benefit by using an open tracker that is already tied into search services. And if your mirrors agreed to use this protocol, they would provide a natural pool of seeders, even before they have finished replicating. One major downside of the torrent idea is that it would be inefficient to offer incomplete dumps, because the .torrent would have to be regenerated as the data grows. Unless there is a workaround, it would only make sense to wait until the dump is complete, by which point the data has already aged...
Rephrased subject.
Once the dump is available, there is nothing preventing someone in the community (or several someones) from setting up a torrent of these files, and I encourage folks to do so, as has been done a number of times in the past. Waiting until the dump is completed before adding it to a torrent is a good idea in all cases; only then are we sure that the files are intact and worth your while to download. Folks who have talked with us about setting up a mirror site have expressed a preference for rsync, and that also works best for us for distributing a subset of the dumps for mirroring.
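For anyone curious what a mirror pull might look like, a minimal sketch follows; the host, module path, and local directory are hypothetical, and a real mirror would use whatever rsync endpoint is arranged with us:

    import subprocess

    # Mirror one snapshot directory. -a preserves timestamps/permissions,
    # -v prints progress, --delete drops local files removed upstream.
    subprocess.run(
        ["rsync", "-av", "--delete",
         "rsync://dumps.example.org/dumps/enwiki/20111101/",
         "/srv/mirror/enwiki/20111101/"],
        check=True,
    )

Run periodically, this keeps a mirror in step with the upstream tree without re-fetching files that have not changed.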
Per Ariel's comment, I am closing this bug. Either set up your own torrent or ask for rsync access.