Without citing stats, these huge files demand multisourcing, either over HTTP using mirrors or, even better, over BitTorrent. I hear this can dramatically reduce bandwidth demand on the servers. BitTorrent is particularly nice because files can be selectively downloaded from within the bundle: you could provide a single torrent containing all outputs from a particular wiki snapshot date.
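For illustration, here is a rough sketch of how such a multi-file torrent could be built with libtorrent's Python bindings; the tracker URL, directory layout, and file names are placeholders I made up, not anything WMF actually runs:

    import libtorrent as lt

    # Add every file under the snapshot directory to one multi-file torrent,
    # so downloaders can pick individual dump files out of the bundle.
    fs = lt.file_storage()
    lt.add_files(fs, "dumps/enwiki-20111101")

    t = lt.create_torrent(fs)
    t.add_tracker("http://tracker.example.org/announce")  # placeholder tracker

    # Hash the pieces; the second argument is the parent directory of the
    # files added above.
    lt.set_piece_hashes(t, "dumps")

    with open("enwiki-20111101.torrent", "wb") as f:
        f.write(lt.bencode(t.generate()))

Clients can then deselect the files they do not want, which is the selective-download property mentioned above.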
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps

As for the BitTorrent part, that would be somewhat feasible, with the tracker hosted on WMF, but seeding from WMF might be more of an issue.
This is not an area I know much about, but what is the objection to seeding? I imagine you would get the maximum benefit by using an open tracker that is already tied into search services. And if your mirrors agreed to use this protocol, they would provide a natural pool of seeders, even before they have finished replicating. One major downside of the torrent idea is that it would be inefficient to offer incomplete dumps, because the .torrent would have to be regenerated as the data grows. Unless there is a workaround, it would only make sense to wait until the dump is complete, by which point the data has already aged...
Rephrased subject.
Once the dump is available, there is nothing preventing someone in the community (or several someones) from setting up a torrent of these files, and I encourage folks to do so, as has been done a number of times in the past. Waiting until the dump is completed before adding it to a torrent is a good idea in all cases; only then are we sure that the files are intact and worth your while to download. Folks who have talked with us about setting up a mirror site have expressed a preference for rsync, and that also works best for us for distributing a subset of the dumps for mirroring.
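For anyone curious what a mirror pull might look like, a minimal sketch follows; the host, module path, and local directory are hypothetical, and a real mirror would use whatever rsync endpoint is arranged with us:

    import subprocess

    # Mirror one snapshot directory. -a preserves timestamps/permissions,
    # -v prints progress, --delete drops local files removed upstream.
    subprocess.run(
        ["rsync", "-av", "--delete",
         "rsync://dumps.example.org/dumps/enwiki/20111101/",
         "/srv/mirror/enwiki/20111101/"],
        check=True,
    )

Run periodically, this keeps a mirror in step with the upstream tree without re-fetching files that have not changed.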
Per Ariel's comment, I am closing this bug. Either set up your own torrent or ask for rsync access.