Last modified: 2014-10-18 21:18:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T59739, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57739 - all-titles file doesn't include namespace prefix
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2013-11-29 13:23 UTC by Xavier Combelle
Modified: 2014-10-18 21:18 UTC (History)

Description Xavier Combelle 2013-11-29 13:23:45 UTC
The file *-all-titles.gz does not seem to include the namespace prefix, for example:

http://dumps.wikimedia.org/commonswiki/20131121/commonswiki-20131121-all-titles.gz

It seems that the current process (currently: http://git.wikimedia.org/blob/operations%2Fdumps.git/11e9b23b4bc76bf3d89e1fb32348c7a11079bd55/xmldumps-backup%2Fworker.py#L4043 )
uses a simple query:
query="select page_title from page;"

and the namespace is not in page_title.

This makes the file nearly useless, as one cannot tell a title in the main namespace from a title in another namespace, or tell titles in two different namespaces apart.
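
To illustrate the kind of change being asked for, here is a minimal sketch, not the actual worker.py code. It assumes the query is extended to also select page_namespace (the namespace id column of the page table) and that a per-wiki map from namespace id to prefix is available. The names NAMESPACE_PREFIXES, QUERY and prefixed_title are hypothetical, and the map below is only a small illustrative subset.

```python
# Hypothetical subset of a wiki's namespace table (id -> prefix).
# The real mapping is per-wiki and is not given in this report.
NAMESPACE_PREFIXES = {0: "", 1: "Talk", 6: "File", 14: "Category"}

# Assumed variant of the query quoted above: also fetch the namespace id.
QUERY = "select page_namespace, page_title from page;"


def prefixed_title(namespace_id, title, prefixes=NAMESPACE_PREFIXES):
    """Return the title with its namespace prefix, if it has one."""
    prefix = prefixes.get(namespace_id)
    if prefix is None:
        # Unknown namespace: fall back to the numeric id so the row
        # stays distinguishable from main-namespace titles.
        return f"{namespace_id}:{title}"
    # The main namespace has an empty prefix and no colon.
    return f"{prefix}:{title}" if prefix else title


# Rows as they might come back from QUERY (made-up sample data).
rows = [(0, "Paris"), (6, "Paris.jpg"), (14, "Paris")]
titles = [prefixed_title(ns, t) for ns, t in rows]
```

With the sample rows, the two pages named "Paris" come out as "Paris" and "Category:Paris", which is the distinction the current file loses.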
Comment 1 Xavier Combelle 2013-11-29 13:25:52 UTC
A workaround is to use stub-meta-current, which contains the prefixed titles, but that file is considerably larger because it contains extra information.
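
A rough sketch of that workaround, assuming the stub file is gzip-compressed XML in which each page element carries a title element holding the prefixed title. The function names and the in-memory sample are hypothetical. Tags are matched by local name so the sketch does not depend on a particular export schema version.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_titles(stream):
    """Yield the text of every <title> element found in an XML stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Drop the "{namespace-uri}" part that ElementTree puts on tags.
        local = elem.tag.rsplit("}", 1)[-1]
        if local == "title":
            yield elem.text
        elif local == "page":
            # Discard the finished page's children to limit memory use.
            elem.clear()


def iter_titles_from_gz(path):
    """Same as iter_titles, reading from a gzip-compressed dump file."""
    with gzip.open(path, "rb") as f:
        yield from iter_titles(f)


# Tiny made-up sample standing in for a real stub dump.
sample = (
    b'<mediawiki xmlns="http://example.org/export">'
    b"<page><title>File:Paris.jpg</title><ns>6</ns></page>"
    b"<page><title>Paris</title><ns>0</ns></page>"
    b"</mediawiki>"
)
demo_titles = list(iter_titles(io.BytesIO(sample)))
```

For a downloaded dump, iter_titles_from_gz would be called with the local path of the stub-meta-current file. The parse is streaming, so the whole file is never loaded at once, but it still has to read through all the extra information mentioned above.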
