Last modified: 2014-07-25 07:21:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57259, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55259 - xmlreader.py fails a lot
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: xmlreader.py
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2013-10-05 04:52 UTC by Kunal Mehta (Legoktm)
Modified: 2014-07-25 07:21 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:52:49 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1245/
Reported by: emijrp
Created on: 2010-10-03 13:51:00
Subject: xmlreader.py fails a lot
Original description:
Hi all;

I think there is an error in xmlreader.py. When I parse a full-history XML dump (in this case [1]) with this code [2] (look at the try/except block, which prints to the console when parsing fails), the username, timestamp, and revision id come out correctly, but sometimes the page title and the page id are None or an empty string.

The first error is:
['', None, 'QuartierLatin1968', '2004-10-10T04:24:14Z', '4267']  # note the empty string for the title and the None for the page id

But if we do:
7za e -bd -so kwwiki-20100926-pages-meta-history.xml.7z 2>/dev/null | egrep -i '2004-10-10T04:24:14Z' -C20

We get this [3], which is fine: the page title and the page id are present in the XML but are not parsed correctly. And this is not the only page whose title and id fail.
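
For comparison, here is a minimal sketch of one way to pull the same five fields out of a dump using only the Python standard library. It is not the code from [2] and not how xmlreader.py works internally. The helper names (`local`, `find_child`, `iter_revisions`) are made up for illustration, and it assumes the usual export layout where `<title>` and `<id>` are direct children of `<page>` and come before its `<revision>` elements. The page-level values are remembered and attached to every revision of that page, and the element path is tracked so the page id is not confused with the revision or contributor id.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for child in parent:
        if local(child.tag) == name:
            return child
    return None


def iter_revisions(source):
    """Yield [title, pageid, username, timestamp, revisionid] per revision.

    `source` is a filename or a binary file object holding the XML export.
    """
    path = []              # local names of the elements currently open
    title = pageid = None  # page-level values shared by all its revisions
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            path.append(name)
            if name == "page":
                title = pageid = None  # forget the previous page
            continue
        path.pop()
        parent = path[-1] if path else None
        if parent == "page" and name == "title":
            title = elem.text
        elif parent == "page" and name == "id":
            pageid = elem.text
        elif name == "revision":
            contributor = find_child(elem, "contributor")
            user = None
            if contributor is not None:
                # Anonymous edits carry <ip> instead of <username>.
                who = find_child(contributor, "username")
                if who is None:
                    who = find_child(contributor, "ip")
                user = who.text if who is not None else None
            stamp = find_child(elem, "timestamp")
            revid = find_child(elem, "id")
            yield [title, pageid, user,
                   stamp.text if stamp is not None else None,
                   revid.text if revid is not None else None]
            elem.clear()  # free the revision text once it has been used
        elif name == "page":
            elem.clear()
```

To run it against the decompressed stream from the 7za command above, something like `for row in iter_revisions(sys.stdin.buffer): print(row)` (with `import sys`) should work, assuming the dump is piped to the script on standard input.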

Perhaps I have missed something, because I'm still learning to parse XML. Sorry if that is the case.

Regards,
emijrp

[1] http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-history.xml.7z
[2] http://pastebin.ca/1951930
[3] http://pastebin.ca/1951937
