Last modified: 2014-06-16 13:06:40 UTC

This static website is read-only and kept for historical purposes. Logging in is not possible and, apart from displaying bug reports and their history, links might be broken. See T68661, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 66661 - `mwxml2sql' fails to process `enwikinews-20140605-pages-meta-current.xml.bz2' when it encounters `<ns>90</ns>'
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: PC Linux
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2014-06-16 13:06 UTC by wp mirror
Modified: 2014-06-16 13:06 UTC

Description wp mirror 2014-06-16 13:06:40 UTC
0) Summary

I tried to build a mirror of `enwikinews' using `mwxml2sql'. This failed whenever `mwxml2sql' encountered a page from namespace 90 (Thread).

I tried again using `maintenance/importDump.php'. This worked better. However, it appears that `importDump.php' ignores namespace 90, because no such pages are later found in the `enwikinews.page' database table.

1) Dataset

`enwikinews-20140605-pages-meta-current.xml.bz2'

2) Error messages

WHINE: (155323) no end page tag

When I divide the XML data dump into smaller files of, say, 1000 pages each, I find many more such errors.
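
The splitting step can be sketched in Python. This is only a minimal sketch and not necessarily how the dump was divided for this report: `split_pages` and `pages_per_chunk` are hypothetical names, and the code assumes that `<page>` and `</page>` each sit on a line of their own, as they do in the excerpt in section 3.

```python
from typing import Iterable, Iterator, List


def split_pages(lines: Iterable[str], pages_per_chunk: int = 1000) -> Iterator[List[str]]:
    """Yield lists of complete <page>...</page> blocks, pages_per_chunk at a time.

    Assumption: <page> and </page> each occupy their own line.
    Lines outside any page (site header, closing root tag) are skipped,
    and a page whose closing tag never arrives is dropped.
    """
    chunk: List[str] = []
    page: List[str] = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == "<page>":
            inside = True
            page = []
        if inside:
            page.append(line)
        if inside and stripped == "</page>":
            inside = False
            chunk.append("".join(page))
            if len(chunk) == pages_per_chunk:
                yield chunk
                chunk = []
    if chunk:
        # Final, possibly shorter, chunk.
        yield chunk
```

The function takes any iterable of text lines, so a file opened with `bz2.open(path, "rt", encoding="utf-8")` can be passed in directly. Each chunk would still need the dump's header and a closing root tag wrapped around it before being fed to an importer.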

3) Pages that cause errors

  <page>
    <title>Thread:Comments:Chip and PIN 'not fit for purpose', says Cambridge researcher/Those in positions of power shirking responsibility and lying?</title>
    <ns>90</ns>
    <id>155323</id>
<DiscussionThreading>
        <ThreadSubject>Those in positions of power shirking responsibility and lying?</ThreadSubject>
        <ThreadPage>Comments:Chip and PIN 'not fit for purpose', says Cambridge researcher</ThreadPage>
        <ThreadID>92</ThreadID>
        <ThreadAuthor>70.31.58.181</ThreadAuthor>
        <ThreadEditStatus>has-reply</ThreadEditStatus>
        <ThreadType>normal</ThreadType>
        <ThreadSignature>[[Special:Contributions/70.31.58.181|70.31.58.181]] ([[User talk:70.31.58.181|talk]])</ThreadSignature>
</DiscussionThreading>
    <revision>
      <id>958267</id>
      <timestamp>2010-02-15T04:04:56Z</timestamp>
      <contributor>
        <ip>70.31.58.181</ip>
      </contributor>
      <comment>New thread: Those in positions of power shirking responsibility and lying?</comment>
      <text xml:space="preserve">&quot;All the banks are lying. They are maliciously and wilfully deceiving the customer [...] The system is not fit for purpose.&quot;
I'm so surprised that I've apparently transcended a serious remark and instead am being sarcastic.  Incidentally, only part of that sentence was sarcastic.</text>
      <sha1>rjidk12i4hv2mxia3a8qq620rlc7lok</sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>

4) Namespace of pages that cause errors

      <namespace key="90" case="first-letter">Thread</namespace>
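
To see how many pages are affected, one could scan the dump for pages in namespace 90 that carry child elements beyond the usual ones. The following is a rough sketch using Python's `xml.etree.ElementTree`. The names `EXPECTED_CHILDREN`, `unexpected_children` and `unexpected_children_in_text` are hypothetical, and the set of expected children is an assumption drawn from the excerpt in section 3, not from the export schema. Whether the extra `<DiscussionThreading>` element is what actually trips `mwxml2sql' is not established in this report.

```python
import io
import xml.etree.ElementTree as ET

# Child elements of <page> that this sketch treats as expected.
# Assumption based on the excerpt in this report, not on the export schema.
EXPECTED_CHILDREN = {"title", "ns", "id", "redirect", "restrictions", "revision"}


def local_name(tag: str) -> str:
    """Drop a '{namespace-uri}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def unexpected_children(xml_source, namespace: str = "90"):
    """Yield (page_id, [element names]) for pages in `namespace`
    that have direct children outside EXPECTED_CHILDREN."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {local_name(child.tag): child for child in elem}
        ns = fields.get("ns")
        if ns is not None and ns.text == namespace:
            extras = [name for name in fields if name not in EXPECTED_CHILDREN]
            if extras:
                page_id = fields["id"].text if "id" in fields else None
                yield page_id, extras
        elem.clear()  # release the finished page's subtree


def unexpected_children_in_text(xml_text: str, namespace: str = "90"):
    """Convenience wrapper for trying the scan on an in-memory snippet."""
    return list(unexpected_children(io.BytesIO(xml_text.encode("utf-8")), namespace))
```

`unexpected_children` accepts a file name or a binary file object, so the decompressed dump or an object returned by `bz2.open(path, "rb")` can be passed to it.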

5) Use of `importDump.php'

Apparently `importDump.php' ignores namespace 90.

mysql> select page_id,page_namespace,page_title from enwikinews.page where page_id=155323;
Empty set (0.00 sec)
mysql> select page_id,page_namespace,page_title from enwikinews.page where page_namespace=90;
Empty set (0.00 sec)

Sincerely Yours,
Kent
