Last modified: 2013-08-19 14:01:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T34439, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 32439 - java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
Status: RESOLVED WORKSFORME
Product: Utilities
Classification: Unclassified
Component: mwdumper
Version: unspecified
Hardware: PC Windows 7
Importance: Normal blocker
Assigned To: Brion Vibber
aklapper-moreinfo
Reported: 2011-11-16 08:38 UTC by eyal
Modified: 2013-08-19 14:01 UTC (History)

Description eyal 2011-11-16 08:38:36 UTC
The dump file I'm reading is:
enwiki-latest-pages-articles.xml.bz2 (Aug 08, 2011)

I'm inserting the values into a MySQL DB according to the wiki SQL DB definition,
after I removed the tables' indexes and constraints.

I would be more than glad to know if there's a way to work around it: ignore the problematic rows and continue reading and writing the rest of the file.

Thanks

2,260,000 pages (36.843/sec), 2,260,000 revs (36.843/sec)
2,261,000 pages (36.842/sec), 2,261,000 revs (36.842/sec)
2,262,000 pages (36.841/sec), 2,262,000 revs (36.841/sec)
2,263,000 pages (36.839/sec), 2,263,000 revs (36.839/sec)
2,264,000 pages (36.837/sec), 2,264,000 revs (36.837/sec)
2,265,000 pages (36.838/sec), 2,265,000 revs (36.838/sec)
java.io.IOException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
	at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
	at org.mediawiki.dumper.gui.DumperGui$1.run(DumperGui.java:206)
Caused by: org.xml.sax.SAXException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
	at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:227)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
	at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
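
The bytes quoted in the exception, \xF0\x9D\x9E\xB1, form a four-byte UTF-8 sequence, which is a code point above U+FFFF. MySQL's `utf8` character set stores at most three bytes per character, so if the page_title column uses it (an assumption, since the report does not state the column's character set), such a title would be rejected with this error. The following is a minimal Java sketch that decodes those bytes and shows one way to detect titles of this kind before inserting them. The class and method names are hypothetical and are not part of mwdumper.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryTitleCheck {

    // Returns true if the string contains any code point above U+FFFF,
    // i.e. one that UTF-8 encodes in four bytes.
    static boolean hasSupplementaryCodePoint(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // The first four bytes quoted in the error message.
        byte[] quoted = {(byte) 0xF0, (byte) 0x9D, (byte) 0x9E, (byte) 0xB1};
        String decoded = new String(quoted, StandardCharsets.UTF_8);

        System.out.printf("U+%X%n", decoded.codePointAt(0));       // prints U+1D7B1
        System.out.println(hasSupplementaryCodePoint(decoded));     // prints true
        System.out.println(hasSupplementaryCodePoint("Main_Page")); // prints false
    }
}
```

A check like this could be used to skip or log the affected rows, at the cost of losing those pages from the import.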
Comment 1 eyal 2011-11-23 10:20:07 UTC
I found a way to fix the problem.

The SQL schema file provided by Wikipedia, at:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql

has some defaults in it that need to be changed.

Steps for the solution:

1. In the page table definition, at the page_title field: change the varchar type to varbinary.
2. In the revision table, at the rev_comment field: change the type from tinyblob to mediumblob (this is not the exception I described above, but it is still necessary if you want to avoid future exceptions).

That's it, you're good to go.
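
For tables that already exist, the two schema changes above could be applied with ALTER TABLE statements along these lines. This is only a sketch: the column length (255), the NOT NULL clauses, and the revision comment column name (`rev_comment`) are assumptions that should be checked against the copy of tables.sql actually in use, and the class name is hypothetical. The statements are built as plain strings so they can be printed and reviewed, or passed to a JDBC `Statement`.

```java
import java.util.Arrays;
import java.util.List;

public class SchemaWorkaround {

    // One statement per step of the workaround.
    // Column length and NOT NULL are assumptions, not taken from the report.
    static List<String> alterStatements() {
        return Arrays.asList(
            "ALTER TABLE page MODIFY page_title VARBINARY(255) NOT NULL",
            "ALTER TABLE revision MODIFY rev_comment MEDIUMBLOB NOT NULL");
    }

    public static void main(String[] args) {
        // Print the statements so they can be reviewed before running them.
        for (String sql : alterStatements()) {
            System.out.println(sql + ";");
        }
    }
}
```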
Comment 2 Andre Klapper 2012-10-09 15:58:35 UTC
eyal:

(In reply to comment #0)
> the dump file i'm reading is : 
> enwiki-latest-pages-articles.xml.bz2(aug 08,2011)

The exact and full command used for this would be welcome (without any potential user password, of course).

Did you use the old/outdated jar from http://download.wikimedia.org/tools/ or the source from trunk/master?
Comment 3 Andre Klapper 2012-10-21 02:05:57 UTC
eyal: Could you answer comment 2 please?
Comment 4 Andre Klapper 2013-08-19 14:01:07 UTC
Unfortunately closing this report as no further information has been provided.

eyal: Please feel free to reopen this report if you can provide the information asked for and if this still happens. Thanks!
