Last modified: 2013-08-19 14:01:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T34439, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 32439 - java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9


Summary:	java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for...

Status:	RESOLVED WORKSFORME

Product:	Utilities
Classification:	Unclassified
Component:	mwdumper (Other open bugs)
Version:	unspecified
Hardware:	PC Windows 7

Importance:	Normal blocker (vote)
Target Milestone:	---
Assigned To:	Brion Vibber

URL:
Whiteboard:	aklapper-moreinfo
Keywords:

Depends on:
Blocks:

Reported:	2011-11-16 08:38 UTC by eyal
Modified:	2013-08-19 14:01 UTC (History)
CC List:	3 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description eyal 2011-11-16 08:38:36 UTC

The dump file I'm reading is:
enwiki-latest-pages-articles.xml.bz2 (Aug 08, 2011)

I'm inserting the values into a MySQL database created according to the wiki SQL schema definition, after removing the tables' indexes and constraints.

I would be more than glad to know if there is a way to work around this: ignore the problematic rows and continue reading and writing the rest of the file.

Thanks

2,260,000 pages (36.843/sec), 2,260,000 revs (36.843/sec)
2,261,000 pages (36.842/sec), 2,261,000 revs (36.842/sec)
2,262,000 pages (36.841/sec), 2,262,000 revs (36.841/sec)
2,263,000 pages (36.839/sec), 2,263,000 revs (36.839/sec)
2,264,000 pages (36.837/sec), 2,264,000 revs (36.837/sec)
2,265,000 pages (36.838/sec), 2,265,000 revs (36.838/sec)
java.io.IOException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
	at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
	at org.mediawiki.dumper.gui.DumperGui$1.run(DumperGui.java:206)
Caused by: org.xml.sax.SAXException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
	at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:227)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
	at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
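
Note on the reported bytes: `\xF0\x9D\x9E\xB1` is a four-byte UTF-8 sequence, i.e. a character outside the Basic Multilingual Plane. A likely cause of the exception (an assumption, not confirmed anywhere in this report) is that the `page_title` column used MySQL's `utf8` character set, which stores at most three bytes per character. The sketch below decodes the bytes from the error message and shows one way to detect such titles before insertion. The class and method names (`TitleBytes`, `needsFourByteUtf8`) are hypothetical and are not part of mwdumper.

```java
import java.nio.charset.StandardCharsets;

class TitleBytes {
    // Returns true if the string contains any code point above U+FFFF,
    // which takes four bytes when encoded as UTF-8.
    static boolean needsFourByteUtf8(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // The first four bytes quoted in the exception message.
        byte[] reported = {(byte) 0xF0, (byte) 0x9D, (byte) 0x9E, (byte) 0xB1};
        String decoded = new String(reported, StandardCharsets.UTF_8);

        System.out.printf("U+%X%n", decoded.codePointAt(0)); // prints U+1D7B1
        System.out.println(needsFourByteUtf8(decoded));      // prints true
        System.out.println(needsFourByteUtf8("Main_Page"));  // prints false
    }
}
```

A check like this could be used to skip or log affected pages during an import, at the cost of losing those rows.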

Comment 1 eyal 2011-11-23 10:20:07 UTC

I found a way to fix the problem.

The SQL schema file provided by Wikipedia at:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql

has some defaults in it that need to be changed.

Steps for the solution:

1. In the page table definition, change the type of the page_title field from varchar to varbinary.
2. In the revision table, change the type of the rev_comment field from tinyblob to mediumblob. (This is not the cause of the exception I described above, but it is still necessary if you want to avoid future exceptions.)

That's it, you're good to go.
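
For a database that has already been created from the stock schema, the two changes above could be applied with `ALTER TABLE` instead of editing tables.sql. The following is a minimal sketch over plain JDBC, under several assumptions: the tables carry no prefix, the revision comment column is named `rev_comment` as in the MediaWiki schema, and the `VARBINARY(255)` length and `NOT NULL` constraints match your copy of tables.sql. The class name `SchemaWorkaround` is hypothetical. Check the statements against your own schema before running them.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

class SchemaWorkaround {
    // Column types assumed from the MediaWiki schema; adjust to your tables.sql.
    static final List<String> STATEMENTS = Arrays.asList(
        "ALTER TABLE page MODIFY page_title VARBINARY(255) NOT NULL",
        "ALTER TABLE revision MODIFY rev_comment MEDIUMBLOB NOT NULL");

    // Runs each statement on an already-open connection to the wiki database.
    static void apply(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : STATEMENTS) {
                st.executeUpdate(sql);
            }
        }
    }

    public static void main(String[] args) {
        // Without a database at hand, just print the statements for review.
        for (String sql : STATEMENTS) {
            System.out.println(sql + ";");
        }
    }
}
```

Storing titles as binary avoids character-set validation entirely, which is why the four-byte sequences are accepted after the change.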

Comment 2 Andre Klapper 2012-10-09 15:58:35 UTC

eyal:

(In reply to comment #0)
> the dump file i'm reading is : 
> enwiki-latest-pages-articles.xml.bz2(aug 08,2011)

The exact and full command used for this would be welcome (without any user password, of course).

Did you use the old/outdated jar from http://download.wikimedia.org/tools/ or the source from trunk/master?

Comment 3 Andre Klapper 2012-10-21 02:05:57 UTC

eyal: Could you answer comment 2 please?

Comment 4 Andre Klapper 2013-08-19 14:01:07 UTC

Unfortunately closing this report as no further information has been provided.

eyal: Please feel free to reopen this report if you can provide the information asked for and if this still happens. Thanks!
