Last modified: 2014-10-23 14:54:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71747, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 69747 - Accessing bug 9444 via XML RPC API crashes due to invalid byte sequence: "not well-formed (invalid token)"
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Bugzilla
Version: wmf-deployment
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Andre Klapper
Depends on:
Blocks:
Reported: 2014-08-19 16:20 UTC by Andre Klapper
Modified: 2014-10-23 14:54 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Andre Klapper 2014-08-19 16:20:10 UTC
[...]
body: '<?xml version="1.0" encoding="UTF-8"?><methodResponse><params><param><value><struct><member><name>bugs</name><value><struct><member><name>9444</name><value><struct><member><name>comments</name><value><array><data><value><struct><member><name>is_private</name><value><boolean>0</boolean></value></member><member><name>count</name><value><int>0</int></value></member><member><name>creator</name><value><string>papadako@csd.uoc.gr</string></value></member><member><name>time</name><value><dateTime.iso8601>20070329T08:11:13</dateTime.iso8601></value></member><member><name>bug_id</name><value><int>9444</int></value></member><member><name>author</name><value><string>papadako@csd.uoc.gr</string></value></member><member><name>text</name><value><string>A database error has occurred Query: SELECT\nmath_outputhash,math_html_conservativeness,math_html,math_mathml FROM math WHERE\nmath_inputhash = \'\xef\xbf\xbd\xef\xbf\xbd\xd7\xbe\xef\xbf\xbd\x1f\x11\xef\xbf\xbd\xef\xbf\xbd\x12@\x01\xcb\xb5\' LIMIT 1 Function: MathRenderer::_recall Error: 1\nERROR: invalid byte sequence for encoding "UTF8": 0xebc3d'
Traceback (most recent call last):
  File "minimal.py", line 64, in <module>
    fetch(i)
  File "minimal.py", line 49, in fetch
    com = server.Bug.comments(kwargs)['bugs'][bugid]['comments']
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1297, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1467, in parse_response
    p.feed(data)
  File "/usr/lib/python2.7/xmlrpclib.py", line 557, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 22
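For anyone trying to reproduce this without access to the server: the response body above contains raw control bytes such as \x1f and \x11 inside a <string> element, and XML 1.0 does not allow those. The following is only a minimal sketch in Python 3 (the original script ran on Python 2.7 via xmlrpclib); the helper name parse_error and both sample payloads are made up for illustration and are not taken from minimal.py.

```python
import xml.parsers.expat


def parse_error(payload: bytes):
    """Return the expat error message for payload, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(payload, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# Hypothetical payloads: same structure, with and without raw control bytes.
clean = b'<?xml version="1.0" encoding="UTF-8"?><string>math_inputhash = ?</string>'
dirty = b'<?xml version="1.0" encoding="UTF-8"?><string>math_inputhash = \x1f\x11</string>'

print(parse_error(clean))  # parses without an error
print(parse_error(dirty))  # reports an "invalid token" error like the traceback above
```

If this sketch matches what happens on the wire, the failure is on the serialising side: the client parser is right to reject the document, so the fix has to be applied before the comment text is put into the response.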
Comment 1 Andre Klapper 2014-08-19 16:29:24 UTC
Upstreamed as https://bugzilla.mozilla.org/show_bug.cgi?id=1055629
Comment 2 Andre Klapper 2014-08-19 23:00:46 UTC
We should probably drop the offending chars, e.g. via
  $string =~ tr/\xea-\xef/-/;
somewhere before
  text       => $self->type('string', $comment->body_full),
in
  http://bzr.mozilla.org/bugzilla/4.4/view/head:/Bugzilla/WebService/Bug.pm#L296
I guess. 
Late uneducated comment that might be blatantly wrong tomorrow morning.
Comment 3 Andre Klapper 2014-08-22 14:11:58 UTC
[Mostly making comments here for myself.]

One problem here is that we have not identified with certainty which chars are actually offending; we only guess.
Another problem is that I cannot easily create a local testcase.

Workaround in https://bugzilla.mozilla.org/show_bug.cgi?id=839023#c10 : Use
$initial =~ s/([\x01-\x08\x0b\x0c\x0f-\x1f])/sprintf "\\x%02x", ord($1)/ge;
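To make the effect of that substitution easier to test locally, here is a rough Python 3 port of the character class and replacement. It is a sketch only: the function name escape_control_chars is hypothetical, and it assumes the Perl expression operates on already-decoded text.

```python
import re

# Same character class as the Perl workaround: 0x01-0x08, 0x0b, 0x0c, 0x0f-0x1f.
# Tab (0x09), LF (0x0a) and CR (0x0d) are left alone; as written, the class
# also skips 0x00 and 0x0e.
_CONTROL_CHARS = re.compile(r"[\x01-\x08\x0b\x0c\x0f-\x1f]")


def escape_control_chars(text: str) -> str:
    """Replace each matched control char with a literal backslash-x escape."""
    return _CONTROL_CHARS.sub(lambda m: "\\x%02x" % ord(m.group()), text)


print(escape_control_chars("hash = \x1f\x11@\x01"))  # control chars become visible text
print(escape_control_chars("line one\nline two\tč"))  # unchanged
```

The replacement keeps the information about which byte was there (as visible text) instead of silently dropping it, which is probably preferable for an export script.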

http://perldoc.perl.org/perlebcdic.html#Quoted-Printable-encoding-and-decoding lists a similar example (it also covers \x80 and above, which strips non-ASCII entirely):
$qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;

The above workaround is blunt though, as it is applied to the whole output: if you replaced \x61 (letter: a) you'd end up with "Wrong/unsupported datatype 'boole\\x61n' specified" in the XMLRPC response. Hence I am slightly concerned about unwanted side effects, but the above character range is nothing that should be used anyway.

So I tested the two-liner hack with the less commonly used letter \xc4\x8d (letter: č) in some comments, and the char replacement worked as expected in the XMLRPC response.

Helpful tables for conversion: http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=string-literal
Comment 4 Gerrit Notification Bot 2014-08-22 15:00:57 UTC
Change 155732 had a related patch set uploaded by Aklapper:
When exporting Bugzilla tickets via Chase's script we run into an API bug with specific Unicode letters for https://bugzilla.wikimedia.org/show_bug.cgi?id=9444#c0. This is applying a hackish upstream workaround described in https://bugzilla.mozilla.org/sh

https://gerrit.wikimedia.org/r/155732
Comment 5 Gerrit Notification Bot 2014-08-25 12:23:08 UTC
Change 156100 had a related patch set uploaded by Aklapper:
Work around Bugzilla XML RPC bug with special Unicode characters

https://gerrit.wikimedia.org/r/156100
Comment 6 Gerrit Notification Bot 2014-08-25 22:33:19 UTC
Change 155732 merged by Dzahn:
Create copy of upstream file (for followup custom change)

https://gerrit.wikimedia.org/r/155732
Comment 7 Gerrit Notification Bot 2014-08-28 21:27:41 UTC
Change 156100 merged by Dzahn:
Work around Bugzilla XML RPC bug with special Unicode characters

https://gerrit.wikimedia.org/r/156100
Comment 8 Andre Klapper 2014-08-28 22:47:56 UTC
A script querying the XML RPC API no longer drops out at ticket #9444, the XML still looks valid, and I have not experienced any other explosions or incidents yet.

Closing as FIXED, crossing fingers it'll stay like that.
Comment 9 Andre Klapper 2014-10-23 14:54:05 UTC
Note: As this workaround is applied to *any* output, it also damages binary attachment data. See https://phabricator.wikimedia.org/T815
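A small illustration of why that happens, as a Python 3 sketch. It assumes the deployed patch uses the same character class as the workaround quoted in comment 3 and that it sees the attachment as raw bytes; neither is confirmed here, and the function name escape_control_bytes is made up.

```python
import re

# Assumed character class, taken from the workaround in comment 3.
_CONTROL_BYTES = re.compile(rb"[\x01-\x08\x0b\x0c\x0f-\x1f]")


def escape_control_bytes(data: bytes) -> bytes:
    """Apply the text-oriented escape to raw bytes (which is the problem)."""
    return _CONTROL_BYTES.sub(lambda m: b"\\x%02x" % m.group()[0], data)


# The 8-byte PNG file signature contains 0x1a, which falls into the range.
png_signature = b"\x89PNG\r\n\x1a\n"
damaged = escape_control_bytes(png_signature)

print(len(png_signature), len(damaged))
print(damaged == png_signature)
```

Any binary format that legitimately contains bytes in that range ends up longer and no longer byte-identical, so the escape would need to be limited to text fields.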
