Last modified: 2014-04-24 17:14:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T51342, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 49342 - Bot encoding messed up: unicode characters (åö etc.) broken
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: wikibugs IRC bot
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Merlijn van Deen (test)
Duplicates: 64354
Depends on:
Blocks:
Reported: 2013-06-08 18:15 UTC by Nemo
Modified: 2014-04-24 17:14 UTC (History)
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Nemo 2013-06-08 18:15:24 UTC
For instance, from the logs:

[17:55:16] <wikibugs_>	 03(mod) Wrong language code for Norwegian Bokm�l (Android) - 10https://bugzilla.wikimedia.org/49340  +comment (10Niklas Laxström)
http://bots.wmflabs.org/~wm-bot/logs/%23mediawiki/20130608.txt

which however for Niklas (with automatic encoding detection) showed up as:

[20:54:42]  wikibugs_> (mod) Wrong language code for Norwegian Bokmål (Android) - https://bugzilla.wikimedia.org/49340 +comment (Niklas Laxström)
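
One possible way to reproduce the two renderings above, offered as a guess rather than a confirmed diagnosis: if the bot sent the summary as Latin-1 bytes, a reader that only decodes UTF-8 would show a replacement character, while a client with encoding detection would show the text correctly. The Latin-1 assumption and the helper name `render_as_utf8` are illustrative and do not come from the logs.

```python
def render_as_utf8(text, wire_encoding):
    """Encode text the way a sender might, then decode it as a UTF-8-only reader would."""
    raw = text.encode(wire_encoding)
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
    return raw.decode("utf-8", errors="replace")


summary = "Norwegian Bokmål"

# Assumed sender encoding: Latin-1. The single byte 0xE5 for "å" is not valid UTF-8.
broken = render_as_utf8(summary, "latin-1")

# Sender and reader agree on UTF-8, so the text survives unchanged.
intact = render_as_utf8(summary, "utf-8")

print(broken)
print(intact)
```

Under that assumption, `broken` contains U+FFFD in place of "å", which matches the first log line, and `intact` equals the original summary.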
Comment 1 Nemo 2013-06-08 18:17:25 UTC
[18:15:27] <wikibugs_>	 03(NEW) Bot encoding messed up: �� unicode characters broken - 10https://bugzilla.wikimedia.org/49342 major; Wikimedia: wikibugs IRC bot; ()

so it doesn't need both summary and username to have non-ASCII characters as first suspected.
Comment 2 Bartosz Dziewoński 2013-06-08 21:56:27 UTC
The bot has been semi-randomly messing up Unicode since ever. The "ń" in my last name is sometimes mangled as well (but not always, for me at least). I have been unable to determine the cause of this behavior.
Comment 3 Nemo 2013-06-10 07:05:10 UTC
(In reply to comment #2)
> The bot has been semi-randomly messing up Unicode since ever. 

To me it seems mostly a recent thing.

> The "ń" in my
> last name is sometimes mangled as well (but not always, for me at least). I
> have been unable to determine the cause of this behavior.

The most obvious reason would be that the summary and the other headers have different encodings. The bot reads email, which is now HTML by default, so this may be the cause and would be fixed by upgrading to Bugzilla 4.4: https://bugzilla.mozilla.org/show_bug.cgi?id=777685
To test it, it may be enough for a Bugzilla admin to change wikibugs' preferences so that it receives plain-text notifications (unless that bug also affects those).
Comment 4 Merlijn van Deen (test) 2014-04-23 08:20:30 UTC
Scratch that, it's still broken due to Bugzilla not conforming to the e-mail RFC:

Wikimedia / wikibugs IRC bot: Bot encoding messed up: unicode characters (=?UTF-8?Q?=C3=A5=C3=B6=20etc=2E?=) broken 

(=?UTF-8?Q? is only allowed to start after whitespace, not after a '(', so the Python email parser fails to recognise it by default)
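
To illustrate the point, here is a simplified model of whitespace-based tokenisation. It is not the actual code of Python's email parser, and the helper `looks_like_encoded_word` is hypothetical. It shows why an encoded word glued to "(" is missed, and that the same encoded word decodes fine once isolated with the standard `email.header.decode_header`.

```python
from email.header import decode_header

subject = "unicode characters (=?UTF-8?Q?=C3=A5=C3=B6=20etc=2E?=) broken"


def looks_like_encoded_word(token):
    # Simplified check (an assumption): only whitespace-delimited tokens that
    # begin with "=?" and end with "?=" count as RFC 2047 encoded words.
    return token.startswith("=?") and token.endswith("?=")


tokens = subject.split()
recognised = [tok for tok in tokens if looks_like_encoded_word(tok)]

# Isolate the encoded word by hand and decode it.
encoded_word = "=?UTF-8?Q?=C3=A5=C3=B6=20etc=2E?="
raw, charset = decode_header(encoded_word)[0]
decoded = raw.decode(charset)

print(recognised)
print(decoded)
```

In this model `recognised` is empty because the relevant token starts with "(" instead of "=?", while the isolated encoded word decodes to "åö etc.".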
Comment 5 Andre Klapper 2014-04-24 07:30:33 UTC
Any testcase for such a Bugzilla notification? How common is that?
Comment 6 Nemo 2014-04-24 07:49:06 UTC
(In reply to Andre Klapper from comment #5)
> Any testcase for such a Bugzilla notification?

This bug report is designed to be the test case for itself. :) Just edit something and check its summary on IRC.
Comment 7 Merlijn van Deen (test) 2014-04-24 08:09:25 UTC
See also: http://bugs.python.org/issue21315
Comment 8 Andre Klapper 2014-04-24 08:15:43 UTC
Thanks. I'm stupid. :)

Subject: [Bug 49342] Bot encoding messed up: unicode characters
 (=?UTF-8?Q?=C3=A5=C3=B6=20etc=2E?=) broken

Either an issue in Perl's Email::MIME or somewhere around http://bzr.mozilla.org/bugzilla/4.4/view/head:/Bugzilla/Mailer.pm#L74

Upstream I only found https://bugzilla.mozilla.org/show_bug.cgi?id=387860 which is different and fixed.
Comment 9 Daniel Zahn 2014-04-24 08:21:22 UTC
This is about the old wikibugs bot, right? Try how pywikibugs handles it.
Comment 10 Merlijn van Deen (test) 2014-04-24 11:19:08 UTC
*** Bug 64354 has been marked as a duplicate of this bug. ***
Comment 11 Merlijn van Deen (test) 2014-04-24 11:25:00 UTC
No, this is currently an issue with pywikibugs: Bugzilla sends non-RFC-compliant mails and the Python 3.4 mail parser does not handle this gracefully. I special-cased the "=?UTF-8?Q?=20" case (replaced with " =?UTF-8?Q?"), but this doesn't take care of the (=?UTF-8?Q? and "=?UTF-8?Q? cases.
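
A minimal sketch of that kind of special-casing, assuming the encoded-word prefix is "=?UTF-8?Q?" as in the Subject header quoted earlier. The function name and both header fragments are made up for illustration and are not taken from pywikibugs.

```python
def move_encoded_space_out(header):
    # Turn an encoded leading space (=20) into a real space placed in front
    # of the encoded word, so that the word starts after whitespace.
    return header.replace("=?UTF-8?Q?=20", " =?UTF-8?Q?")


# Hypothetical header fragments.
glued_with_space = "broken=?UTF-8?Q?=20=C3=A5=C3=B6?="
glued_to_paren = "characters (=?UTF-8?Q?=C3=A5=C3=B6?=) broken"

fixed = move_encoded_space_out(glued_with_space)
unchanged = move_encoded_space_out(glued_to_paren)

print(fixed)
print(unchanged)
```

The first fragment gains a space before the encoded word. The second is returned untouched, which is why the parenthesis case still needs a different fix.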
Comment 12 Merlijn van Deen (test) 2014-04-24 11:40:32 UTC
Fixed by fully monkey-patching get_unstructured; the patch at bugs.python.org adds the following lines:


if "=?" in tok and not tok.startswith("=?"):
    tok, rest = tok.split("=?", 1)
    remainder.insert(0, "=?" + rest)


which ensures that any "=?" in the middle of a token is split off and parsed as an encoded word.
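
The lines quoted above rely on `tok` and `remainder` from the surrounding parser code. A standalone sketch of the same splitting idea, with a hypothetical `split_tokens` wrapper that is not part of the Python standard library or of the patch, could look like this:

```python
def split_tokens(header):
    """Split on whitespace, then split any token that has "=?" in the middle."""
    remainder = header.split()
    tokens = []
    while remainder:
        tok = remainder.pop(0)
        # Same idea as the quoted lines: cut the token in front of "=?" and
        # push the encoded-word part back so it is examined on the next pass.
        if "=?" in tok and not tok.startswith("=?"):
            tok, rest = tok.split("=?", 1)
            remainder.insert(0, "=?" + rest)
        tokens.append(tok)
    return tokens


subject = "unicode characters (=?UTF-8?Q?=C3=A5=C3=B6=20etc=2E?=) broken"
tokens = split_tokens(subject)
print(tokens)
```

With this input the "(" ends up as its own token and the next token starts with "=?", so a check based on `startswith("=?")` can now find the encoded word. The closing ")" stays attached to that token in this sketch; handling the end of the encoded word is left to whatever decodes it.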
Comment 13 Andre Klapper 2014-04-24 17:14:41 UTC
(In reply to Merlijn van Deen from comment #4)
> Scratch that, it's still broken due to Bugzilla not conforming to the e-mail
> RFC

FYI, I upstreamed it as https://bugzilla.mozilla.org/show_bug.cgi?id=1000988


