Last modified: 2013-02-19 03:40:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T41646, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 39646 - Scribunto needs sane Unicode string support
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Scribunto
Version: unspecified
Hardware: All All
Importance: High enhancement with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: https://www.mediawiki.org/wiki/Extens...
Keywords: utf8
Reported: 2012-08-25 16:36 UTC by MZMcBride
Modified: 2013-02-19 03:40 UTC

Description MZMcBride 2012-08-25 16:36:33 UTC
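Scribunto's built-in string module works with bytestrings. So if you have something like "string.len('hüllo')", it will return 6 rather than 5, because "ü" takes two bytes in UTF-8. If you have something like "string.reverse('hüllo')", it will return "oll��h", because the two bytes of "ü" come out in the wrong order and are no longer valid UTF-8.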
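For illustration, the byte-versus-character difference described above can be reproduced outside Scribunto. This is only a sketch: Python stands in for Lua here (an assumption made for the example, not something used in this bug), and the variable names are made up. Encoding the text to UTF-8 bytes mimics what Lua's stock string library operates on.

```python
# Illustration only: Python stands in for Lua's byte-oriented string library.
text = "h\u00fcllo"              # "hüllo", with a precomposed U+00FC
raw = text.encode("utf-8")       # the bytes a bytestring API would see

byte_length = len(raw)           # counts bytes, like string.len
char_length = len(text)          # counts Unicode code points

byte_reversed = raw[::-1]        # reverses bytes, like string.reverse
char_reversed = text[::-1]       # reverses code points

# The reversed bytes are not valid UTF-8, so a lenient decoder
# substitutes U+FFFD replacement characters for the broken bytes.
shown = byte_reversed.decode("utf-8", errors="replace")

print(byte_length, char_length)  # 6 5
print(char_reversed)             # ollüh
print(shown)                     # oll��h
```

Reversing by code point is still not a complete answer for text that uses combining marks, but it does show why a Unicode-aware string module gives the results template authors would expect.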

This is fine for a programming language, I guess, but particularly for a case like Scribunto (where template programmers are being targeted and there's Unicode everywhere), sane Unicode string handling _must_ come with the extension.

Victor Vasiliev has done some work on this already, I'm told, as a ustring module. There's a C part and a Lua part. I've no idea where the code is, but I'm told it's publicly available somewhere.
Comment 1 Victor Vasiliev 2012-08-25 17:56:14 UTC
C code is in SVN, right in luasandbox module. Lua code is in gerrit, but it needs more fixes.
Comment 3 MZMcBride 2012-08-26 02:35:27 UTC
Two points:

(1) http://scribunto.wmflabs.org has some version of a ustring module right now. Not sure how or why, though its function names are painfully abbreviated.

(2) Fran McCrory makes some very interesting points at <https://www.mediawiki.org/w/index.php?diff=575869&oldid=575863> about using u'foo' syntax and whether it might make sense to do away with bytestrings altogether.
Comment 4 Tim Starling 2012-08-26 10:05:03 UTC
The code Victor wrote had a completely different API to the stock Lua string functions, and it wasn't possible to simulate it in pure Lua. So I disabled it before I deployed it. It's better for the functionality to be temporarily missing than to be stuck with a bad interface forever.
Comment 5 MZMcBride 2012-08-26 13:04:08 UTC
(In reply to comment #4)
> The code Victor wrote had a completely different API to the stock Lua string
> functions, and it wasn't possible to simulate it in pure Lua. So I disabled it
> before I deployed it. It's better for the functionality to be temporarily
> missing than to be stuck with a bad interface forever.

Thank you for explaining. That's fine and I completely agree. But it would have saved me a ton of confusion if this had been made clearer (cf. bug 39655).
Comment 6 Brad Jorsch 2013-02-19 03:40:27 UTC
We have mw.ustring now.
