Last modified: 2013-02-19 03:40:27 UTC
Scribunto's built-in string module works with bytestrings. So if you have something like "string.len('hüllo')", it will return 6. If you have something like "string.reverse('hüllo')", it will return "oll��h". This is fine for a programming language, I guess, but particularly for a case like Scribunto (where template programmers are being targeted and there's Unicode everywhere), sane Unicode string handling _must_ come with the extension. Victor Vasiliev has done some work on this already, I'm told, as a ustring module. There's a C part and a Lua part. I've no idea where the code is, but I'm told it's publicly available somewhere.
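The byte-versus-character behaviour described above can be reproduced in plain Lua 5.1 without Scribunto. What follows is a minimal sketch, not Scribunto code, and it assumes the input is valid UTF-8. It splits the string with a common pattern idiom: one non-continuation byte followed by any continuation bytes (128-191). The helper name `utf8chars` is hypothetical. The string is written with decimal escapes so the two bytes of "ü" are explicit regardless of the source file's encoding.

```lua
-- "hüllo" with explicit bytes: "ü" (U+00FC) is 195 188 in UTF-8.
local s = "h\195\188llo"

-- Hypothetical helper: split a valid UTF-8 string into characters.
-- Each character is one non-continuation byte followed by any number of
-- continuation bytes (128-191). No validation is performed.
local function utf8chars(str)
  local chars = {}
  for ch in string.gmatch(str, "[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

local chars = utf8chars(s)

-- string.len counts bytes, so the two-byte "ü" counts twice.
local byteLen = string.len(s)   -- 6
local charLen = #chars          -- 5

-- string.reverse reverses bytes, leaving "ü" as the invalid sequence 188 195.
local byteReversed = string.reverse(s)

-- Reversing the character list keeps each multi-byte sequence intact.
local reversed = {}
for k = #chars, 1, -1 do
  reversed[#reversed + 1] = chars[k]
end
local charReversed = table.concat(reversed)  -- "ollüh"

print(byteLen, charLen)
```

Splitting on the pattern is O(n) and does no error checking, so malformed input silently produces odd "characters". A real Unicode library also has to decide how to handle that case.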
The C code is in SVN, right in the luasandbox module. The Lua code is in Gerrit, but it needs more fixes.
https://www.mediawiki.org/wiki/Extension:Scribunto/API_specification#ustring_API
Two points: (1) http://scribunto.wmflabs.org has some version of a ustring module right now. I'm not sure how or why it's there, and its function names are painfully abbreviated. (2) Fran McCrory makes some very interesting points at <https://www.mediawiki.org/w/index.php?diff=575869&oldid=575863> about using u'foo' syntax and whether it might make sense to do away with bytestrings altogether.
The code Victor wrote had a completely different API to the stock Lua string functions, and it wasn't possible to simulate it in pure Lua. So I disabled it before I deployed it. It's better for the functionality to be temporarily missing than to be stuck with a bad interface forever.
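The design point here is that a Unicode module whose functions take the same arguments as the stock `string` functions can be approximated in pure Lua. Code can then move between the two by changing only the module name. The sketch below illustrates that idea for `len` and `sub`. The module table `ustr` and the `split` helper are hypothetical. This is not Victor's code and not Scribunto's implementation. It assumes valid UTF-8 input and mirrors the Lua 5.1 index rules of `string.sub`: 1-based, negative indices count from the end, and out-of-range bounds are clamped.

```lua
-- Hypothetical pure-Lua module; names are illustrative only.
local ustr = {}

-- One UTF-8 character: a non-continuation byte plus continuation bytes.
-- Assumes valid UTF-8; no validation is done.
local CHAR = "[^\128-\191][\128-\191]*"

local function split(s)
  local chars = {}
  for ch in string.gmatch(s, CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

-- Same call shape as string.len(s), but counts code points.
function ustr.len(s)
  return #split(s)
end

-- Same call shape and index rules as string.sub(s, i, j),
-- applied to code points instead of bytes.
function ustr.sub(s, i, j)
  local chars = split(s)
  local n = #chars
  i = i or 1
  j = j or -1
  if i < 0 then i = n + i + 1 end  -- negative index counts from the end
  if j < 0 then j = n + j + 1 end
  if i < 1 then i = 1 end          -- clamp to the valid range
  if j > n then j = n end
  if i > j then return "" end
  return table.concat(chars, "", i, j)
end
```

For example, with `s = "h\195\188llo"` ("hüllo"), `string.sub(s, 2, 2)` returns the lone byte 195. `ustr.sub(s, 2, 2)` returns the whole "ü". Re-splitting the string on every call keeps the sketch simple but costs O(n) per call. A production module would more likely work from byte offsets.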
(In reply to comment #4)
> The code Victor wrote had a completely different API to the stock Lua string
> functions, and it wasn't possible to simulate it in pure Lua. So I disabled it
> before I deployed it. It's better for the functionality to be temporarily
> missing than to be stuck with a bad interface forever.

Thank you for explaining. That's fine, and I completely agree. But it would have saved me a ton of confusion if this had been made clearer (cf. bug 39655).
We have mw.ustring now.