Last modified: 2013-05-14 10:36:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T50378, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 48378 - Why use gsub in mw.text.trim?
Status: UNCONFIRMED
Product: MediaWiki extensions
Classification: Unclassified
Component: Scribunto
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2013-05-12 14:35 UTC by François
Modified: 2013-05-14 10:36 UTC

Description François 2013-05-12 14:35:17 UTC
Hi!
I don't know why gsub is used in function mw.text.trim in mw.text.lua; I would have written that function as follows:

function mwtext.trim( s, charset )
        charset = charset or '\t\r\n\f '
        return mw.ustring.match( s, '^[' .. charset .. ']*(.-)[' .. charset .. ']*$' )
end

Also I would have made a mw.text.trim using the string library, and a mw.text.utrim using ustring, since it would probably be used most of the time with 1-byte characters...
Comment 1 François 2013-05-12 14:39:02 UTC
I forgot to add: I would have written "s or ''" instead of simply "s", so that it does not fail if given nil.
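
For readers comparing the two approaches, here is a minimal sketch of both variants side by side. It is not the Scribunto source: it uses Lua's standard string library as a byte-oriented stand-in for mw.ustring (so it runs outside MediaWiki), and the names trim_match, trim_gsub and build_pattern are hypothetical. It also includes the "s or ''" guard mentioned above.

```lua
-- Assumption: plain string functions stand in for mw.ustring.match/gsub.
local DEFAULT_CHARSET = '\t\r\n\f '

local function build_pattern( charset )
	-- Caveat: characters that are special inside a Lua character class
	-- (such as ']', '^', '%' or '-') would need escaping in charset.
	return '^[' .. charset .. ']*(.-)[' .. charset .. ']*$'
end

-- Variant from the report: return the capture directly.
local function trim_match( s, charset )
	-- Parentheses truncate the call to a single return value.
	return ( string.match( s or '', build_pattern( charset or DEFAULT_CHARSET ) ) )
end

-- gsub variant: replace the whole string with the capture.
local function trim_gsub( s, charset )
	-- gsub returns two values (result, match count); assigning to one
	-- local keeps only the string.
	local trimmed = string.gsub( s or '', build_pattern( charset or DEFAULT_CHARSET ), '%1' )
	return trimmed
end

assert( trim_match( '  hello world \n' ) == trim_gsub( '  hello world \n' ) )
assert( trim_match( nil ) == '' )
```

Because the pattern is anchored at both ends, gsub performs at most one substitution, so the two variants should return the same string for any input.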
Comment 2 Brad Jorsch 2013-05-12 15:39:28 UTC
(In reply to comment #0)
> I don't know why gsub is used in function mw.text.trim in mw.text.lua; 

No particular reason.

> I would have written that function as follows:

Any reason? In some quick testing here, they're both about the same speed (180-200µs each).

> Also I would have made a mw.text.trim using the string library, and a
> mw.text.utrim using ustring, since it would probably be used most of the time
> with 1-byte characters...

OTOH, that would require callers to know whether they should call mw.text.trim or mw.text.utrim.
Comment 3 François 2013-05-12 21:20:38 UTC
(In reply to comment #2)
Thank you for your response!
> Any reason? In some quick testing here, they're both about the same speed
> (180-200µs each).
That is nice, then! I was thinking that
- having 1 more argument
- and replacing the match in the string instead of simply returning it
would consume more resources. But it is probably optimised internally by Lua...
> 
> OTOH, that would require callers to know whether they should call
> mw.text.trim
> or mw.text.utrim.
Yes, just as they have to choose between the string and ustring libraries.
Comment 4 Brad Jorsch 2013-05-13 15:57:32 UTC
(In reply to comment #3)
> (In reply to comment #2)
> Thank you for your response!
> > Any reason? In some quick testing here, they're both about the same speed
> > (180-200µs each).
> That is nice, then! I was thinking that
> - having 1 more argument
> - and replacing the match in the string instead of simply returning it
> would consume more resources. But it is probably optimised internally by
> Lua...

Actually, all the replacing logic for mw.ustring in Scribunto is in PHP, which itself uses the PCRE library (in C) to handle most of it.

> > OTOH, that would require callers to know whether they should call
> > mw.text.trim
> > or mw.text.utrim.
> Yes, just as they have to choose between the string and ustring libraries.

Which itself is unfortunate. At any rate, it's too late to make this sort of change to mw.text.trim now. But if there is a general need for a faster binary trimming function it would be possible to add mw.text.trimBytes (name to be bikeshedded later).
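
To illustrate why callers would need to know which function to call, here is a rough sketch of what a byte-level trim might look like. "trimBytes" is only the placeholder name floated above, not an existing Scribunto function, and the body is an assumption built on Lua's standard string library.

```lua
-- Hypothetical byte-level trim; not part of Scribunto.
local function trimBytes( s, charset )
	charset = charset or '\t\r\n\f '
	return ( string.match( s or '', '^[' .. charset .. ']*(.-)[' .. charset .. ']*$' ) )
end

-- With an ASCII charset, multi-byte UTF-8 text passes through intact.
local e_acute = '\195\169'  -- UTF-8 bytes of U+00E9
local word = e_acute .. 't' .. e_acute
assert( trimBytes( '  ' .. word .. ' ' ) == word )

-- A multi-byte charset is treated as a set of individual bytes, so it can
-- strip half of a different character. U+00C3 is encoded as \195\131 and
-- shares its first byte with U+00E9.
local a_tilde = '\195\131'
local result = trimBytes( a_tilde .. 'abc', e_acute )
assert( result == '\131abc' )  -- leading byte removed, invalid UTF-8 remains
```

A byte-level function like this would only be safe when the charset is pure ASCII, which is the trade-off being discussed.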
Comment 5 François 2013-05-14 10:36:24 UTC
(In reply to comment #4)
Ok, thank you!
