Last modified: 2012-12-02 08:01:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T33286, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 31286 - JavaScriptMinifier: Save bytes by normalising escaped unicode sequences
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: ResourceLoader
Version: 1.20.x
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2011-09-30 19:20 UTC by db [inactive,noenotif]
Modified: 2012-12-02 08:01 UTC (History)

Description db [inactive,noenotif] 2011-09-30 19:20:19 UTC
brion at r98281:

"Note that identifiers using escapes don't get normalized to their UTF-8 form; this might be a nice thing to do as it saves a couple bytes, but currently there's no change made to output."

Please normalize escape sequences when minifying JavaScript (not when validating). Thanks.
Comment 1 Brion Vibber 2011-09-30 20:56:17 UTC
Bumping priority down since it's a fairly rare case, but it'd be nice to fix!

JSMin+ (JSMinPlus) was patched in r98281 to accept Unicode chars & escapes in identifiers for validation. A further tweak to have it retain the decoded form for the escapes would result in slightly smaller output from JSMin+.

However, we actually do our minification using the faster (but not quite as compressy) home-brewed JavaScriptMinifier -- so if we want that on MediaWiki we'll need to poke that side. :)

For minification purposes, this may actually make a bigger impact on string literals than identifiers: for instance WikiEditor's jquery.wikiEditor.toolbar.config.js contains a butt-ton of single Unicode characters as \uXXXX escapes -- every one is 6 bytes of source but would be 2-3 bytes if decoded to UTF-8 (a single \uXXXX escape can only name a BMP character, which never needs 4 bytes).

That potentially saves a couple hundred bytes here. Not huge, but they add up.
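
The saving described above can be checked with a minimal sketch. It is not the code of JavaScriptMinifier or JSMin+; the helper names (isSafeToDecode, decodeUnicodeEscapes, utf8Length) are hypothetical, and it assumes the input is the body of a quoted string literal (the text between the quotes) that will be served as UTF-8. Code points that could break or change the literal if emitted raw are deliberately left escaped.

```javascript
// Hypothetical helpers for illustration only.

// Code points this sketch refuses to decode.
function isSafeToDecode(code) {
  if (code < 0x20 || (code >= 0x7f && code <= 0x9f)) return false; // control characters
  if (code === 0x22 || code === 0x27 || code === 0x5c) return false; // " ' and backslash
  if (code === 0x2028 || code === 0x2029) return false; // line and paragraph separators
  if (code >= 0xd800 && code <= 0xdfff) return false; // surrogate halves
  if (code === 0xfeff) return false; // byte order mark
  return true;
}

// Decode \uXXXX escapes in the body of a quoted string literal.
// The second regex alternative consumes every other backslash escape whole,
// so an escaped backslash followed by "u0041" is never mistaken for \u0041.
function decodeUnicodeEscapes(literalBody) {
  return literalBody.replace(/\\(?:u([0-9a-fA-F]{4})|[\s\S])/g, (match, hex) => {
    if (hex === undefined) return match; // some other escape: keep as written
    const code = parseInt(hex, 16);
    return isSafeToDecode(code) ? String.fromCharCode(code) : match;
  });
}

// Size of a string in bytes when encoded as UTF-8.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

// Literal body as it would appear in a source file (backslashes doubled here
// only because this is itself a JavaScript string).
const body = "caf\\u00e9 \\u4e2d \\\\u0041 \\u2028";
const decoded = decodeUnicodeEscapes(body);

console.log(utf8Length(body));    // 31
console.log(utf8Length(decoded)); // 24
```

In this sample the two decodable escapes shrink from 6 bytes each to 2 and 3 bytes, while the escaped backslash sequence and the U+2028 escape are left untouched.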
Comment 2 Krinkle 2012-06-20 04:06:01 UTC
This would be nice to have in the minifier, indeed.

On the other hand, since UTF-8/Unicode characters are legal in JavaScript regardless (hence they may be decoded by a minifier), one might as well just put them decoded in the source file in the first place!

For third party/upstream resources this may not be an option, but at the very least it would be an upstream bug. It is simply not needed to encode them in JavaScript. And if using UTF-8 in a file is a problem, then one likely has bigger problems to deal with (anno 2012 one certainly should be able to deal with that).

The reason they are encoded in jquery.wikiEditor.* may be because PHP's json_encode() enforced that until PHP 5.4 (since 5.4 it is possible to disable that redundant encoding).
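
As a small illustration of the point that the escaped and literal spellings are interchangeable, the following sketch compares them directly. It assumes the file is saved and served as UTF-8; the variable names are made up for the example.

```javascript
// Both spellings produce the same two-character string.
const escaped = "\u00e9\u4e2d";
const literal = "é中";
console.log(escaped === literal); // true

// The same holds for identifiers: \u0061 is "a", so this declares `abc`.
const \u0061bc = 1;
console.log(abc); // 1

// Only the source size differs: the escaped form takes 12 bytes of source,
// the literal form takes 5 bytes as UTF-8.
const utf8Bytes = new TextEncoder().encode(literal).length;
console.log(utf8Bytes); // 5
```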
Comment 3 Krinkle 2012-12-02 08:01:32 UTC
From recent experience with jQuery, decoding things like this can lead to unexpected bugs.

Marking bugs that suggest altering the token stream in JavaScriptMinifier as wontfix.

If we decide to go that way, it is probably best to use existing libraries such as UglifyJS which are much more experienced with this sort of thing.

For now, we aren't using that yet because of the performance penalty involved with iterating over the token stream and reformatting it (since we do all of this on-demand on production servers).

If and when we do that, these bugs are redundant anyway as most if not all JavaScript reformatting applications have these features already.
