Last modified: 2014-08-29 22:34:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71481, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 69481 - Attempt to concatenate a nil value when using mw.html:css


Summary:	Attempt to concatenate a nil value when using mw.html:css

Status:	RESOLVED INVALID

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	Scribunto (Other open bugs)
Version:	unspecified
Hardware:	PC Windows Server 2008

Importance:	Normal normal (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2014-08-13 16:54 UTC by Erich Gubler
Modified:	2014-08-29 22:34 UTC (History)
CC List:	4 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Description Erich Gubler 2014-08-13 16:54:07 UTC

Mediawiki Version 1.23.2
Scribunto Revision b4b099d
Lua 5.1.4

Using mw.html.create(...) to make a mw.html object, I am able to use most of the functions just fine. However, when I use the following code in a module from the example in the Lua Reference Manual...


>local div = mw.html.create( 'div' )
>div
>	:attr( 'id', 'testdiv' )
>	:css( 'width', '100%' )
>	:wikitext( "Hello, world! I am a \"" .. (args["type"] or "...nothing...") .. "\"" )
>	:tag( 'hr' )

...I get the following error:

"Lua error in mw.html.lua at line 113: attempt to concatenate a nil value."

With some exploration, I discovered the error is produced by this line (edited to stand alone):

> div:css( 'width', '100%' )

I also receive the same error with a table input:

> div:css( {['width'] = '100%'} )

...and with numbers:

> div:css( 'width', 100 )

Comment 1 Brad Jorsch 2014-08-13 17:17:11 UTC

I cannot reproduce this locally. Can you dig into mw.html.lua and try to determine what exactly is going wrong?

Comment 2 Jackmcbarn 2014-08-13 17:47:42 UTC

I am able to reproduce this. I'm investigating it now.

Comment 3 Jackmcbarn 2014-08-13 17:54:39 UTC

I tested this on Windows, since it appears that's where the reporter is, since the only place we provide 5.1.4 is Windows.

Of particular note is the following:

Warning: preg_replace_callback(): Compilation failed: invalid UTF-8 string at offset 14 in C:\xampp\htdocs\core\extensions\Scribunto\engines\LuaCommon\UstringLibrary.php on line 643

Backporting https://gerrit.wikimedia.org/r/#/c/145414/ fixes the issue for me (and makes that warning go away).

Comment 4 Jackmcbarn 2014-08-13 19:05:35 UTC

After further investigation, the problem seems to lie in the patternToRegex function. A simple test: The pattern "[

Comment 5 Jackmcbarn 2014-08-13 19:06:38 UTC

(and it seems Bugzilla doesn't like that character either. Where I say U+10FFFF below, I really mean that literal character.)

After further investigation, the problem seems to lie in the patternToRegex function. A simple test: The pattern "[U+10FFFF]" is incorrectly converted to "/\[U+10FFFF\]/us" instead of the correct "/[U+10FFFF]/us". (I'm not sure why this is happening, or why that makes PHP throw a warning.)

Comment 6 Brad Jorsch 2014-08-14 14:58:38 UTC

It seems clear that PHP-on-Windows is doing something screwed up there; I see no code path in engines/LuaCommon/UstringLibrary.php that should ever convert "[U+10FFFF]" to a regex with backslashed brackets like that.

If I had access to a machine where this was occurring, I'd be hacking patternToRegex() (in a local copy) to log various variables at various points in the function to see where exactly things are going wrong. Starting with whether $pattern is correct, and then whether $pat is getting set correctly.

BTW, the issue with bugzilla cutting off the comment appears to be https://bugzilla.mozilla.org/show_bug.cgi?id=405011 (which is marked fixed, but apparently only in their 5.0 branch?).

Comment 7 Erich Gubler 2014-08-14 15:03:10 UTC

Confirmed, this was reproduced on a Windows machine.

Comment 8 Jackmcbarn 2014-08-28 02:25:19 UTC

The difference happens when preg_split is called.

var_dump( preg_split( '//us', "[\xF4\x8F\xBF\xBF]", null, PREG_SPLIT_NO_EMPTY ) );

On Linux:
array(3) {
  [0] =>
  string(1) "["
  [1] =>
  string(4) "U+10FFFF"
  [2] =>
  string(1) "]"
}

On Windows:
array(1) {
  [0]=>
  string(6) "[U+10FFFF]"
}

Comment 9 Jackmcbarn 2014-08-28 02:37:08 UTC

That happens because U+10FFFF is considered an invalid character on Windows but not Linux. preg_match("/\xF4\x8F\xBF\xBF/u", "foobar"); produces "Warning: preg_match(): Compilation failed: invalid UTF-8 string at offset 0 in - on line 2" on Windows but not Linux.

Comment 10 Brad Jorsch 2014-08-28 15:08:25 UTC

Thanks for tracking that down. At this point it doesn't seem to be a bug Scribunto, since U+10FFFF is a valid Unicode code point.

Further questions:
* Does it affect other characters (e.g. U+10FFFE)? Ideally test every character, if that wouldn't take insanely long.
* Is the Windows environment is using an old version of PCRE? Would upgrading fix it?

Comment 11 Jackmcbarn 2014-08-28 15:37:20 UTC

Linux and Windows both say that U+D800 through U+DFFF are invalid. Linux doesn't complain about anything else, but Windows also complains about U+FDD0 through U+FDEF, as well as all codepoints that end in FFFE or FFFF.

Comment 12 Jackmcbarn 2014-08-28 15:39:43 UTC

Windows PCRE version: 8.32 2012-11-30
Linux PCRE version: 8.31 2012-07-06

Comment 13 Brad Jorsch 2014-08-28 16:10:21 UTC

(In reply to Jackmcbarn from comment #11)
> Linux and Windows both say that U+D800 through U+DFFF are invalid.

That's not terribly surprising. UTF-16 surrogates aren't needed in UTF-8.

> but Windows also complains about
> U+FDD0 through U+FDEF, as well as all codepoints that end in FFFE or FFFF.

So it's complaining about non-characters even though they're valid code points.

(In reply to Jackmcbarn from comment #12)
> Windows PCRE version: 8.32 2012-11-30
> Linux PCRE version: 8.31 2012-07-06

Looking through the PCRE changelog, I see the following for 8.33:

> 21. Unicode validation has been updated in the light of Unicode Corrigendum #9,
>    which points out that "non characters" are not "characters that may not
>    appear in Unicode strings" but rather "characters that are reserved for
>    internal use and have only local meaning".

This leads me to http://bugs.exim.org/show_bug.cgi?id=1340, which confirms that this was a change in 8.32 that was fixed in 8.33.

About the only thing we could do in Scribunto would be to blacklist PCRE 8.32, throwing an exception if that version is detected. I don't know that we really want to go to that extent. I did update the documentation at https://www.mediawiki.org/wiki/Extension:Scribunto#PCRE_version_compatibility to note this issue.

Comment 14 Jackmcbarn 2014-08-29 22:34:39 UTC

Since this is a bug, but not our bug, changing to INVALID.

As a workaround for this bug, you can cherry-pick https://gerrit.wikimedia.org/r/#/c/145414/

Running the command "git cherry-pick ee289c8045ea1cc260c56d447af3078d5f6075cd" from the Scribunto directory should do this, assuming it was checked out of git. Upgrading to 1.24 will include that change, so once you eventually do that, you won't have to worry about this particular instance of it anymore.

Wikimedia Bugzilla is closed!

Search

Personal tools

Navigation

Links