Last modified: 2014-02-12 23:35:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T48443, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 46443 - Sanitizer fails to include image tags if incorrectly written as closed/void tags <img src= />
Sanitizer fails to include image tags if incorrectly written as closed/void t...
Status: NEW
Product: MediaWiki
Classification: Unclassified
Parser (Other open bugs)
1.22.0
All All
: High normal (vote)
: ---
Assigned To: Nobody - You can work on this!
http://openid-wiki.instance-proxy.wmf...
: need-parsertest
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-03-22 00:43 UTC by T. Gries
Modified: 2014-02-12 23:35 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description T. Gries 2013-03-22 00:43:34 UTC
Copied from the description of bug 35002 :

I noticed that the Sanitizer function fails in the cases where the image tag is written in XHMTL-style as a closed tag <img src='http://image-url' />".

The net is full of reports of such things (XHTML vs. HTML4.1 vs. HTML5)
read for example this http://tiffanybbrown.com/2011/03/23/html5-does-not-allow-self-closing-tags/ .


This is still a valid bug, and a big problem for my RSS extension if the adminstrator expressly allows the rendering of <img> tags. 

In this case the Sanitizer is called as usual but gets additional food:

   $extraInclude[] = "img";

==> but *fails* to allow the img, escapes it still, which is wrong in that case <===

In other words, and to make it unambiguously clear:

The sanitizer does still its work and sanitizes (fallback: it is still secure), even if you want it *not* to santize img tags.

Because I have no influence on the composition of images tags of the incoming source (RSS feed), and some of these source do not obey the rules for image tags, I ask the MediaWiki Sanitizer specialist to fix this specific problem of "closed image tags".



How to reproduce:
================================================
(excerpt and comnstructed example; part of E:RSS)


$extraInclude = array();
$extraExclude = array( "iframe" );

$extraInclude[] = "a";
$extraInclude[] = "img";

$text = '<img src="http://tctechcrunch2011.files.wordpress.com/2013/03/yodlee_logo_final_rgb_lrg.jpg">';

$ret = Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude, $extraExclude );

wfDebug( "RSS: after Sanitizer::removeHTMLtags:text:" . print_r( $text, true ) . "\n" );
wfDebug( "RSS: after Sanitizer::removeHTMLtags:extraInclude: " . print_r( $extraInclude, true ) . "\n" );
wfDebug( "RSS: after Sanitizer::removeHTMLtags:extraExclude: " . print_r( $extraExclude, true ) . "\n" );
wfDebug( "RSS: after Sanitizer::removeHTMLtags:ret: " . print_r( $ret, true ) . "\n" );
Comment 1 T. Gries 2013-03-22 00:46:57 UTC
see openid-wiki.instance-proxy.wmflabs.org/wiki/Main_Page#RSS how it looks like - no images, but escaped image tags
Comment 2 T. Gries 2013-03-22 00:49:22 UTC
see http://openid-wiki.instance-proxy.wmflabs.org/wiki/Main_Page#RSS for an exmaple how it looks like (img tags are shown escaped, but not executed/rendered)

see http://feeds.feedburner.com/TechCrunch/ as example for such an RSS feed with closed img tags.
Comment 3 T. Gries 2013-03-22 00:57:32 UTC
(In reply to comment #0)
> How to reproduce:
> ================================================
....
> $extraInclude[] = "img";
> $text = '<img
> src="http://tctechcrunch2011.files.wordpress.com/2013/03/
> yodlee_logo_final_rgb_lrg.jpg">';
> 
> $ret = Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude,
> $extraExclude );
> 

In my example above, the img tag was manually repaired, during my tests, and then I copied that too early to this bug.

What I meant as test case is:

$text = '<img
src="http://tctechcrunch2011.files.wordpress.com/2013/03/yodlee_logo_final_rgb_lrg.jpg" />';

i.e. with the closing />

The effect is immediately apparent: the "open" image <img src=... > will be rendered, the "closed" <img src=... /> will not. Again, I am only talking about the case $extraInclude[] = "img";
Comment 4 Daniel Friesen 2013-03-22 01:06:14 UTC
HTML5 has nothing to do with this. Of course HTML5 does not support self-closing `<span />`, that's XML. Such a feature has never existed in HTML and it's stupid to consider that HTML5 would change that when browsers have never supported that in HTML.

This is of course completely irrelevant to the bug report. Because besides this being an issue with the sanitizer and not the browser's HTML parser <img> IS a void tag and the browser parser would treat <img /> the same way as <img>.

----

Now for the bug, testing this myself:

> $extraInclude = array( 'a', 'img' );

> $text = '<img src="http://example.com/test.png">';
> var_dump( Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude ) );
string(50) "<img src="http&#58;//example.com/test.png"></img>\n"

> $text = '<img src="http://example.com/test.png" />';
> var_dump( Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude ) );
string(47) "&lt;img src="http://example.com/test.png" /&gt;"

I see that the real issue is we simply have no real support for void tags. Since none of the tags we usually whitelist are void tags. We'll have to add support for that to the sanitizer.

In the non-void case I think that the /> behaviour is probably correct. We don't support self closing <div />'s. And turning <div /> into <div> like the HTML parser does is unexpected to the user. The sanest behaviour is to completely reject it so the user clearly can see something is wrong.
Comment 5 T. Gries 2013-03-22 01:14:27 UTC
(In reply to comment #4)
I cannot change the composition of incoming feeds, see http://feeds.feedburner.com/TechCrunch/ as example for such an RSS feed with closed img tags.

People are complaining that the RSS extension is not working, or broken, but I fully rely on the Sanitizer! So that Sanitizer needs a correction - only allow the closing image tag also.

That's the message, and there is nothing to discuss.

if the Sanitizer is not fixed, the extension will not be working for feeds using closing image tags, basta.
Comment 6 T. Gries 2013-06-09 22:46:46 UTC
hello, I do need the function as reported with this bug: that the sanitizer accepts image tags also when they have a closing " />". when "img" tags are part of the $extraInclude , i.e. allowed.

I would be happy, if this finds its way into core.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links