Last modified: 2014-02-12 23:35:56 UTC
Copied from the description of bug 35002 : I noticed that the Sanitizer function fails in the cases where the image tag is written in XHMTL-style as a closed tag <img src='http://image-url' />". The net is full of reports of such things (XHTML vs. HTML4.1 vs. HTML5) read for example this http://tiffanybbrown.com/2011/03/23/html5-does-not-allow-self-closing-tags/ . This is still a valid bug, and a big problem for my RSS extension if the adminstrator expressly allows the rendering of <img> tags. In this case the Sanitizer is called as usual but gets additional food: $extraInclude[] = "img"; ==> but *fails* to allow the img, escapes it still, which is wrong in that case <=== In other words, and to make it unambiguously clear: The sanitizer does still its work and sanitizes (fallback: it is still secure), even if you want it *not* to santize img tags. Because I have no influence on the composition of images tags of the incoming source (RSS feed), and some of these source do not obey the rules for image tags, I ask the MediaWiki Sanitizer specialist to fix this specific problem of "closed image tags". How to reproduce: ================================================ (excerpt and comnstructed example; part of E:RSS) $extraInclude = array(); $extraExclude = array( "iframe" ); $extraInclude[] = "a"; $extraInclude[] = "img"; $text = '<img src="http://tctechcrunch2011.files.wordpress.com/2013/03/yodlee_logo_final_rgb_lrg.jpg">'; $ret = Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude, $extraExclude ); wfDebug( "RSS: after Sanitizer::removeHTMLtags:text:" . print_r( $text, true ) . "\n" ); wfDebug( "RSS: after Sanitizer::removeHTMLtags:extraInclude: " . print_r( $extraInclude, true ) . "\n" ); wfDebug( "RSS: after Sanitizer::removeHTMLtags:extraExclude: " . print_r( $extraExclude, true ) . "\n" ); wfDebug( "RSS: after Sanitizer::removeHTMLtags:ret: " . print_r( $ret, true ) . "\n" );
see openid-wiki.instance-proxy.wmflabs.org/wiki/Main_Page#RSS how it looks like - no images, but escaped image tags
see http://openid-wiki.instance-proxy.wmflabs.org/wiki/Main_Page#RSS for an exmaple how it looks like (img tags are shown escaped, but not executed/rendered) see http://feeds.feedburner.com/TechCrunch/ as example for such an RSS feed with closed img tags.
(In reply to comment #0) > How to reproduce: > ================================================ .... > $extraInclude[] = "img"; > $text = '<img > src="http://tctechcrunch2011.files.wordpress.com/2013/03/ > yodlee_logo_final_rgb_lrg.jpg">'; > > $ret = Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude, > $extraExclude ); > In my example above, the img tag was manually repaired, during my tests, and then I copied that too early to this bug. What I meant as test case is: $text = '<img src="http://tctechcrunch2011.files.wordpress.com/2013/03/yodlee_logo_final_rgb_lrg.jpg" />'; i.e. with the closing /> The effect is immediately apparent: the "open" image <img src=... > will be rendered, the "closed" <img src=... /> will not. Again, I am only talking about the case $extraInclude[] = "img";
HTML5 has nothing to do with this. Of course HTML5 does not support self-closing `<span />`, that's XML. Such a feature has never existed in HTML and it's stupid to consider that HTML5 would change that when browsers have never supported that in HTML. This is of course completely irrelevant to the bug report. Because besides this being an issue with the sanitizer and not the browser's HTML parser <img> IS a void tag and the browser parser would treat <img /> the same way as <img>. ---- Now for the bug, testing this myself: > $extraInclude = array( 'a', 'img' ); > $text = '<img src="http://example.com/test.png">'; > var_dump( Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude ) ); string(50) "<img src="http://example.com/test.png"></img>\n" > $text = '<img src="http://example.com/test.png" />'; > var_dump( Sanitizer::removeHTMLtags( $text, null, array(), $extraInclude ) ); string(47) "<img src="http://example.com/test.png" />" I see that the real issue is we simply have no real support for void tags. Since none of the tags we usually whitelist are void tags. We'll have to add support for that to the sanitizer. In the non-void case I think that the /> behaviour is probably correct. We don't support self closing <div />'s. And turning <div /> into <div> like the HTML parser does is unexpected to the user. The sanest behaviour is to completely reject it so the user clearly can see something is wrong.
(In reply to comment #4) I cannot change the composition of incoming feeds, see http://feeds.feedburner.com/TechCrunch/ as example for such an RSS feed with closed img tags. People are complaining that the RSS extension is not working, or broken, but I fully rely on the Sanitizer! So that Sanitizer needs a correction - only allow the closing image tag also. That's the message, and there is nothing to discuss. if the Sanitizer is not fixed, the extension will not be working for feeds using closing image tags, basta.
hello, I do need the function as reported with this bug: that the sanitizer accepts image tags also when they have a closing " />". when "img" tags are part of the $extraInclude , i.e. allowed. I would be happy, if this finds its way into core.