Last modified: 2014-09-23 23:09:33 UTC
Created attachment 7932 [details] change the attribs regex to handle the quoted and unquoted attributes same The regex MW_ATTRIBS_REGEX in Sanitizer.php should change for some cases, because the quoted and unquoted attributes are not handled the same way: * attributes with empty content * attributes with < as content I have create a wikipage linked as url to make that visible. Thanks.
presumably the not reading of < in attributes of tag extensions is some sort of paranoia against XSS. It would perhaps make sense to make it not recognize <tag att=foo<bar > for consistency's sake. not recognizing <tag someAttribute= > Seems sane to me. I expect to be required to do <tag someAttribute=""> if i want to pass it the empty string.
Not reading < in attributes is b/c that is how the spec is written, IIRC. I don't think <> are allowed in attributes.
<elem attr=> is a bit weird, not sure if that should be supported. However <elem attr> should render as <elem attr=""> imho
I thought in html, <elem attr> was equivelent to <elem attr="attr">. It would be weird to do the opposite of html imho.
(In reply to comment #4) > I thought in html, <elem attr> was equivelent to <elem attr="attr">. I don't think so. But see http://www.w3.org/TR/html-markup/syntax.html#syntax-attributes for more info.
Hmm, maybe its an xhtml thing. I was reading http://www.w3.org/TR/xhtml1/#h-4.5 html is confusing.
Need to add test cases for the behavior that it's supposed to be changing, and clarify what is supposed to change and why. Patch seems to be forbidding quoted empty elements, which is definitely wrong.....? Appears to also remove '<' and '>' from the list of accepted chars for unquoted attribs. Not sure how those chars actually interact with the rest of the sanitizer stuff, but note the HTML 5 parser rules explicitly specify that '>' should close out the tag, while '<' is technically bogus but should be treated as part of the attribute value for consistent fallback behavior: http://dev.w3.org/html5/spec/Overview.html#attribute-value-unquoted-state
Umherirrender, I am adding the "reviewed" keyword to this bug since you received a review from Brion in comment 7 in February. Do you have time and interest in revising the patch in accordance with those suggestions? Thanks for the patch!
Comment on attachment 7932 [details] change the attribs regex to handle the quoted and unquoted attributes same I am not able to provide a new patch with parser tests or which have the right regex for the specifition. Marking patch as obsolete.
Echoing comment 7: parser tests need to be added, so that we can ensure that the PHP parser and Parsoid have the same behavior.