Last modified: 2014-10-16 11:29:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72937, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70937 - SVG upload gets blocked on correct encoding (windows-1252, with wrong/unspecific warning)
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-09-17 11:58 UTC by PRO
Modified: 2014-10-16 11:29 UTC (History)

Description PRO 2014-09-17 11:58:47 UTC
I got an unspecific error warning: "This file contains HTML or script code that may be erroneously interpreted by a web browser."

For example, this file normally uses encoding="ISO-8859-1" (or the standard encoding="UTF-8"), but the W3C says it should use "windows-1252" instead: [[File:Milch.svg]].
Comment 1 Tisza Gergő 2014-09-17 14:12:33 UTC
I am not sure I understand the role of encoding in this bug. Do you get the error with some encoding but not with another one?
Comment 2 Chris Steipp 2014-09-17 16:05:16 UTC
Currently, we only allow these encodings:

$safeXmlEncodings = array(
	'UTF-8',
	'ISO-8859-1',
	'ISO-8859-2',
	'UTF-16',
	'UTF-32'
);

We had specific issues with UTF-7 (bug 47304), so we whitelisted only encodings that were well supported. Before we can open that up, we would need to verify that XML parsing of windows-1252 is done the same way on the server and on clients.
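
Editorial illustration, not MediaWiki code: a minimal Python sketch of how a whitelist like the one above could reject an upload. The helper names `declared_encoding` and `is_allowed` are hypothetical, and the sketch assumes the XML declaration is readable as ASCII bytes at the start of the file (so it does not handle UTF-16 or UTF-32 input). The actual check in MediaWiki is written in PHP and may work differently.

```python
import re

# Mirrors the list quoted in the comment above; windows-1252 is absent.
SAFE_XML_ENCODINGS = {"UTF-8", "ISO-8859-1", "ISO-8859-2", "UTF-16", "UTF-32"}

# Hypothetical pattern: captures the encoding="..." value of an XML declaration.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(head: bytes):
    """Return the declared encoding in upper case, or None if none is declared."""
    match = _DECL.search(head[:200])
    return match.group(1).decode("ascii").upper() if match else None


def is_allowed(head: bytes) -> bool:
    """Accept files with no declaration or with a whitelisted encoding."""
    enc = declared_encoding(head)
    return enc is None or enc in SAFE_XML_ENCODINGS


print(is_allowed(b'<?xml version="1.0" encoding="ISO-8859-1"?><svg/>'))
print(is_allowed(b'<?xml version="1.0" encoding="windows-1252"?><svg/>'))
```

With this list, the ISO-8859-1 file passes and the windows-1252 file is rejected, which matches the behaviour reported in this bug.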
Comment 3 PRO 2014-09-17 16:14:18 UTC
(In reply to Tisza Gergő from comment #1)
Yes, the W3C validator prefers the [[Windows-1252]] encoding over encoding="ISO-8859-1", in SVG as well. See the second warning for this test file: http://validator.w3.org/check?uri=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Farchive%2Fb%2Fbd%2F20140917155924%21Test.svg&charset=%28detect+automatically%29&doctype=Inline&ss=1&group=0&user-agent=W3C_Validator%2F1.3+http%3A%2F%2Fvalidator.w3.org%2Fservices#line-1

See also http://lists.w3.org/Archives/Public/www-validator/2013Mar/0054.html

So MediaWiki accepts ISO-8859-1 but not windows-1252.
Comment 4 Bawolff (Brian Wolff) 2014-09-19 00:25:28 UTC
(In reply to PRO from comment #3)
> (In reply to Tisza Gergő from comment #1)
> Yes, the [[Windows-1252]] encoding is preferred by the W3C (validator) to
> encoding="ISO-8859-1", also in SVG, see at a test-file, the second warning:
> http://validator.w3.org/check?uri=https%3A%2F%2Fupload.wikimedia.
> org%2Fwikipedia%2Fcommons%2Farchive%2Fb%2Fbd%2F20140917155924%21Test.
> svg&charset=%28detect+automatically%29&doctype=Inline&ss=1&group=0&user-
> agent=W3C_Validator%2F1.3+http%3A%2F%2Fvalidator.w3.org%2Fservices#line-1
> 
> See also http://lists.w3.org/Archives/Public/www-validator/2013Mar/0054.html
> 
> So MediaWiki accepts ISO-8859-1 but not windows-1252.

It's not like one is really better than the other. If the file is windows-1252, you should mark it as such....

If anything is preferred, realistically it would be to have documents be in utf-8, the most sane of all the encodings.

-----

windows-1252 is almost the same as ISO-8859-1 (only the 0x80-0x9F range, which holds the C1 control characters in ISO-8859-1, is different), and both are ASCII-compatible. Anything ASCII-compatible should not have bug 47304 type issues, so it should be safe to whitelist windows-1252.
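
Editorial illustration: a small Python sketch that checks the comparison above using Python's built-in codecs. It lists the bytes that decode differently in the two encodings, and contrasts an ASCII-compatible encoding with UTF-7, where markup can be spelled without a literal `<` byte. That this is the kind of problem behind bug 47304 is an assumption; the bug's details are not given here.

```python
# Collect every byte value that decodes to a different character
# in ISO-8859-1 and windows-1252 (cp1252 in Python's codec naming).
differing = []
for b in range(256):
    raw = bytes([b])
    latin1 = raw.decode("iso-8859-1")
    # cp1252 leaves a few bytes undefined; map those to U+FFFD.
    cp1252 = raw.decode("cp1252", errors="replace")
    if latin1 != cp1252:
        differing.append(b)

print(f"{len(differing)} differing bytes, "
      f"from 0x{differing[0]:02X} to 0x{differing[-1]:02X}")

# In an ASCII-compatible encoding, markup characters keep their ASCII bytes,
# so a byte-level scan for "<script" sees what an XML parser would see.
visible = "<script>".encode("cp1252")

# In UTF-7 the same text can be written without any "<" byte at all.
hidden = b"+ADw-script+AD4-"
print(visible, hidden.decode("utf-7"))
```

The ASCII range (0x00-0x7F) and the 0xA0-0xFF range decode identically in both encodings, so only the 0x80-0x9F block shows up in `differing`.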
Comment 5 PRO 2014-09-19 13:29:05 UTC
(In reply to Bawolff (Brian Wolff) from comment #4)
> Its not like one is better than the other really. If the file is
> windows-1252, you should mark it as such....
> 
> If anything is preferred, realistically it would be to have documents be in
> utf-8, the most sane of all the encodings.
> 
> -----
> 
> windows-1252 is almost the same as ISO-8859-1 (just c0 and c1 control
> characters are different), and both are ascii compatible. Anything ascii
> compatible should not have bug 47304 type issues, so it should be safe to
> whitelist windows-1252

Here is a temporary example showing why using windows-1252 is preferred (in part because its use is not very obvious):
https://upload.wikimedia.org/wikipedia/commons/thumb/archive/b/bd/20140917155924%21Test.svg/800px-Test.svg.png
Comment 6 Chris Steipp 2014-10-03 22:51:57 UTC
(In reply to Bawolff (Brian Wolff) from comment #4)
> windows-1252 is almost the same as ISO-8859-1 (just c0 and c1 control
> characters are different), and both are ascii compatible. Anything ascii
> compatible should not have bug 47304 type issues, so it should be safe to
> whitelist windows-1252

I agree. If someone wants to submit a patch to add that to the whitelist, I will +1.
