Last modified: 2014-04-18 17:49:47 UTC
The extension doesn't work (gives a fatal error) if PHP is compiled without the multibyte string functions, specifically mb_convert_encoding. Here's a patch.

In /includes/HtmlFormatter.php, in the root class, add:

    /**
     * Converts UTF-8 characters to HTML entities
     * @param string $string: The string to convert
     * @return string: The converted string
     */
    public function convertToHtmlEntities( $string ) {
        if ( function_exists( 'mb_convert_encoding' ) ) {
            return mb_convert_encoding( $string, 'HTML-ENTITIES', 'UTF-8' );
        }
        return htmlspecialchars_decode( utf8_decode( htmlentities( $string ) ) );
    }

In the getDoc function, replace

    $html = mb_convert_encoding( $this->html, 'HTML-ENTITIES', "UTF-8" );

with

    $html = $this->convertToHtmlEntities( $this->html );

And in fixLibXML, replace

    $html = mb_convert_encoding( $html, 'UTF-8', 'HTML-ENTITIES' );

with

    $html = $this->convertToHtmlEntities( $html );
Wouldn't utf8_decode() mangle everything not supported by ISO-8859-1?
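For illustration, a minimal sketch of that concern. The sample strings are made up for the demo. Note that utf8_decode() is deprecated as of PHP 8.2, so the calls are silenced with @ here.

```php
<?php
// utf8_decode() converts UTF-8 to ISO-8859-1. Characters that do not
// exist in ISO-8859-1 are replaced with a literal "?".
// It is deprecated as of PHP 8.2; @ silences that notice for the demo.
$latin = @utf8_decode( "\xC3\xA9" );               // "é", present in ISO-8859-1
$cjk = @utf8_decode( "\xE6\x97\xA5\xE6\x9C\xAC" ); // "日本", not in ISO-8859-1

echo bin2hex( $latin ), "\n"; // e9
echo $cjk, "\n";              // ??
```

The first string survives as a single ISO-8859-1 byte, while the second is lost entirely, so the fallback path would corrupt any page containing text outside Latin-1.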
Oh, yes. Can you tell me how to see what a string looks like internally in PHP? Using echo, I get the same result for my test with mb_convert_encoding and entirely without it.
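One way to look at the raw bytes is bin2hex(), sketched below. The helper name dumpBytes is hypothetical and not part of the extension. A likely reason echo shows no difference is that a browser renders an entity such as &eacute; the same way as the character itself, so comparing the bytes (or viewing the page source) is more telling.

```php
<?php
// Hypothetical helper: show the raw bytes of a string as
// space-separated hex pairs.
function dumpBytes( $string ) {
	return implode( ' ', str_split( bin2hex( $string ), 2 ) );
}

$utf8 = "\xC3\xA9";   // "é" encoded as UTF-8 (two bytes)
$entity = '&eacute;'; // the same character written as an HTML entity

echo dumpBytes( $utf8 ), "\n";   // c3 a9
echo dumpBytes( $entity ), "\n"; // 26 65 61 63 75 74 65 3b
echo strlen( $utf8 ), "\n";      // 2
```

strlen() counts bytes rather than characters, so it is also a quick check of whether a conversion changed anything.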
Max, what is the status of this bug?
Max..? Is this something we should be working on?
MediaWiki is horribly slow without proper Unicode support, so I don't think we should support this use case.