Last modified: 2011-06-27 21:18:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T31514, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 29514 - PHP5 UTF-8
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: unspecified
Hardware/OS: All / All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2011-06-21 15:43 UTC by Rin Nas
Modified: 2011-06-27 21:18 UTC
CC: 3 users




Description Rin Nas 2011-06-21 15:43:20 UTC
Hello.
My name is Rinat, and I would like to contribute my class.

PHP5 UTF-8 is a UTF-8-aware library of functions that mirror PHP's own string functions.

http://code.google.com/p/php5-utf8/source/browse/

This package improves on http://sourceforge.net/projects/phputf8 (last updated in 2007).

*Features and benefits of using this class*
  * Interface compatibility with the standard PHP functions that handle single-byte encodings
  * Works without the iconv and mbstring PHP extensions, but actively uses them when they are available
  * Useful functions that are missing from iconv and mbstring
  * Methods that take and return a string can also take and return null (useful for database selects)
  * Several methods can process arrays recursively
  * A single interface with encapsulation (you can inherit and override)
  * High performance, reliability and quality code
  * PHP >= 5.3.x
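To illustrate the fallback and null-handling features listed above, here is a minimal sketch. It is not the library's actual implementation, which is not reproduced in this report. It prefers mbstring when it is loaded and otherwise counts code points with PCRE. `utf8_strlen_sketch` is a hypothetical name, and the typed signature assumes PHP 7.1 or later rather than the 5.2/5.3 versions discussed here.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, NOT the PHP5 UTF-8 library's code: a UTF-8 strlen that
// prefers mbstring when loaded and otherwise falls back to PCRE.
function utf8_strlen_sketch(?string $s): ?int
{
    // Pass null straight through, as the feature list describes for
    // values selected from a database.
    if ($s === null) {
        return null;
    }
    // Use the extension when it is available. Note: this branch does not
    // reject malformed UTF-8; it only counts.
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Pure-PCRE fallback: in UTF-8 mode each '.' matches one code point,
    // and preg_match_all() returns false if $s is not valid UTF-8.
    $count = preg_match_all('~.~su', $s);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $count;
}

$s = 'Hello, Привет';
echo strlen($s), "\n";             // 19 (bytes)
echo utf8_strlen_sketch($s), "\n"; // 13 (code points)
```

Both branches give 13 for this string because each Cyrillic letter is one code point but two bytes.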

Example:
  $s = 'Hello, Привет';
  if (UTF8::is_utf8($s)) echo UTF8::strlen($s);
Comment 1 Roan Kattouw 2011-06-21 15:45:42 UTC
(In reply to comment #0)
>   * PHP >= 5.3.x
> 
Does this mean PHP 5.3 or higher is required? MediaWiki currently requires PHP 5.2.3 or higher.
Comment 2 Chad H. 2011-06-21 20:35:19 UTC
I'm also not quite sure what we're trying to solve here. Are there specific bugs in MediaWiki you think we could fix by using your code?
Comment 3 Mark A. Hershberger 2011-06-22 18:27:10 UTC
>  * High performance, reliability and quality code

Sounds good, but do you have anything to back up the “High performance” claim?  Benchmarks comparing your code with iconv would be helpful here.
Comment 4 Rin Nas 2011-06-27 09:07:00 UTC
Have you looked at the source code? Search it for the "faster" and "speed" entries.

iconv and mbstring:

* iconv() is faster than mb_convert_encoding()
* mb_strlen() is faster than strlen(utf8_decode())
* strlen(utf8_decode()) is faster than iconv_strlen()
* mb_substr() is faster than iconv_substr()
* mb_strpos() is faster than iconv_strpos()

Other (hacks):
* preg_match('~~suSX') is much faster (up to 4 times) than mb_check_encoding($data, 'UTF-8')
* strtr() is 2-3 times faster than mb_strtolower()
* preg_match_all('~.~suSX', $s, $m) is faster than native PHP
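The `preg_match()` validation hack can be sketched as follows, assuming modern PHP. Both function names are hypothetical. Only the `u` modifier is kept here, on the assumption that the validity check comes from `u` alone. With `u`, PCRE rejects a malformed subject before matching, and an empty pattern matches any valid string.

```php
<?php
declare(strict_types=1);

// Hypothetical helper illustrating the preg_match() trick from the comment.
// With the "u" modifier PCRE checks that the subject is valid UTF-8 before
// matching; the empty pattern then matches, so the result is 1 for valid
// input and false for malformed input.
function is_utf8_sketch(string $s): bool
{
    return preg_match('~~u', $s) === 1;
}

// The mbstring check it is being compared with (requires ext-mbstring).
function is_utf8_mbstring(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

var_dump(is_utf8_sketch('Hello, Привет')); // bool(true)
var_dump(is_utf8_sketch("\xC3\x28"));      // bool(false): 0xC3 needs a continuation byte
```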

So PHP5 UTF-8 uses the fastest available method among mbstring, iconv, and native code/hacks.
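A small, reproducible harness could check timing claims like these. The sketch below is hypothetical: the `bench` function and the test input are illustrative, not from the library or MediaWiki. It assumes PHP 7.4+ for arrow functions and `hrtime()`. It only prints timings and makes no claim about which variant is faster.

```php
<?php
declare(strict_types=1);

// Hypothetical micro-benchmark harness (not from the bug report).
// Times each callable over the same input and returns seconds per variant.
function bench(array $variants, string $input, int $iterations = 10000): array
{
    $results = [];
    foreach ($variants as $name => $fn) {
        $fn($input); // warm-up call, not timed
        $start = hrtime(true); // monotonic clock, nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            $fn($input);
        }
        $results[$name] = (hrtime(true) - $start) / 1e9;
    }
    return $results;
}

$input = str_repeat('Hello, Привет ', 1000);
$variants = [
    'preg_match ~~u' => static fn(string $s): bool => preg_match('~~u', $s) === 1,
];
// Only compare against mbstring when the extension is loaded.
if (function_exists('mb_check_encoding')) {
    $variants['mb_check_encoding'] = static fn(string $s): bool => mb_check_encoding($s, 'UTF-8');
}
foreach (bench($variants, $input, 1000) as $name => $seconds) {
    printf("%-20s %.4f s\n", $name, $seconds);
}
```

Running each variant on several input sizes, and on invalid input as well, would give a fairer picture than a single string.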
Comment 5 Sam Reed (reedy) 2011-06-27 16:50:03 UTC
Can you provide us with some benchmarks to prove this? Rather than just giving random numbers?
Comment 6 Chad H. 2011-06-27 16:59:53 UTC
(In reply to comment #5)
> Can you provide us with some benchmarks to prove this? Rather than just giving
> random numbers?

Also, I'd like to see how/where it works with MediaWiki and what behavior it changes (a huge library change like this *must* come with unit tests!)

It's all fine and good to write a class that's really efficient at doing something, but if it doesn't give us any benefits then there's no reason to integrate it.
Comment 7 Mark A. Hershberger 2011-06-27 21:18:13 UTC
Closing this as WONTFIX.  Feel free to reopen when you can provide some benchmarks and unit tests that would make us much more comfortable integrating your library.
