Last modified: 2013-12-23 08:57:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T60872, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 58872 - Set locale if system uses wrong default
Status: RESOLVED INVALID
Product: Pywikibot
Classification: Unclassified
Component: General
Version: compat-(1.0)
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Pywikipedia bugs
Depends on:
Blocks: 58181
Reported: 2013-12-22 20:22 UTC by DrTrigon
Modified: 2013-12-23 08:57 UTC (History)

Description DrTrigon 2013-12-22 20:22:30 UTC
The grid engine on tool labs has a different default locale setting than the console.

Grid engine:

>>> import locale
>>> print locale.localeconv()
{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}
>>> print locale.getdefaultlocale()
(None, None)
>>> print locale.getlocale()
(None, None)
>>> print locale.getpreferredencoding()
ANSI_X3.4-1968

Console:

>>> import locale
>>> print locale.localeconv()
{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}
>>> print locale.getdefaultlocale()
('en_US', 'UTF-8')
>>> print locale.getlocale()
(None, None)
>>> print locale.getpreferredencoding()
UTF-8

The one from the console works with pywikibot, the other one does not; see Bug 58181. Essentially, the issue is that the locale on the grid engine is not set properly. But it does not matter where this error comes from: the bots must not crash in such situations.

I propose to check 'locale.getdefaultlocale()' on startup and compare it to 'config.textfile_encoding' (maybe also 'config.console_encoding'). If and only if they mismatch, the encoding has to be set according to the config so that the correct one is used.
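
A minimal sketch of that startup check, written for Python 3 and not taken from pywikibot itself. The helper name `check_encoding` is hypothetical, and the `configured` argument stands in for a value such as 'config.textfile_encoding'. It asks `locale.getpreferredencoding(False)` instead of 'locale.getdefaultlocale()', because the latter is deprecated in recent Python 3 releases; the comparison goes through `codecs.lookup` so that spellings like 'UTF8' and 'UTF-8' count as equal.

```python
import codecs
import locale


def check_encoding(configured, system=None):
    """Return (encoding_to_use, mismatched) for a configured encoding.

    Hypothetical helper, not part of pywikibot. `configured` stands in
    for a value such as config.textfile_encoding.
    """
    if system is None:
        # Ask the platform; False avoids calling setlocale() as a side effect.
        system = locale.getpreferredencoding(False)
    # Normalize both names so that "UTF8", "utf-8" and "UTF-8" compare equal.
    wanted = codecs.lookup(configured).name
    try:
        found = codecs.lookup(system).name
    except LookupError:
        # The system reported an encoding Python does not know at all.
        found = None
    # The configured encoding always wins; the flag tells the caller
    # whether the system default disagreed with it.
    return wanted, found != wanted


# The two environments from the report, passed in explicitly:
grid = check_encoding("utf-8", system="ANSI_X3.4-1968")
console = check_encoding("utf-8", system="UTF-8")
```

With the values reported above, the grid engine case is flagged as a mismatch and the console case is not; a bot could log a warning on mismatch and continue with the configured encoding rather than crash.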
Comment 1 Merlijn van Deen (test) 2013-12-22 20:38:07 UTC
This is not a pywikibot issue, but an issue with your code - as I have explained before. Filenames should *never* be unicode strings -- always byte strings. It's just luck (or rather: a combination of factors that happens to be just right) that it works in the shell.
Comment 2 Marcin Cieślak 2013-12-22 20:42:13 UTC
In my case:

>>> locale.getdefaultlocale()
('pl_PL', 'UTF8')
>>> locale.getpreferredencoding()
'UTF-8'

Why is en_US better?

If I create files automatically, for example out of article names, I prefer to .encode("utf-8") unicode strings manually without resorting to the locale module.
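
As a rough illustration of that approach (Python 3 syntax; the helper name `filename_for`, the underscore replacement and the '.txt' suffix are assumptions made for the example, not anything pywikibot provides):

```python
def filename_for(title, suffix=".txt"):
    """Build a byte-string filename from a unicode article title.

    Hypothetical helper: the encoding is chosen explicitly, so the result
    does not depend on the locale of the machine running the bot.
    """
    # Spaces become underscores, as in wiki URLs (an assumption for this sketch).
    return (title.replace(" ", "_") + suffix).encode("utf-8")


name = filename_for("Łódź Fabryczna")
```

The result is the same byte string whether the process runs with a UTF-8 locale or with none set.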
Comment 3 DrTrigon 2013-12-23 08:57:28 UTC
(In reply to comment #1)
> This is not a pywikibot issue, but an issue with your code - as I have
> explained before. Filenames should *never* be unicode strings -- always byte
> strings. It's just luck (or rather: a combination of factors that happens to
> be just right) that it works in the shell.

As explained, I am always very confused by this unicode vs. byte string stuff - I know the details, I just mix them up all the time... so please be patient with me.

I corrected all those errors and issues in my scripts once, and so I was not aware (and was even more confused) that there were still bugs.

Since you give the impression of being "the expert" on such string issues, I desperately need your help and might need it again in the future.

For example, I was enormously confused by the fact that unicode strings also need an internal representation in Python. I always assumed this had to be UTF (8, 16 or 32), and thus I was mixing up UTF and unicode conceptually. Now I have learned about UCS [1] and should have it sorted out:

-(byte)string (ASCII, UTF or else)
-unicode (internally UCS)

encode: unicode -> bytestring
decode: bytestring -> unicode
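
A small round trip showing both directions, as a sketch in Python 3, where the unicode type is called `str` and the byte string type is called `bytes` (in Python 2 these were `unicode` and `str`). The sample text is arbitrary.

```python
text = "Zürich"                # unicode string: 6 code points
data = text.encode("utf-8")    # encode: unicode -> byte string
back = data.decode("utf-8")    # decode: byte string -> unicode

# An ASCII-only encoding (what the grid engine reported as
# ANSI_X3.4-1968) cannot represent the "ü" at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Here `data` is seven bytes long because "ü" takes two bytes in UTF-8, `back` equals `text` again, and `ascii_ok` ends up False.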

[1] http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python

Please correct me if I said something wrong (again ;) ...

Btw.: using a 'UTF' locale on the tool labs grid engine would not be the correct solution, but it would have helped and would not do any harm anyway, so I don't see the problem with that there... but this is not an issue anymore... ;)

Greetings
