Last modified: 2014-07-20 11:01:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57145, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55145 - weblinkchecker URL unicode problems
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: weblinkchecker.py
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Duplicates: 55318
Depends on:
Blocks:

Reported: 2013-10-05 04:32 UTC by Kunal Mehta (Legoktm)
Modified: 2014-07-20 11:01 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:32:03 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1613/
Reported by: valhallasw
Created on: 2013-04-13 19:55:05
Subject: weblinkchecker URL unicode problems
Original description:
As reported by Anima in https://sourceforge.net/tracker/?func=detail&aid=3602096&group_id=93107&atid=603139

Weblinkchecker jumps through some strange unicode hoops. There is no such thing as a unicode URL - URLs are /always/ urlencoded UTF-8 strings, so:
>>> urllib.quote(u"ö".encode('utf-8'))
'%C3%B6'

anything else is *wrong*, including things like asking what encoding the web server uses: that is only relevant for decoding the page *text*.
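
A minimal sketch of that point, assuming Python 3, where `urllib.quote` lives at `urllib.parse.quote` (the report itself was written against Python 2.7). The helper name `quote_utf8` is hypothetical and not part of weblinkchecker.py:

```python
from urllib.parse import quote, unquote


def quote_utf8(text):
    """Percent-encode text as UTF-8 bytes (hypothetical helper).

    The server's page charset is never consulted: it only matters
    when decoding the page text, not when building the URL.
    """
    return quote(text.encode("utf-8"))


print(quote_utf8("ö"))  # '%C3%B6', matching the example in the report

# Cyrillic cannot be represented in ISO 8859-1, but UTF-8 handles it,
# and the encoded form round-trips back to the original string.
encoded = quote_utf8("Райков")
assert unquote(encoded) == "Райков"
```

Encoding to UTF-8 unconditionally means the result does not depend on a network round trip or on a fallback charset.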

Basic test case:
>>> import weblinkchecker
>>> lc = weblinkchecker.LinkChecker(u"http://svoya-igra.org/Райков Александр Вадимович/")
Contacting server svoya-igra.org to find out its default encoding...
Error retrieving server's default charset. Using ISO 8859-1.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "weblinkchecker.py", line 218, in __init__
    self.changeUrl(url)
  File "weblinkchecker.py", line 275, in changeUrl
    self.path = unicode(urllib.quote(self.path.encode(encoding)))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-6: ordinal not in range(256)
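
One way the failing step could avoid the server-charset lookup is sketched below. This is an illustration under assumptions, not the actual weblinkchecker.py code or its eventual fix: it targets Python 3, and `normalize_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize_url(url):
    """Return url with its path percent-encoded as UTF-8 (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() encodes str input as UTF-8 by default. '%' is listed as safe
    # so that an already percent-encoded path is not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# The URL from the test case above no longer raises UnicodeEncodeError.
print(normalize_url("http://svoya-igra.org/Райков Александр Вадимович/"))
```

Treating `%` as safe is a design choice: it keeps the function idempotent, at the cost of leaving a stray literal `%` in the input unescaped.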


valhallasw@lisilwen:~/src/pywikipedia/trunk/pywikipedia$ python version.py
Pywikipedia [svn+ssh] valhallasw@trunk/pywikipedia (r11368, 2013/04/13, 08:16:45, ok)
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
Comment 1 Ricordisamoa 2014-07-20 11:01:37 UTC
*** Bug 55318 has been marked as a duplicate of this bug. ***
