Last modified: 2014-07-20 11:01:37 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1613/
Reported by: valhallasw
Created on: 2013-04-13 19:55:05
Subject: weblinkchecker URL unicode problems

Original description:

As reported by Anima in https://sourceforge.net/tracker/?func=detail&aid=3602096&group_id=93107&atid=603139

Weblinkchecker jumps through some strange unicode hoops. There is no such thing as a unicode URL; URLs are /always/ urlencoded UTF-8 strings, so:

    >>> urllib.quote(u"ö".encode('utf-8'))
    '%C3%B6'

Anything else is *wrong*, including things like asking what encoding the web server uses: that is only relevant for decoding the page *text*.

Basic test case:

    >>> import weblinkchecker
    >>> lc = weblinkchecker.LinkChecker(u"http://svoya-igra.org/Райков Александр Вадимович/")
    Contacting server svoya-igra.org to find out its default encoding...
    Error retrieving server's default charset. Using ISO 8859-1.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "weblinkchecker.py", line 218, in __init__
        self.changeUrl(url)
      File "weblinkchecker.py", line 275, in changeUrl
        self.path = unicode(urllib.quote(self.path.encode(encoding)))
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-6: ordinal not in range(256)

Version information:

    valhallasw@lisilwen:~/src/pywikipedia/trunk/pywikipedia$ python version.py
    Pywikipedia [svn+ssh] valhallasw@trunk/pywikipedia (r11368, 2013/04/13, 08:16:45, ok)
    Python 2.7.3 (default, Aug 1 2012, 05:14:39)
    [GCC 4.6.3]
    config-settings:
    use_api = True
    use_api_login = True
    unicode test: ok
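For illustration, here is a minimal sketch of the approach the report argues for: percent-encode the URL path as UTF-8 regardless of the server's charset. It is written for Python 3, where `urllib.quote` became `urllib.parse.quote` and encodes `str` input as UTF-8 by default. The helper `quote_url_path` is hypothetical and not part of weblinkchecker; it assumes the input path is not already percent-encoded, leaves the host name untouched (non-ASCII host names need IDNA, a separate mechanism), and passes the query and fragment through unchanged.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def quote_url_path(url: str) -> str:
    """Percent-encode the path of a URL as UTF-8 (hypothetical helper).

    Assumes the path is not already percent-encoded; the host, query and
    fragment are returned unchanged.
    """
    parts = urlsplit(url)
    # quote() encodes str input as UTF-8 by default; the server's page
    # charset plays no role in building the request URL.
    path = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Same result as the Python 2 example in the report.
print(quote("ö"))

# The URL from the test case becomes a plain ASCII URL.
fixed = quote_url_path("http://svoya-igra.org/Райков Александр Вадимович/")
print(fixed)
print(unquote(urlsplit(fixed).path))  # round-trips to the original path

# Encoding the path with the server's guessed charset fails as in the traceback.
try:
    "/Райков".encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 fails:", exc.reason)
```

Because the encoding step never consults the server, the "Contacting server ... to find out its default encoding" round trip is unnecessary for building the request; the server's charset only matters later, when decoding the returned page text.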
*** Bug 55318 has been marked as a duplicate of this bug. ***