Last modified: 2012-09-19 21:00:51 UTC
Hello everybody! Since the 7th (maybe the 6th) of June my bot has not been able to write pages larger than ~130 KB on dewiki. The bot code was not changed, but MaxTriesExceeded errors with <urlopen error timed out> now occur all the time. Before, it was able to write pages >600 KB. It looks like the uplink has slowed down (a bottleneck?) or the timeout was reduced... I am using the pywikipedia framework and know of at least one other user having the same issue. Any idea what the problem might be here? Thanks a lot and greetings! DrTrigon

ps: This was originally reported at http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005013.html
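For context, the MaxTriesExceeded behaviour amounts to a retry loop around the HTTP request that gives up after a fixed number of timeouts. The following is a minimal sketch of that pattern, not pywikipedia's actual code; the names `put_with_retries`, `MAX_TRIES`, and `MaxTriesExceededError` are illustrative only:

```python
import socket

MAX_TRIES = 3  # illustrative; the framework's retry count is configurable


class MaxTriesExceededError(Exception):
    """Raised once every attempt has timed out."""


def put_with_retries(do_put, max_tries=MAX_TRIES):
    """Call do_put() until it succeeds, retrying on socket timeouts.

    do_put is any callable performing the HTTP POST; socket.timeout is
    the error that surfaces as '<urlopen error timed out>' above.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return do_put()
        except socket.timeout:
            if attempt == max_tries:
                raise MaxTriesExceededError(
                    "giving up after %d timed-out attempts" % max_tries)
            # The real framework sleeps here ("Retrying in 1 minutes...").


# Simulated server that times out twice, then succeeds:
attempts = {"n": 0}

def flaky_put():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise socket.timeout("timed out")
    return "Success"

print(put_with_retries(flaky_put))  # → Success, after two retries
```

The point is that a network-level problem affecting only large request bodies looks, from the bot's side, exactly like a server that never answers.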
Simple test script, based on http://nl.wikipedia.org/wiki/Lijst_van_alle_Radio_2_Top_2000's (1,189,202 bytes). These were run from willow.toolserver.org.

/* ----------------------
import wikipedia
import datetime

p_get = wikipedia.Page('nl', "Lijst_van_alle_Radio_2_Top_2000's")
p_put = wikipedia.Page('nl', 'Gebruiker:Valhallasw/lange pagina')

text = p_get.get()
print len(text)
text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
p_put.put(text)
---------------------- */

Under IPv6 (the default), the output is the following:

/* --------------------
(...snip...)
>>> print len(text)
1189202
>>> text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
>>> p_put.put(text)
Sleeping for 3.8 seconds, 2012-06-13 21:50:10
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
<urlopen error timed out>
WARNING: Could not open 'http://nl.wikipedia.org/w/api.php'.
Maybe the server or your connection is down. Retrying in 1 minutes...
-------------------- */

Under IPv4 (with the patch shown below), the output is the following:

/* --------------------
(...snip...)
>>> print len(text)
1189202
>>> text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
>>> p_put.put(text)
Sleeping for 4.0 seconds, 2012-06-13 21:48:27
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
(302, 'OK', {u'pageid': 2846006, u'title': u'Gebruiker:Valhallasw/lange pagina', u'newtimestamp': u'2012-06-13T21:49:21Z', u'result': u'Success', u'oldrevid': 31455180, u'newrevid': 31455194})
-------------------- */

The hack to test this is the following:

Index: families/wikipedia_family.py
===================================================================
--- families/wikipedia_family.py (revision 10117)
+++ families/wikipedia_family.py (working copy)
@@ -44,7 +44,7 @@
         if family.config.SSL_connection:
             self.langs = dict([(lang, None) for lang in self.languages_by_size])
         else:
-            self.langs = dict([(lang, '%s.wikipedia.org' % lang) for lang in self.languages_by_size])
+            self.langs = dict([(lang, '91.198.174.225') for lang in self.languages_by_size])

         # Override defaults
         self.namespaces[1]['ja'] = [u'ノート', u'トーク']

Index: wikipedia.py
===================================================================
--- wikipedia.py (revision 10117)
+++ wikipedia.py (working copy)
@@ -5437,6 +5437,7 @@
             'User-agent': useragent,
             'Content-Length': str(len(data)),
             'Content-type': contentType,
+            'Host': 'nl.wikipedia.org',
         }
         if cookies:
             headers['Cookie'] = cookies

Index: pywikibot/comms/http.py
===================================================================
--- pywikibot/comms/http.py (revision 10117)
+++ pywikibot/comms/http.py (working copy)
@@ -54,6 +54,7 @@
     headers = {
         'User-agent': useragent,
+        'Host': 'nl.wikipedia.org',
         #'Accept-Language': config.mylang,
         #'Accept-Charset': config.textfile_encoding,
         #'Keep-Alive': '115',

Note, however, that this could also be a bug in the Python HTTP stack...
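The same effect as the hack above (connecting over IPv4 while still sending the proper Host header) can also be achieved without hard-coding an IP, by resolving the hostname with the address family pinned to AF_INET so that AAAA records are ignored. A minimal sketch, with a helper name of my own choosing rather than anything from pywikipedia:

```python
import socket

def resolve_ipv4(host, port=80):
    """Return the first IPv4 (A-record) address for host.

    Connecting to this literal address while still sending
    'Host: <host>' in the request reproduces the family-file hack
    without patching hostnames throughout the framework.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr for AF_INET is (address, port)
    return infos[0][4][0]
```

With this, `resolve_ipv4('nl.wikipedia.org')` would have returned the A record (91.198.174.225 at the time), and the two `'Host': 'nl.wikipedia.org'` header additions above remain the only other change needed.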
These issues sound a lot like MTU (network) problems. I've just made some (manual) changes on the servers involved. Could you please check if the situation is different now?
Yes, this has improved the situation. The behaviour over IPv4 and IPv6 is now comparable:

(IPv6)
Time to get page: 1.133143 s
Time to put page: 63.166557 s

(IPv4)
Time to get page: 1.369060 s
Time to put page: 57.909367 s

Although the transfer rate (1.1 MB in 60 s = 19 kB/s) is not very spectacular, at least it is consistent for the two, and there is no timeout.
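As a quick sanity check on the quoted throughput, dividing the test page's size (1,189,202 bytes, from comment #1) by the measured put times gives roughly the 19 kB/s figure:

```python
PAGE_BYTES = 1189202  # size of the test page from comment #1

def rate_kb_per_s(seconds):
    """Transfer rate in kB/s (1 kB = 1000 bytes), to one decimal place."""
    return round(PAGE_BYTES / seconds / 1000.0, 1)

print(rate_kb_per_s(63.166557))  # IPv6 put → 18.8
print(rate_kb_per_s(57.909367))  # IPv4 put → 20.5
```

So roughly 19-21 kB/s on both address families, matching the "1.1 MB in 60 s = 19 kB/s" estimate.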
(In reply to comment #2)
> These issues sound a lot like MTU (network) problems. I've just made some
> (manual) changes on the servers involved. Could you please check if the
> situation is different now?

This seems to solve the get issues; my bot has not complained since the morning of the 15th! Thanks so far! (I did not check what the maximum page size limit is now.)
(In reply to comment #3)
> Yes, this has improved the situation. The behaviour over IPv4 and IPv6 are now
> comparable:
>
> (IPv6)
> Time to get page: 1.133143 s
> Time to put page: 63.166557 s
>
> (IPv4)
> Time to get page: 1.369060 s
> Time to put page: 57.909367 s
>
> Although the transfer rate (1.1MB in 60s = 19kB/s) is not very spectacular - at
> least it's consistent for the two, and there is no timeout.

Indeed, this looks good (or at least better ;) now! Could it be that the get times increased too, e.g. compared to May?
This is fixed, isn't it?