Last modified: 2012-09-19 21:00:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T39536, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 37536 - Write attempts to wiki API from toolserver timeout
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown (Other open bugs)
Version: wmf-deployment
Hardware: All
OS: Solaris
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ipv6, ops
Depends on:
Blocks:
Reported: 2012-06-13 07:44 UTC by DrTrigon
Modified: 2012-09-19 21:00 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---


Attachments

Description DrTrigon 2012-06-13 07:44:04 UTC
Hello everybody!

Since the 7th (maybe the 6th) of June my bot has not been able to write
pages larger than ~130 KB on dewiki. The bot code was not changed, but
MaxTriesExceeded errors with <urlopen error timed out> now occur all the
time. Before that it was able to write pages >600 KB. It looks like the
uplink has slowed down (a bottleneck?) or the timeout was reduced... I am
using the pywikipedia framework and know of at least one other user having
the same issue. Any idea what might be the problem here?

Thanks a lot and greetings!
DrTrigon

PS: This was originally reported at http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005013.html
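
For context, the quoted error string is what urllib2 produces when a socket-level timeout fires; a minimal illustration (not the bot code; the URL, payload size, and timeout here are placeholders):

/* ----------------------

# Minimal illustration of the reported failure mode (not the bot code).
# urllib2 raises URLError('timed out') when the socket timeout expires,
# which the pywikipedia retry loop eventually turns into MaxTriesExceeded.
import urllib2

try:
    urllib2.urlopen('http://nl.wikipedia.org/w/api.php',
                    data='x' * 200000,      # ~200 KB POST body
                    timeout=30)             # seconds
except urllib2.URLError, e:
    print e    # prints: <urlopen error timed out> on a black-holed path

   ---------------------- */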
Comment 1 Merlijn van Deen (test) 2012-06-13 21:53:59 UTC
Simple test script, based on http://nl.wikipedia.org/wiki/Lijst_van_alle_Radio_2_Top_2000's (1,189,202 bytes). The tests below were run from willow.toolserver.org.

/* ----------------------

import wikipedia   # pywikipedia (compat) framework
import datetime

# Source: a ~1.1 MB article; target: a user sandbox page.
p_get = wikipedia.Page('nl', "Lijst_van_alle_Radio_2_Top_2000's")
p_put = wikipedia.Page('nl', 'Gebruiker:Valhallasw/lange pagina')
text = p_get.get()
print len(text)
# Prepend a timestamp so every run actually changes the page.
text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
p_put.put(text)

   ---------------------- */

Under IPv6 (default), the output is the following:

/* --------------------
(...snip...)
>>> print len(text)
1189202
>>> text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
>>> p_put.put(text)
Sleeping for 3.8 seconds, 2012-06-13 21:50:10
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
<urlopen error timed out>
WARNING: Could not open 'http://nl.wikipedia.org/w/api.php'. Maybe the server or
 your connection is down. Retrying in 1 minutes...
   -------------------- */

Under IPv4 (with the patch shown below), the output is the following:

/* --------------------
(...snip...)
>>> print len(text)
1189202
>>> text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
>>> p_put.put(text)
Sleeping for 4.0 seconds, 2012-06-13 21:48:27
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
(302, 'OK', {u'pageid': 2846006, u'title': u'Gebruiker:Valhallasw/lange pagina', u'newtimestamp': u'2012-06-13T21:49:21Z', u'result': u'Success', u'oldrevid': 31455180, u'newrevid': 31455194})
   -------------------- */

The hack to test this is the following:

Index: families/wikipedia_family.py
===================================================================
--- families/wikipedia_family.py        (revision 10117)
+++ families/wikipedia_family.py        (working copy)
@@ -44,7 +44,7 @@
         if family.config.SSL_connection:
             self.langs = dict([(lang, None) for lang in self.languages_by_size])
         else:
-            self.langs = dict([(lang, '%s.wikipedia.org' % lang) for lang in self.languages_by_size])
+            self.langs = dict([(lang, '91.198.174.225') for lang in self.languages_by_size])

         # Override defaults
         self.namespaces[1]['ja'] = [u'ノート', u'トーク']
Index: wikipedia.py
===================================================================
--- wikipedia.py        (revision 10117)
+++ wikipedia.py        (working copy)
@@ -5437,6 +5437,7 @@
             'User-agent': useragent,
             'Content-Length': str(len(data)),
             'Content-type':contentType,
+            'Host': 'nl.wikipedia.org',
         }
         if cookies:
             headers['Cookie'] = cookies
Index: pywikibot/comms/http.py
===================================================================
--- pywikibot/comms/http.py     (revision 10117)
+++ pywikibot/comms/http.py     (working copy)
@@ -54,6 +54,7 @@

     headers = {
         'User-agent': useragent,
+        'Host': 'nl.wikipedia.org',
         #'Accept-Language': config.mylang,
         #'Accept-Charset': config.textfile_encoding,
         #'Keep-Alive': '115',


Note, however, that this could also be a bug in the Python HTTP stack...
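
The same "pin to IPv4 via an explicit Host header" trick can be reproduced outside the framework; a minimal sketch using only the standard library, with the IP and hostname taken from the diff above (assuming 91.198.174.225 is still the text frontend's address):

/* ----------------------

# Standalone version of the IPv4 + Host-header hack from the diff above:
# connect to the frontend by its IPv4 address so no AAAA lookup happens,
# and send the real hostname in the Host header so the request is still
# routed to nl.wikipedia.org.
import httplib

conn = httplib.HTTPConnection('91.198.174.225', 80, timeout=30)
conn.request('GET', '/w/api.php?action=query&meta=siteinfo&format=json',
             headers={'Host': 'nl.wikipedia.org',
                      'User-Agent': 'ipv4-test/0.1'})
resp = conn.getresponse()
print resp.status, len(resp.read())
conn.close()

   ---------------------- */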
Comment 2 Mark Bergsma 2012-06-14 16:07:37 UTC
These issues sound a lot like MTU (network) problems. I've just made some (manual) changes on the servers involved. Could you please check if the situation is different now?
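
An MTU problem would explain the size dependence: small requests fit into packets below the broken path MTU, while a large POST fills every packet to the local MTU, and those packets are silently dropped if ICMP "fragmentation needed" replies are being lost. A rough client-side probe for such a black hole (a sketch, assuming a Linux ping that supports -M do; not what was run on the servers):

/* ----------------------

# Binary-search the largest ICMP payload that passes with the
# Don't-Fragment bit set; payload + 28 bytes of IP/ICMP headers
# approximates the path MTU. Assumes Linux ping with -M do / -s / -W.
import os
import subprocess

def ping_df(host, size):
    devnull = open(os.devnull, 'w')
    try:
        return subprocess.call(
            ['ping', '-M', 'do', '-c', '1', '-W', '2',
             '-s', str(size), host],
            stdout=devnull, stderr=devnull) == 0
    finally:
        devnull.close()

def probe_mtu(host, lo=500, hi=1472):
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ping_df(host, mid):
            lo = mid        # packet of this size passed, search upwards
        else:
            hi = mid - 1    # dropped, search downwards
    return lo + 28          # 20-byte IP header + 8-byte ICMP header

print probe_mtu('nl.wikipedia.org')

   ---------------------- */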
Comment 3 Merlijn van Deen (test) 2012-06-15 18:11:07 UTC
Yes, this has improved the situation. The behaviour over IPv4 and IPv6 is now comparable:


(IPv6)
Time to get page: 1.133143 s
Time to put page: 63.166557 s

(IPv4)
Time to get page: 1.369060 s
Time to put page: 57.909367 s


Although the transfer rate (1.1 MB in 60 s ≈ 19 kB/s) is not very spectacular, at least it is consistent between the two, and there is no timeout.
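
For reproducibility, a plausible reconstruction of the timing harness behind these numbers (the exact script was not posted; it simply wraps the comment 1 calls with time.time()):

/* ----------------------

# Plausible reconstruction of the timing harness (the exact script was
# not posted); wraps the same pywikipedia calls as the comment 1 test.
import time
import wikipedia

p_get = wikipedia.Page('nl', "Lijst_van_alle_Radio_2_Top_2000's")
p_put = wikipedia.Page('nl', 'Gebruiker:Valhallasw/lange pagina')

t0 = time.time()
text = p_get.get()
print 'Time to get page: %f s' % (time.time() - t0)

t0 = time.time()
p_put.put(text)
print 'Time to put page: %f s' % (time.time() - t0)

   ---------------------- */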
Comment 4 DrTrigon 2012-06-17 12:44:31 UTC
(In reply to comment #2)
> These issues sound a lot like MTU (network) problems. I've just made some
> (manual) changes on the servers involved. Could you please check if the
> situation is different now?

This seems to solve the get issues; my bot has not complained anymore since the morning of the 15th! Thanks so far! (I did not check what max. page size is limiting now.)
Comment 5 DrTrigon 2012-06-17 12:47:23 UTC
(In reply to comment #3)
> Yes, this has improved the situation. The behaviour over IPv4 and IPv6 are now
> comparable:
> 
> 
> (IPv6)
> Time to get page: 1.133143 s
> Time to put page: 63.166557 s
> 
> (IPv4)
> Time to get page: 1.369060 s
> Time to put page: 57.909367 s
> 
> 
> Although the transfer rate (1.1MB in 60s = 19kB/s) is not very spectacular - at
> least it's consistent for the two, and there is no timeout.

Indeed, it looks good (or at least better ;) now!

Could it be the case that the get times have increased too, e.g. compared to May?
Comment 6 Nemo 2012-08-24 07:54:53 UTC
This is fixed, isn't it?
