Last modified: 2014-07-24 12:49:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T57219, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55219 - Timeout when updating complex pages
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: network (Other open bugs)
Version: unspecified
Hardware/OS: All / All
Importance: High normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Duplicates: 55162 56884 (view as bug list)
Depends on:
Blocks:
Reported: 2013-10-05 04:45 UTC by Kunal Mehta (Legoktm)
Modified: 2014-07-24 12:49 UTC (History)
CC: 10 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Kunal Mehta (Legoktm) 2013-10-05 04:45:16 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1399/
Reported by: malafaya
Created on: 2012-01-17 00:22:50
Subject: Updating complex pages
Original description:
When updating complex pages, it's common to get a timeout because the Wikimedia server does not process and return the page within the expected time. In such cases (when a timeout exception is thrown), my suggestion is that pywikipedia should try to fetch the page again and check whether there are any differences against the new page to be saved. If not, it should proceed and not block indefinitely on such pages.
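The behaviour suggested here could be sketched as follows. This is a minimal illustration only, not pywikibot's actual API: `fetch` and `save` are hypothetical stand-ins for the real page calls.

```python
def save_with_timeout_check(page, new_text, fetch, save, max_tries=3):
    """Sketch of the suggested retry logic: after a timeout, re-fetch
    the page and skip further retries if the edit already went through.
    `fetch` and `save` stand in for the real pywikibot page calls."""
    for attempt in range(max_tries):
        try:
            save(page, new_text)
            return True
        except TimeoutError:
            # The server may have applied the edit before timing out:
            # re-fetch and compare instead of blindly retrying.
            if fetch(page) == new_text:
                return True  # edit was saved after all; stop retrying
    return False
```

This would avoid both the duplicate edits and the "forever" retry loop described in the comments below.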
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:45:18 UTC
This is the way the bot works. It tries to put the page several times, up to the maxretries value given in (user_)config.py. Edit conflicts are detected (by the MediaWiki API) unless you are using your bot account for multiple edits on the same page at the same time.
Comment 2 Kunal Mehta (Legoktm) 2013-10-05 04:45:20 UTC
- **status**: open --> pending
Comment 3 Kunal Mehta (Legoktm) 2013-10-05 04:45:21 UTC
Hmmm, I'm not sure you understood. I'm not updating the page more than once simultaneously. It's just one bot run. As the page is a complicated one, the server does not respond in time (you can try [[Europa]] at pt.wiktionary). The bot then tries again, but obviously the same thing happens. The difference is that the page was already updated on the first try, even though the server did not respond. In operations such as replace.py, where it's common to edit long pages, you end up in a long loop.
Comment 4 Kunal Mehta (Legoktm) 2013-10-05 04:45:23 UTC
- **status**: pending --> open
Comment 5 Kunal Mehta (Legoktm) 2013-10-05 04:45:25 UTC
I'm talking about this error:

Updating page [[Sri Lanka]] via API
HTTPError: 504 Gateway Time-out

The page to be updated is quite big, so the server does not reply in time.
1) Is there a way to increase the timeout? I believe this is controlled by the server, not the HTTP client...
2) The page was updated on the first try, but as the page is not refreshed between retries, the bot doesn't know and will try to update it "forever".
Comment 6 xqt 2013-11-11 06:23:26 UTC
*** Bug 56884 has been marked as a duplicate of this bug. ***
Comment 7 2013-11-14 10:51:49 UTC
Checking this morning with Faebot, 1.6% of get/put transactions failed out of a sample of more than 1,000. These were small category changes rather than file uploads or large page edits. I believe most failures have been on putting pages rather than getting them, though I have also seen the failure when getting pages.

As everyone appears affected, not just API users, I have asked for feedback at the Village pump (http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&diff=prev&oldid=109634734).

I am not convinced that this is a pywikipediabot-specific problem; it does not correspond to any change in pywikipediabot, which has never before had this problem at this frequency, so the bug report (1399) above may well be a dead end.
Comment 8 zhuyifei1999 2013-11-14 10:53:39 UTC
503 is also happening:

Sleeping for 7.9 seconds, 2013-11-13 11:20:55

Updating page [[File:Русский энциклопедический словарь Березина 4.2 077.jpg]] via API

Result: 503 Service Unavailable

Traceback (most recent call last):
(hidden)
  File "(hidden)/pywikipedia/wikipedia.py", line 2242, in put
    sysop=sysop, botflag=botflag, maxTries=maxTries)
  File "(hidden)/pywikipedia/wikipedia.py", line 2339, in _putPage
    back_response=True)
  File "(hidden)/pywikipedia/pywikibot/support.py", line 121, in wrapper
    return method(*__args, **__kw)
  File "(hidden)/pywikipedia/query.py", line 138, in GetData
    site.cookies(sysop=sysop))
  File "(hidden)/pywikipedia/wikipedia.py", line 6977, in postForm
    cookies=cookies)
  File "(hidden)/pywikipedia/wikipedia.py", line 7021, in postData
    f = MyURLopener.open(request)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Unavailable
Comment 9 iDangerMouse 2013-11-14 11:09:07 UTC
The problem is still ongoing.
Comment 10 Morten Wang 2013-11-18 19:56:55 UTC
I'd like to second this.  When saving large complex pages, I frequently get 503 responses.  As Daniel Schwen notes in bug 56884, it would be great to be able to tell Pywikibot to _not_ retry and instead manually check if the edit went through.
Comment 11 Morten Wang 2013-11-18 20:11:41 UTC
I patched my local copy of Pywikibot core, adding a max_retries parameter to editpage() to only allow it to attempt an edit once.  No changes to other files appear necessary since Page.save() passes on any additional parameters.  Should I propose that as a patch?  If so, what format is preferred?
Comment 12 Merlijn van Deen (test) 2013-11-18 20:15:27 UTC
If you could upload it to gerrit (either via git directly, or via the patch uploader at https://tools.wmflabs.org/gerrit-patch-uploader/ ), that would be really nice. 

I'm a bit confused however, as data.api.Request seems to get max_retries from the config file. Does it get passed another value of max_retries somewhere? I can't find where that would be...
Comment 13 Morten Wang 2013-11-18 20:45:46 UTC
data.api.Request does kwargs.pop(), so if it gets instantiated with a max_retries parameter it will use that value, otherwise it reads the config parameter.

In my case I found that I can just set pywikibot.config.max_retries instead of passing it as a parameter to Page.save().  Arguably nicer than passing a parameter around, which requires some way of handling a default value.  Sorry about not figuring that out earlier.
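The fallback pattern described here (an explicit keyword argument wins, otherwise the value is read from config) can be sketched like this. The `config` object below is a stand-in for illustration, not the real pywikibot.config module:

```python
class _Config(object):
    max_retries = 25  # module-level default, standing in for pywikibot.config

config = _Config()

class Request(object):
    """Illustrates the kwargs.pop() fallback used by data.api.Request."""
    def __init__(self, **kwargs):
        # kwargs.pop() with a default: use the caller's value if one was
        # passed, otherwise fall back to the global config setting.
        self.max_retries = kwargs.pop("max_retries", config.max_retries)
```

Setting config.max_retries once therefore affects every Request that does not override it explicitly, which is why changing pywikibot.config.max_retries works without passing a parameter through Page.save().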
Comment 14 Merlijn van Deen (test) 2013-11-18 20:49:24 UTC
I'm still a bit confused by Daniel's comment:

> Now pywikipediabot tries again by itself an apparently infinite amount of times
> Despite having set max_retries to 2 in my user-config.py

but this does seem to work for me (at least: setting max_retries in user-config.py sets pywikibot.config.max_retries). Strange.
Comment 15 Daniel Schwen 2013-11-18 21:15:46 UTC
Ahhrgh! I changed the max_retries setting in ./user-config.py but core reads ~/.pywikibot/user-config.py 

Sorry. Will try again with the new setting.
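For reference, the fix amounts to editing the copy of the file that core actually reads. A sketch of the relevant line (the value 2 is taken from Daniel's earlier comment):

```python
# ~/.pywikibot/user-config.py -- the file pywikibot core reads,
# not a ./user-config.py in the current working directory
max_retries = 2  # stop retrying after two attempts
```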
Comment 16 Bawolff (Brian Wolff) 2013-12-11 02:56:31 UTC
On the wikimedia side see also bug 57026. (Not a dupe since Pywikipedia should also handle these situations gracefully.)
Comment 17 Ricordisamoa 2014-04-16 03:01:09 UTC
*** Bug 55162 has been marked as a duplicate of this bug. ***
Comment 18 Amir Ladsgroup 2014-07-24 12:36:40 UTC
*** Bug 55179 has been marked as a duplicate of this bug. ***
