Last modified: 2014-01-29 17:34:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T57160, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55160 - Page._getVersionHistory returns only a part of a history
Status: RESOLVED FIXED
Product: Pywikibot
Classification: Unclassified
Component: General (Other open bugs)
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2013-10-05 04:35 UTC by Kunal Mehta (Legoktm)
Modified: 2014-01-29 17:34 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:35:18 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1546/
Reported by: dixond
Created on: 2012-11-28 13:00:50
Subject: Page._getVersionHistory returns only a part of a history
Assigned to: xqt
Original description:
There is a bug in Page._getVersionHistory. It doesn't load the whole history if it is large. The problem is here (wikipedia.py):
    if len(result['query']['pages'].values()[0]['revisions']) < revCount:
        thisHistoryDone = True

I believe it should be as following:
    if not getAll and len(result['query']['pages'].values()[0]['revisions']) >= revCount:
        thisHistoryDone = True
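
To compare the two conditions in isolation, the sketch below models each one as a standalone function of the batch length. The function names and the sample batch sizes are hypothetical and only for illustration; the real check sits inside the paging loop in wikipedia.py, and the surrounding loop code is not shown in this report. It is written for Python 3.

```python
def original_done(batch_len, rev_count):
    """Original check: stop as soon as a batch is shorter than rev_count."""
    return batch_len < rev_count


def proposed_done(batch_len, rev_count, get_all):
    """Proposed check: stop only when not fetching everything
    and a full batch has arrived."""
    return (not get_all) and batch_len >= rev_count


if __name__ == "__main__":
    rev_count = 500
    # 500 models a full batch, 192 a batch that came back short.
    for batch_len in (500, 192):
        print(batch_len,
              original_done(batch_len, rev_count),
              proposed_done(batch_len, rev_count, get_all=True))
```

With get_all=True the proposed check never ends the loop on its own, so in this model some other signal would have to terminate it.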

Version.py:
Pywikipedia trunk/pywikipedia/ (r10745, 2012/11/20, 13:03:05)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:35:20 UTC
- **priority**: 5 --> 8
Comment 2 Kunal Mehta (Legoktm) 2013-10-05 04:35:22 UTC
- **priority**: 8 --> 5
Comment 3 Kunal Mehta (Legoktm) 2013-10-05 04:35:23 UTC
Are you sure that you have set getAll=True while invoking that method?
Comment 4 Kunal Mehta (Legoktm) 2013-10-05 04:35:25 UTC
- **assigned_to**: nobody --> xqt
Comment 5 Kunal Mehta (Legoktm) 2013-10-05 04:35:27 UTC
Yes, of course. It is quite obvious that the following code prevents the remaining revisions from being loaded, because it sets thisHistoryDone to True:
    if len(result['query']['pages'].values()[0]['revisions']) < revCount:
        thisHistoryDone = True

Am I missing anything?
Comment 6 Kunal Mehta (Legoktm) 2013-10-05 04:35:29 UTC
First of all, _getVersionHistory() is an internal method and you shouldn't use it directly. Use getVersionHistory() instead. The condition is quite right. Try the following statements:

    import pywikibot as pwb
    p = pwb.Page('de', 'user talk:xqt')
    h = p.getVersionHistory(getAll=True)
    len(h)

which gives 4250 entries (so far).

Changing the condition will return 500 entries only.
Comment 7 Kunal Mehta (Legoktm) 2013-10-05 04:35:31 UTC
Changing the condition still returns 4250 entries for me (have you missed the "not getAll and " part in my code?)

But if I use fullVersionHistory instead of getVersionHistory, it returns only 192 entries for me. I.e. try the following code:

    import wikipedia as pywikibot
    p = pywikibot.Page('de', 'user talk:xqt')
    h = p.fullVersionHistory(getAll=True)
    print len(h)
Comment 8 Kunal Mehta (Legoktm) 2013-10-05 04:35:32 UTC
Any updates? Are you able to reproduce this issue?
Comment 9 Gerrit Notification Bot 2014-01-05 22:32:01 UTC
Change 105619 had a related patch set uploaded by Mpaa:
(bug 55160) Page._getVersionHistory returns only a part of a history

https://gerrit.wikimedia.org/r/105619
Comment 10 Mpaa 2014-01-05 22:37:18 UTC
(In reply to comment #9)
> Change 105619 had a related patch set uploaded by Mpaa:
> (bug 55160) Page._getVersionHistory returns only a part of a history
> 
> https://gerrit.wikimedia.org/r/105619

h = p.getVersionHistory(getAll=True) returns the full history.

h = p.fullVersionHistory(getAll=True) returns 192 entries (now more ...).
The reason is that the result may contain fewer than 'revCount' revisions even when 'query-continue' is returned, because of:
    {u'result':{u'*': u'This result was truncated because it would otherwise be larger than the limit of 12582912 bytes'}}

So it is not enough to check only that len() < revCount to declare that thisHistoryDone = True.
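
A minimal sketch of that idea: keep requesting batches for as long as the response carries 'query-continue', instead of stopping when a batch is shorter than revCount. The fake_api function, its batch sizes, and the 'rvstartid' continuation values are made up to simulate a truncated response. This is not the code from change 105619, and it does not call the real MediaWiki API. It is written for Python 3, hence list(...) around .values().

```python
def fake_api(params):
    """Simulated API (hypothetical): 1000 revisions in total,
    with the first response truncated to 192 entries."""
    total = 1000
    start = params.get('rvstartid', 0)
    # The first batch is cut short, as if a size limit was hit;
    # later batches honour the requested limit.
    size = 192 if start == 0 else params['rvlimit']
    end = min(start + size, total)
    result = {'query': {'pages': {'1': {'revisions': list(range(start, end))}}}}
    if end < total:
        result['query-continue'] = {'revisions': {'rvstartid': end}}
    return result


def fetch_all(api, rev_count=500):
    """Collect revisions until the response offers no continuation."""
    params = {'rvlimit': rev_count}
    revisions = []
    while True:
        result = api(params)
        page = list(result['query']['pages'].values())[0]
        revisions.extend(page['revisions'])
        if 'query-continue' not in result:
            break  # no continuation offered: the history is complete
        # Feed the continuation values back into the next request.
        params.update(result['query-continue']['revisions'])
    return revisions


if __name__ == "__main__":
    print(len(fetch_all(fake_api)))
```

In this simulation a length-based check would stop after the first 192 entries, while the continuation-based loop goes on to collect all 1000.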
Comment 11 Gerrit Notification Bot 2014-01-29 16:14:11 UTC
Change 105619 merged by jenkins-bot:
(bug 55160) Page._getVersionHistory returns only a part of a history

https://gerrit.wikimedia.org/r/105619
