Last modified: 2014-09-09 02:23:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57165, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55165 - Wikia returns cached pages for get.py editarticle.py
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: Other scripts (Other open bugs)
Version: compat-(1.0)
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2013-10-05 04:36 UTC by Kunal Mehta (Legoktm)
Modified: 2014-09-09 02:23 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:36:10 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1537/
Reported by: throwy
Created on: 2012-11-07 12:36:40
Subject: Wikia returns cached pages for get.py  editarticle.py
Original description:
get.py and editarticle.py use a page-fetching method that returns cached pages from Wikia.
replace.py uses the pagegenerator method, which fetches the latest version of pages from Wikia.

The issue is probably a Wikia issue, but it would be nice to implement a workaround in pywikipediabot.

Steps to reproduce:
Create or edit a page on a Wikia wiki. Fetch the page with editarticle.py or get.py. The bot fetches a cached version. Edit the page with replace.py and the bot fetches the most recent version, which is the expected behavior.

Comments:
Someone had already solved this issue for me on #pywikipediabot on freenode. It requires very little alteration to get.py and editarticle.py. Unfortunately I did not back up or document the changes before updating pywikipediabot from SVN and the changes were lost.

----

$ python version.py
Pywikipedia [http] trunk/pywikipedia (r10663, 2012/11/04, 19:53:31)
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:36:12 UTC
Some remarks:
I changed the hostname() in family file to "mlp.wikia.com" and used the following statements:

import wikipedia as wp
s = wp.getSite('wikia', 'wikia')
p = wp.Page(s, 'Template:Date/doc')
t = p.get(force=True)

result:
Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    t = p.get(force=True)
  File "wikipedia.py", line 699, in get
    expandtemplates = expandtemplates)
  File "wikipedia.py", line 800, in _getEditPage
    "Page does not exist. In rare cases, if you are certain the page does exist, look into overriding family.RversionTab")
NoPage: (wikia:wikia, u'[[wikia:Template:Date/doc]]', 'Page does not exist. In rare cases, if you are certain the page does exist, look into overriding family.RversionTab')

the query param dict was:
{'inprop': ['protection', 'subjectid'], 'rvprop': ['content', 'ids', 'flags', 'timestamp', 'user', 'comment', 'size'], 'prop': ['revisions', 'info'], 'titles': u'Template:Date/doc', 'rvlimit': 1, 'action': 'query'}

the result data dict was:
{u'query': {u'pages': {u'-1': {u'protection': [], u'ns': 10, u'missing': u'', u'title': u'Template:Date/doc'}}}}

and last the url is:
/api.php?inprop=protection%7Csubjectid&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=Template%3ADate/doc&rvlimit=1&action=query

which gives the right result via browser e.g.:
http://mlp.wikia.com/api.php?inprop=protection%7Csubjectid&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=Template%3ADate/doc&rvlimit=1&action=query&format=jsonfm
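
To make the relation between the param dict and the logged URL easier to check, here is a minimal sketch in modern Python 3 (not the Python 2.7 used in this report). The helper name build_query_path is hypothetical and is not part of pywikipedia. It assumes that list values are joined with "|" and that the framework appends format=json on its own, which is what the logged URL suggests. The parameter order may differ from the logged URL; the API does not depend on it.

```python
from urllib.parse import urlencode


def build_query_path(params, script_path="/api.php"):
    """Flatten an API parameter dict into a request path (illustrative helper)."""
    flat = {}
    for key, value in params.items():
        # Multi-valued parameters are sent as one pipe-separated string.
        if isinstance(value, (list, tuple)):
            flat[key] = "|".join(str(v) for v in value)
        else:
            flat[key] = str(value)
    # Assumption: the framework adds format=json itself, as the logged URL suggests.
    flat.setdefault("format", "json")
    # safe="/" leaves the slash in 'Template:Date/doc' unescaped, as in the logged URL.
    return script_path + "?" + urlencode(flat, safe="/")


# The query param dict quoted above.
params = {
    "inprop": ["protection", "subjectid"],
    "rvprop": ["content", "ids", "flags", "timestamp", "user", "comment", "size"],
    "prop": ["revisions", "info"],
    "titles": "Template:Date/doc",
    "rvlimit": 1,
    "action": "query",
}
path = build_query_path(params)
print(path)
```

Appending the printed path to the wiki's hostname gives a URL that can be opened in a browser and compared with what the bot receives for the same query.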
Comment 2 Kunal Mehta (Legoktm) 2013-10-05 04:36:14 UTC
I found the patched editarticle.py on pastebin, woohoo!

33a34
> import pagegenerators
157c158
<         self.page = pywikibot.Page(site, pageTitle)
---
>         self.page = iter(pagegenerators.PreloadingGenerator([pywikibot.Page(site, pageTitle)])).next()
Comment 3 Kunal Mehta (Legoktm) 2013-10-05 04:36:16 UTC
- milestone: --> trunk
Comment 4 Kunal Mehta (Legoktm) 2013-10-05 04:36:18 UTC
diff of editarticle.py with working pagegenerators fetching
Comment 5 Kunal Mehta (Legoktm) 2013-10-05 04:36:19 UTC
diff of get.py with working pagegenerators fetching
Comment 6 Kunal Mehta (Legoktm) 2013-10-05 04:36:21 UTC
Yes, this patch retrieves the page content via Special:Export instead of the API, because the API bulk call is not approved for the trunk release. Thus this patch wouldn't work for the rewrite branch.

Anyway, it is not clear to me why the API returns the data for a browser call but not for the bot framework's query.
Comment 7 Kunal Mehta (Legoktm) 2013-10-05 04:36:23 UTC
changed the hostname() in family file to "mlp.wikia.com" and used the following statements:

import wikipedia as wp
s = wp.getSite('wikia', 'wikia')
p = wp.Page(s, 'Template:Date/doc')
t = p.get(force=True)

works for me.
Comment 8 Kunal Mehta (Legoktm) 2013-10-05 04:36:25 UTC
adding version info:
Pywikipedia [https] r/pywikibot/compat (r10308, a208b54, 2013/09/24, 09:51:19, ok)
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
Comment 9 Nemo 2014-07-19 08:58:51 UTC
Was this ever reported to Wikia? If not, please write to community AT wikia.com
