Last modified: 2014-03-11 10:45:53 UTC
Presently, robots.txt has:

  Disallow: /w/

Thus, the raw wikitext of pages isn't accessible via the Internet Archive; see e.g.
https://web.archive.org/web/20140307111730/http://en.wikipedia.org/w/index.php?title=Main_Page&action=raw

This is in contrast to sites like WikiIndex, which allow it:
https://web.archive.org/web/20131021230044/http://wikiindex.org/index.php?title=Welcome&action=raw

We should allow the Internet Archiver to index these pages so that the raw wikitext will be available for future generations, even if the page goes away. See [[mw:Manual:Robots.txt#Allow_indexing_of_raw_pages_by_the_Internet_Archiver]].
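The manual section linked above describes letting the Internet Archive's crawler through while keeping /w/ disallowed for everyone else. A rough sketch of such a robots.txt addition (the exact user-agent string and path pattern should be taken from that manual page rather than from here):

  # Let the Internet Archive's crawler fetch raw wikitext,
  # while /w/ remains disallowed for other crawlers.
  User-agent: ia_archiver
  Allow: /*&action=raw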
(In reply to Nathan Larson from comment #0)
> We should allow the Internet Archiver to index these pages so that the raw
> wikitext will be available for future generations, even if the page goes
> away.

We already regularly dump the DBs and push those dumps to Internet Archive.

What else does action=raw get us?
(In reply to jeremyb from comment #1)
> We already regularly dump the DBs and push those dumps to Internet Archive.
>
> What else does action=raw get us?

I guess it depends; is there a way to get the wikitext of individual pages without downloading the whole dump, assuming the page is no longer on-wiki?
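If action=raw URLs were archived, a single page's wikitext could be pulled straight from the Wayback Machine. A minimal sketch against the WikiIndex snapshot cited in comment #0, which is already crawlable:

  # Minimal sketch: fetch archived raw wikitext from the Wayback Machine.
  # Standard library only; the snapshot URL is the WikiIndex capture above.
  from urllib.request import urlopen

  snapshot = ("https://web.archive.org/web/20131021230044/"
              "http://wikiindex.org/index.php?title=Welcome&action=raw")
  with urlopen(snapshot) as resp:
      print(resp.read().decode("utf-8"))  # the raw wikitext as captured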
(In reply to jeremyb from comment #1)
> We already regularly dump the DBs and push those dumps to Internet Archive.
>
> What else does action=raw get us?

Integration in the Wayback Machine. Not sure it's worth it, though.

(In reply to Nathan Larson from comment #2)
> I guess it depends; is there a way to get wikitext of individual pages
> without downloading the whole dump, assuming the page is no longer on-wiki?

Server-side bzgrep? :P Probably not.
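The serious version of "server-side bzgrep" is streaming through a pages-articles dump locally. A sketch, assuming a local pages-articles.xml.bz2; the filename and page title are placeholders:

  # Sketch: extract one page's wikitext from a MediaWiki XML dump.
  # This is the "download the whole dump" route that comment #2 hoped to avoid.
  import bz2
  import xml.etree.ElementTree as ET

  DUMP = "pages-articles.xml.bz2"   # placeholder dump filename
  WANTED = "Main Page"              # placeholder page title

  with bz2.open(DUMP, "rb") as f:
      title = None
      for event, elem in ET.iterparse(f, events=("end",)):
          tag = elem.tag.rsplit("}", 1)[-1]   # drop the export-schema namespace
          if tag == "title":
              title = elem.text
          elif tag == "text" and title == WANTED:
              print(elem.text or "")          # the raw wikitext
              break
          elif tag == "page":
              elem.clear()                    # keep memory bounded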