Last modified: 2014-09-14 06:05:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T46756, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 44756 - [Errors 301] wikistats: Update entry URLs for wikis that redirect elsewhere or no longer exist
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: wikistats
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Zahn
URL: http://wikistats.wmflabs.org/display....
Depends on:
Blocks:
Reported: 2013-02-07 16:08 UTC by Nemo
Modified: 2014-09-14 06:05 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
HTTP 301 entries (45.70 KB, text/csv)
2013-02-07 16:08 UTC, Nemo

Description Nemo 2013-02-07 16:08:38 UTC
Created attachment 11746 [details]
HTTP 301 entries

There are currently 121 wikis which return HTTP status 301 (Moved Permanently). Most are wikis which changed their URL structure slightly. Some work in a browser, like http://www.wikilou.com/lou/api.php?action=query&meta=siteinfo&maxlag=5, and some don't because the redirection is not entirely correct; either way, the entries should be updated.
They are currently visible on page 8 (!) of the table, as in the URL; a CSV export is attached.

(The 302 entries are shown in green in the table, but a 302 is not the correct way to redirect, and they are in fact mostly dead wikis.)
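For reference, the example link in the description is an api.php endpoint with a siteinfo query appended. A minimal Python sketch of how such a probe URL could be assembled is below; the function name is hypothetical, the parameters are copied from the example link above, and this is not the wikistats code (which is PHP).

```python
from urllib.parse import urlencode

# Query parameters taken from the example link in the description.
SITEINFO_PARAMS = {"action": "query", "meta": "siteinfo", "maxlag": 5}


def build_siteinfo_url(api_url):
    """Append the siteinfo query string to a wiki's api.php URL.

    Hypothetical helper for illustration only.
    """
    return api_url + "?" + urlencode(SITEINFO_PARAMS)


print(build_siteinfo_url("http://www.wikilou.com/lou/api.php"))
```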
Comment 1 Krinkle 2013-09-18 03:19:16 UTC
Comment on attachment 11746 [details]
HTTP 301 entries

Fix mime type.
Comment 2 Nemo 2013-09-18 05:11:31 UTC
(I need the HTTP error in summary to quickly tell from summary what is what.)
Comment 3 Daniel Zahn 2014-09-13 01:31:22 UTC
as of today, number of wikis with status 301 is 540.

the update.php has a feature called "fixit" which goes through wikis with 301 or 302, tries to get the new location, and then offers it to the user, who can semi-automatically update the URLs.
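The general idea of looking up a redirect target can be sketched in Python as follows. This is only an illustration under assumptions, not the "fixit" code from update.php: the function names are hypothetical, and it assumes the new location is taken from the HTTP Location header of a single request that does not follow redirects.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302)


def fetch_status_and_location(url, timeout=10):
    """Send one HEAD request; http.client does not follow redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def propose_new_url(old_url, status, location):
    """Return the URL to offer to the user, or None if there is no redirect."""
    if status not in REDIRECT_STATUSES or not location:
        return None
    # The Location header may be relative, so resolve it against the old URL.
    return urljoin(old_url, location)
```

A caller would pass the result of `fetch_status_and_location(url)` into `propose_new_url` and leave the final decision to the user, which matches the semi-automatic workflow described above.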

let me try at least reducing the number for now
Comment 4 Daniel Zahn 2014-09-13 02:10:21 UTC
actually, started with "302"s. there were 370 of them.

running this also finds some that are just not wikis at all anymore (deleting them on sight) and some duplicates that are showing up only if we try to change the URL to the redirect target.

i deleted a couple like "suspended page", ones where only the 'image' path was left but all else deleted.. and so on..

a lot of the ones that are legitimate changes and still active are just "http->https", updating those.

also, what is this "wiki.smu.edu.sg" with the weird URLs and so many of them? we have 52 entries in the table. all of them were "http->https" so that fixed a few

--

down from 370 to 280 times 302 .. will continue later
Comment 5 Daniel Zahn 2014-09-13 18:27:40 UTC
went through a lot more 302's, many are 200 now (mostly http->https changes) and fixed, also many are deleted (mostly suspended domains, also stuff like moved to other type of wiki, just broken and whatnot)

there are now only 27 left (and some of them are "method 7" without any API URL).

wanna check those manually?
Comment 6 Daniel Zahn 2014-09-13 20:10:21 UTC
now to the "301s". when starting there were 548 of them (across all methods). I adjusted the update.php and functions.php to make the "fixit" method work with 301s instead of 302s.

after doing some manually i added PHP code for an "autofix" mode that attempts to detect all the ones where only the protocol differs and fix all the http->https ones

then even went a step further and had it delete the ones that would be duplicates when updating them..
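A rough Python sketch of this kind of "autofix" pass is below. It is not the PHP code that was added to update.php; the function names and the in-memory data layout (entry id mapped to URL) are assumptions for illustration. It updates an entry only when the redirect target differs from the stored URL by protocol alone, and marks the entry for deletion when the target URL is already present in the table.

```python
from urllib.parse import urlsplit


def differs_only_by_protocol(old_url, new_url):
    """True if new_url is old_url with http replaced by https and nothing else."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    # Index 0 is the scheme; the rest is netloc, path, query, fragment.
    return (old.scheme, new.scheme) == ("http", "https") and old[1:] == new[1:]


def plan_autofix(entries, redirect_targets):
    """Split redirecting entries into URL updates and duplicate deletions.

    entries: {entry_id: current_url} (hypothetical layout)
    redirect_targets: {entry_id: url_the_entry_redirects_to}
    """
    known = set(entries.values())
    updates, deletes = {}, []
    for entry_id, old_url in entries.items():
        target = redirect_targets.get(entry_id)
        if target is None or not differs_only_by_protocol(old_url, target):
            continue  # not a plain http->https change: leave for manual inspection
        if target in known:
            deletes.append(entry_id)  # updating would duplicate an existing entry
        else:
            updates[entry_id] = target
            known.add(target)
    return updates, deletes
```

Entries whose redirect changes the host or path are deliberately skipped here, since those are the cases that still need a human to look at them.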

down to 336 from 548 ...
Comment 7 Daniel Zahn 2014-09-13 21:37:16 UTC
..let it update all the remaining URLs automatically and ran another round of updates..  down to just 48 x 301. those would be left for manual inspection. 

also note how on each update run that number changes slightly.

can't really invest much more time in this. there will always be some of them left at any given moment. it just needs a cleanup like this every once in a while

i'd like to close the bug at this point unless you guys want to check more manually. i have done what i could with reasonable effort. the rest can be done via "wsa" or you give me specific lists which URLs to change or delete.
Comment 8 Nemo 2014-09-14 06:05:33 UTC
(In reply to Daniel Zahn from comment #7)
> can't really invest much more time in this. there will always be some of
> them left at any given moment. it just needs a cleanup like this every once
> in a while
> 
> i'd like to close the bug at this point unless you guys want to check more
> manually. i have done what i could with reasonable effort. the rest can be
> done via "wsa" or you give me specific lists which URLs to change or delete.

Yes, you've been wonderful. I'll do another round of checks in a couple weeks when I have more time.


