Last modified: 2012-05-31 13:10:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T38181, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36181 - Prevent search engines from indexing the user namespace in German Wikipedia
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All / All
Importance: High enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ops, shell
Depends on:
Blocks:
Reported: 2012-04-23 15:59 UTC by Jonathan Haas
Modified: 2012-05-31 13:10 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Jonathan Haas 2012-04-23 15:59:56 UTC
Following community discussion (see http://de.wikipedia.org/wiki/Wikipedia:Meinungsbilder/Indizierung_von_Benutzerseiten ) please disallow search engines from indexing the user namespace for the German Wikipedia by adding

NS_USER => 'noindex,follow'

to $wgNamespaceRobotPolicies accordingly.
Comment 1 Beau 2012-04-23 16:03:44 UTC
You can add entries to the page MediaWiki:Robots.txt.
There are already similar lines:
# Benutzerdiskussionsseiten 
Disallow: /wiki/Benutzer_Diskussion:
Disallow: /wiki/Benutzer_Diskussion%3A
Disallow: /wiki/User_talk:
Disallow: /wiki/User_talk%3A
Comment 2 Max Semenik 2012-04-23 16:04:38 UTC
Please reopen if you need help with this.
Comment 3 Jonathan Haas 2012-04-23 16:11:15 UTC
Will this still allow individual pages to opt back in to indexing by adding the magic word __INDEX__ to the page source? Changing $wgNamespaceRobotPolicies seems to be generally favored by the community and was also done in similar cases (see bug 16247 for an example).

(Can't reopen for some reason)
Comment 4 Max Semenik 2012-04-23 16:17:20 UTC
No, this will alter robots.txt for dewiki.
Comment 5 Jonathan Haas 2012-04-23 16:26:00 UTC
So altering robots.txt (or MediaWiki:Robots.txt, which as far as I know is the same thing) will still allow individual pages to be indexed by adding __INDEX__?
Comment 6 Max Semenik 2012-04-23 16:50:31 UTC
Right, if you want to whitelist separate pages it won't work. Reopened.
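Max's point can be checked with Python's standard urllib.robotparser. The sketch below assumes hypothetical Disallow lines for the user namespace (Benutzer:), modelled on the existing Benutzer_Diskussion lines; they are not the actual dewiki file. A crawler that honours robots.txt never downloads a blocked page, so it never sees the robots meta tag that __INDEX__ would produce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-namespace rules, modelled on the existing
# Benutzer_Diskussion lines in MediaWiki:Robots.txt (not the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Benutzer:
Disallow: /wiki/Benutzer%3A
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(path: str) -> bool:
    """Return True if a robots.txt-compliant crawler may download the path."""
    return parser.can_fetch("*", "https://de.wikipedia.org" + path)


# Blocked before download, so a page's __INDEX__ meta tag is never seen.
print(crawler_may_fetch("/wiki/Benutzer:Beispiel"))  # False
# Paths outside the user namespace are unaffected.
print(crawler_may_fetch("/wiki/Hauptseite"))  # True
```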
Comment 7 Mark A. Hershberger 2012-04-23 18:03:41 UTC
(In reply to comment #6)
> Right, if you want to whitelist separate pages it won't work. Reopened.

Couldn't you add whitelisted pages to MediaWiki:Robots.txt?

http://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive

I realize this is not as scalable as __INDEX__, but maybe this feature could be added?
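Comment 7's suggestion can be tried with the same parser. In this sketch the whitelisted page Benutzer:Beispiel is a hypothetical example. The Allow line is listed first because Python's parser applies the first rule whose path prefix matches, while RFC 9309 specifies longest-match precedence; putting the more specific Allow first gives the same result under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one whitelisted user page, the rest of the namespace blocked.
ROBOTS_TXT = """\
User-agent: *
Allow: /wiki/Benutzer:Beispiel
Disallow: /wiki/Benutzer:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str) -> bool:
    """Return True if a robots.txt-compliant crawler may download the path."""
    return parser.can_fetch("*", "https://de.wikipedia.org" + path)


print(is_crawlable("/wiki/Benutzer:Beispiel"))  # True
print(is_crawlable("/wiki/Benutzer:Anderer"))   # False
```

Matching is by path prefix, so the Allow line would also open /wiki/Benutzer:Beispiel/Unterseite and /wiki/Benutzer:Beispiel2, and every opted-in page needs its own line.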
Comment 8 Mark A. Hershberger 2012-04-23 18:05:29 UTC
(In reply to comment #7)
> I realize this is not as scalable as __INDEX__, but maybe this feature could be
> added?

Actually, that is probably something a bot could do, right?  Watch for new __INDEX__ uses and add them to MW:Robots.txt
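The bot idea amounts to regenerating a list of Allow lines. A minimal sketch, assuming the bot already has the titles of user pages that contain __INDEX__; how it finds them and how it edits MediaWiki:Robots.txt are left out, and allow_lines is a hypothetical name.

```python
from __future__ import annotations

from urllib.parse import quote


def allow_lines(titles: list[str]) -> list[str]:
    """Hypothetical bot helper: robots.txt Allow lines for opted-in user pages.

    The output would have to be placed before the namespace's Disallow lines,
    because some parsers (including Python's urllib.robotparser) use the
    first matching rule.
    """
    lines = []
    for title in sorted(set(titles)):
        # MediaWiki URLs use underscores for spaces; keep ':' and '/' literal.
        path = "/wiki/" + quote(title.replace(" ", "_"), safe="/:")
        lines.append("Allow: " + path)
        # Also emit the %3A form of the namespace colon, mirroring the
        # existing Benutzer_Diskussion: / Benutzer_Diskussion%3A pairs.
        lines.append("Allow: " + path.replace(":", "%3A", 1))
    return lines


print(allow_lines(["Benutzer:Beispiel"]))
# ['Allow: /wiki/Benutzer:Beispiel', 'Allow: /wiki/Benutzer%3ABeispiel']
```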
Comment 9 Jonathan Haas 2012-04-23 19:07:10 UTC
Sure, we could (although we would probably need to give a bot admin rights, and I'm not sure we want that). But why not use $wgNamespaceRobotPolicies directly? Is there some technical problem I should know about? According to http://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php (not sure if I'm looking at the right file), many namespaces are already configured there in addition to the robots.txt entries.
Comment 10 Umherirrender 2012-05-14 18:34:21 UTC
Please add NS_USER => 'noindex,follow' to the dewiki section of $wgNamespaceRobotPolicies in InitialiseSettings.php.

That is the easiest way, and it is already used for other namespaces on dewiki and other wikis. There is no reason to use a more complicated technical approach when this simple one exists.

Thanks.
Comment 11 Alexander Karnstedt 2012-05-15 10:06:57 UTC
Just to make this clear: the community decision was made on the premise that it is possible to opt in to indexing (e.g. by __INDEX__).

It is not acceptable to implement any solution that does not meet this requirement!
Comment 12 Umherirrender 2012-05-20 18:48:48 UTC
Another week has passed; please comment on the status.
Thanks.
Comment 13 Umherirrender 2012-05-30 19:48:59 UTC
Another week is over. Could a shell user or operator please add a comment, or change the status of this bug if nobody is available to fix it or if you think it is already fixed? Thanks in advance for a response.
Comment 14 Raimond Spekking 2012-05-30 20:10:22 UTC
Line added to InitialiseSettings.php with https://gerrit.wikimedia.org/r/#/c/9469/

Now it needs someone to merge and deploy.

(In reply to comment #11)
> Just to make this clear: the community decision was made on the premise
> that it is possible to opt in to indexing (e.g. by __INDEX__).
> 
> It is not acceptable to implement any solution that does not meet this
> requirement!

Overriding with __INDEX__ is still possible. Tested on my local wiki.
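This result is consistent with how MediaWiki, as we understand it, layers robot policies: the namespace default from $wgNamespaceRobotPolicies is applied first, and a page's __INDEX__ or __NOINDEX__ can then override the index part unless the namespace is listed in $wgExemptFromUserRobotsControl (by default the content namespaces). The Python below is a simplified model of that precedence, not MediaWiki's code; it ignores $wgArticleRobotPolicies, and parse_policy and effective_policy are hypothetical names.

```python
from __future__ import annotations

from typing import Optional

NS_MAIN = 0  # MediaWiki's main (article) namespace
NS_USER = 2  # MediaWiki's User namespace


def parse_policy(policy: str) -> dict[str, str]:
    """Split a policy string such as 'noindex,follow' into its two parts."""
    parts: dict[str, str] = {}
    for token in (t.strip() for t in policy.split(",")):
        if token in ("index", "noindex"):
            parts["index"] = token
        elif token in ("follow", "nofollow"):
            parts["follow"] = token
    return parts


def effective_policy(
    namespace: int,
    magic_word: Optional[str],
    namespace_policies: dict[int, str],
    exempt_namespaces: set[int],
    default_policy: str = "index,follow",
) -> str:
    """Simplified model: site default, then namespace policy, then magic word."""
    policy = parse_policy(default_policy)
    if namespace in namespace_policies:
        policy.update(parse_policy(namespace_policies[namespace]))
    # __INDEX__/__NOINDEX__ only change the index part, and only outside
    # exempt namespaces.
    if magic_word in ("__INDEX__", "__NOINDEX__") and namespace not in exempt_namespaces:
        policy["index"] = "index" if magic_word == "__INDEX__" else "noindex"
    return f"{policy['index']},{policy['follow']}"


policies = {NS_USER: "noindex,follow"}  # the setting requested in this bug
exempt = {NS_MAIN}  # assumed: content namespaces only

print(effective_policy(NS_USER, None, policies, exempt))           # noindex,follow
print(effective_policy(NS_USER, "__INDEX__", policies, exempt))    # index,follow
print(effective_policy(NS_MAIN, "__NOINDEX__", policies, exempt))  # index,follow
```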
Comment 15 Raimond Spekking 2012-05-31 13:10:34 UTC
Deployed by Reedy today. Thanks :)
