Last modified: 2014-06-27 13:33:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T54617, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52617 - Respect $wgNoFollowLinks and $wgNoFollowDomainExceptions
Status: NEW
Product: Parsoid
Classification: Unclassified
Component: DOM
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks: 66289
Reported: 2013-08-07 21:11 UTC by C. Scott Ananian
Modified: 2014-06-27 13:33 UTC

Description C. Scott Ananian 2013-08-07 21:11:00 UTC
Parsoid doesn't add rel='nofollow' on links.  Someday maybe it should, if Parsoid HTML is going to be crawled directly.

If we add this, we should also parse the $wgNoFollowLinks and $wgNoFollowDomainExceptions configuration settings from MediaWiki and honor them. There are already parser tests for these.
Comment 1 C. Scott Ananian 2013-08-07 21:12:56 UTC
(See also the discussion on https://gerrit.wikimedia.org/r/77984 for this issue on [[Image:...|link=...]] links.)
Comment 2 Gabriel Wicke 2013-08-08 01:57:46 UTC
We can always add nofollow for mw:ExtLink links, but should check with the VE folks whether this is handled properly. Compression should keep the additional overhead minimal.

In practice nofollow won't matter to anybody until our HTML is used for regular page views (hence the low priority). The Google KG team will crawl our HTML, but uses a custom pipeline in any case.
Comment 3 Gabriel Wicke 2013-08-11 04:08:19 UTC
Yesterday here at Wikimania Yong-Gang Wang of Google mentioned that their general crawling pipeline has a rule that disregards rel="nofollow" on all MediaWiki-powered sites. I would not be surprised if other engines had similar rules.

This suggests that adding rel="nofollow" has become largely pointless in MediaWiki. Blame all those high-quality external links that are hard to pass up for search engines.

Lowering priority to 'lowest' to reflect this.
Comment 4 C. Scott Ananian 2013-08-12 15:29:32 UTC
That explains all the link spam I get on my small mediawikis. :(
Comment 5 Gabriel Wicke 2013-08-15 20:45:14 UTC
Since the spam prevention effect is unlikely to still be significant, we should deprecate $wgNoFollowLinks and $wgNoFollowDomainExceptions in core instead. Repurposing this bug to track that.
Comment 6 Tyler Romeo 2013-11-18 19:35:44 UTC
What's the argument for this bug? rel=nofollow still serves a purpose, and helps to deter spam. I would imagine it'd be quite useful depending on the circumstances.
Comment 7 Nemo 2013-11-19 12:25:48 UTC
(In reply to comment #6)
> What's the argument for this bug?

It seems its scope was reversed by comment 5.

> rel=nofollow still serves a purpose, and
> helps to deter spam. I would imagine it'd be quite useful depending on the
> circumstances.

Indeed; unless it was generally deprecated in standards and in all real-world uses. As for the specific case of Google, we're waiting for clarification: http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073175.html
Comment 8 James Forrester 2013-11-19 12:28:10 UTC
(In reply to comment #6)
> What's the argument for this bug? rel=nofollow still serves a purpose, and
> helps to deter spam. I would imagine it'd be quite useful depending on the
> circumstances.

Read comment 3, which claims that (a) no it doesn't, (b) no it doesn't, and (c) no it isn't. :-)
Comment 9 Gabriel Wicke 2013-11-19 20:48:46 UTC
I have now received an answer from my contact at Google:

    Google will not follow rel=nofollow links, and not flow pagerank
    through them.  That includes Wiki{m,p}edia sources.

So the information I got at Wikimania was either not correct or the
result of a misunderstanding on my part. Another possibility is that
this detail of how pagerank works is considered too sensitive for
publication.

It should not be too hard to verify this independently by setting up a
fresh page with an unguessable URL and linking it from a wiki page with
rel=nofollow. If googlebot visits that page (or it turns up in search
results), then rel=nofollow was ignored.

Moving this bug back to Parsoid for further investigation.
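
The experiment proposed in comment 9 needs a URL that cannot be guessed and a link to it that carries rel=nofollow. The following is a minimal sketch of that setup, not a tested procedure: the function name make_probe_link and the base URL probe.example.org are hypothetical, and the token length is an arbitrary choice.

```python
import secrets
from html import escape


def make_probe_link(base_url, label="probe"):
    """Return (url, anchor_html) for a fresh, unguessable probe page.

    base_url is a hypothetical site you control; the page at the returned
    URL must not be linked or mentioned anywhere else.
    """
    # 32 random bytes, URL-safe base64 encoded: infeasible to guess.
    token = secrets.token_urlsafe(32)
    url = f"{base_url.rstrip('/')}/{token}.html"
    # The anchor carries rel="nofollow", as a wiki external link would.
    anchor = f'<a rel="nofollow" href="{escape(url, quote=True)}">{escape(label)}</a>'
    return url, anchor


if __name__ == "__main__":
    url, anchor = make_probe_link("https://probe.example.org")
    print(url)
    print(anchor)
```

If the server logs for the probe host later show a request for that exact path, the link was followed by someone; whether it was a crawler is a separate question.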
Comment 10 Nemo 2013-11-19 21:04:47 UTC
(In reply to comment #9)
> It should not be too hard to verify this independently by setting up a
> fresh page with an unguessable URL and linking it from a wiki page with
> rel=nofollow. If googlebot visits that page (or it turns up in search
> results), then rel=nofollow was ignored.

But you should also ensure the link is not included anywhere else and that nobody accesses it faking their user-agent (i.e. you need to check it's a Google IP and hope it's not a Google employee trying to deceive you ;) ).
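
One common way to check that a request really came from a Google address, rather than from a client faking its user agent, is a forward-confirmed reverse DNS lookup. The sketch below assumes that Google's crawler hosts resolve to names under googlebot.com or google.com; that suffix list, the function name and the injectable lookup parameters are assumptions made for illustration. It cannot rule out the Google-employee case mentioned above.

```python
import socket

# Assumed hostname suffixes for Google crawler hosts (not taken from this bug).
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a single IPv4 address.

    The lookup functions are injectable so the logic can be exercised
    without network access; by default they use the system resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex only returns IPv4 addresses.
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # Step 1: the PTR name must sit under one of the expected domains.
        if not host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # Step 2: that name must resolve back to the same address.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False
```

Checking only the User-Agent header would not be enough here, since any client can send an arbitrary value.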
Comment 11 Gabriel Wicke 2014-06-06 22:47:21 UTC
Production doesn't add nofollow for links to these domains:

'wgNoFollowDomainExceptions' => array(
	'default' => array(
		# Original list 20111110 - bug 32309
		'mediawiki.org',
		'wikibooks.org',
		'wikimediafoundation.org',
		'wikimedia.org',
		'wikinews.org',
		'wikipedia.org',
		'wikiquote.org',
		'wikisource.org',
		'wikiversity.org',
		'wiktionary.org',
		'wikivoyage.org',
		'wikidata.org',
		'tools.wmflabs.org',
		'etherpad.wmflabs.org',
	),
),


This is done in Parser::getExternalLinkRel.
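
As a rough illustration of the decision described above, the sketch below returns "nofollow" unless nofollow is disabled or the link's host falls under an excepted domain. It is written in Python for brevity and is not the core implementation: the function names are hypothetical, the constants stand in for $wgNoFollowLinks and $wgNoFollowDomainExceptions, and only a subset of the production list is shown.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for the MediaWiki settings; subset of the list above.
NO_FOLLOW_LINKS = True
NO_FOLLOW_DOMAIN_EXCEPTIONS = ["mediawiki.org", "wikipedia.org", "tools.wmflabs.org"]


def matches_domain_list(url, domains):
    """Return True if the URL's host is a listed domain or a subdomain of one."""
    host = urlsplit(url).hostname
    if not host:
        return False
    dotted_host = "." + host.lower()
    # The leading dot lets "en.wikipedia.org" match "wikipedia.org"
    # while "notwikipedia.org" does not.
    return any(dotted_host.endswith("." + d.lower()) for d in domains)


def external_link_rel(url, no_follow=NO_FOLLOW_LINKS,
                      exceptions=NO_FOLLOW_DOMAIN_EXCEPTIONS):
    """Return "nofollow" if the link should carry it, otherwise None."""
    if no_follow and not matches_domain_list(url, exceptions):
        return "nofollow"
    return None
```

Under these assumptions, external_link_rel("https://en.wikipedia.org/wiki/Foo") yields None, while a link to an unlisted host such as example.com yields "nofollow".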
Comment 12 spage 2014-06-06 23:30:02 UTC
Flow may choose to do this itself in its Parsoid content fixing, so I filed bug 66289.
Comment 13 Nemo 2014-06-07 06:35:47 UTC
If this blocks bug 66289, it's not unconfirmed.
Comment 14 Gabriel Wicke 2014-06-09 20:21:17 UTC
(In reply to Nemo from comment #13)
> If this blocks bug 66289, it's not unconfirmed.

The unconfirmed bit referred to whether rel=nofollow still works or not. We still haven't tested this.
Comment 15 Nemo 2014-06-09 20:37:41 UTC
(In reply to Gabriel Wicke from comment #14)
> The unconfirmed bit referred to whether rel=nofollow still works or not. We
> still haven't tested this.

Until bug 66289 is marked as a valid bug, Parsoid should support that expectation.
Flow probably wants to do what core does though.
