Last modified: 2014-09-22 14:42:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T70300, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 68300 - [tracking] Block spider / web crawler on tool labs
Status: UNCONFIRMED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Marc A. Pelletier
Keywords: tracking
Depends on: 71120
Blocks: tracking
Reported: 2014-07-20 19:02 UTC by metatron
Modified: 2014-09-22 14:42 UTC
CC List: 2 users


Description metatron 2014-07-20 19:02:35 UTC
Tracking ticket to block aggressive spiders and web crawlers on tool labs (tools.wmflabs.org/*)

These spiders should be blocked at the network or proxy level, rather than in individual lighttpd configs or in the applications themselves, to avoid wasting resources.
Comment 1 metatron 2014-07-20 19:13:15 UTC
Aggressive species:

-SeznamBot
-SputnikBot
-Sogou web spider
-TweetmemeBot
-kinshoobot
-CCBot
-Scrapy
-Baiduspider
-Yahoo! Slurp

User agents:
"Mozilla/5.0 (compatible; SeznamBot/3.2; +http://fulltext.sblog.cz/)"
"Mozilla/5.0 (compatible; SputnikBot/2.3; +http://corp.sputnik.ru/webmaster)"
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
"Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)"
"kinshoobot (/global; amd64 Linux 3.10.23-xxxx-std-ipv6-64; java 1.8.0_05; Europe/fr) http://kinshoo.net/bot.html"
"CCBot/2.0 (http://commoncrawl.org/faq/)"
"Scrapy/0.22.0 (+http://scrapy.org)"
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
Comment 2 metatron 2014-07-21 21:21:36 UTC
-360Spider

User Agent:
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider"
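
For illustration only, here is a minimal Python sketch of the kind of user-agent check a proxy-level block would need, using the crawler names listed in the comments above. The names `BLOCKED_TOKENS`, `BLOCKED_RE` and `is_blocked` are hypothetical, and matching by case-insensitive substring is an assumption. This report does not say how the block was or would be implemented on the Tool Labs proxy.

```python
import re

# Crawler name tokens taken from the comments in this report.
BLOCKED_TOKENS = [
    "SeznamBot",
    "SputnikBot",
    "Sogou web spider",
    "TweetmemeBot",
    "kinshoobot",
    "CCBot",
    "Scrapy",
    "Baiduspider",
    "Yahoo! Slurp",
    "360Spider",
]

# One case-insensitive pattern; re.escape keeps characters such as "!"
# literal. (Assumption: substring matching is acceptable for this list.)
BLOCKED_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_TOKENS),
    re.IGNORECASE,
)


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a listed crawler token."""
    # A missing or empty header is not treated as a listed crawler.
    return bool(user_agent) and BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
    print(is_blocked(sample))
```

Doing this check once at the proxy, before a request reaches a tool's own web server, is what the description asks for. The same token list could equally be expressed in the proxy's own configuration language.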
