Last modified: 2014-09-22 14:42:39 UTC
Tracking ticket to block aggressive spiders and web crawlers on Tool Labs (tools.wmflabs.org/*). These spiders should be blocked at the network or proxy level rather than in individual lighttpd configs or in the applications themselves, to avoid wasting resources.
Aggressive species:
- SeznamBot
- SputnikBot
- Sogou web spider
- TweetmemeBot
- kinshoobot
- CCBot
- Scrapy
- Baiduspider
- Yahoo! Slurp

User agents:
"Mozilla/5.0 (compatible; SeznamBot/3.2; +http://fulltext.sblog.cz/)"
"Mozilla/5.0 (compatible; SputnikBot/2.3; +http://corp.sputnik.ru/webmaster)"
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
"Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)"
"kinshoobot (/global; amd64 Linux 3.10.23-xxxx-std-ipv6-64; java 1.8.0_05; Europe/fr) http://kinshoo.net/bot.html"
"CCBot/2.0 (http://commoncrawl.org/faq/)"
"Scrapy/0.22.0 (+http://scrapy.org)"
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

- 360Spider User Agent: "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider"
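
For illustration only, the check that a single network- or proxy-level rule would perform can be sketched in Python. This is a minimal sketch, assuming case-insensitive substring matching on the User-Agent header is sufficient; the names BLOCKED_TOKENS and is_blocked are hypothetical, and the real proxy would express the same rule in its own configuration language.

```python
# Hypothetical blocklist built from the crawler names listed in this ticket.
BLOCKED_TOKENS = (
    "SeznamBot",
    "SputnikBot",
    "Sogou web spider",
    "TweetmemeBot",
    "kinshoobot",
    "CCBot",
    "Scrapy",
    "Baiduspider",
    "Yahoo! Slurp",
    "360Spider",
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocked crawler token.

    Assumption: a case-insensitive substring match is good enough; version
    numbers and URLs in the full user agent strings are ignored on purpose
    so that new crawler versions are still caught.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)
```

Matching on the short token rather than the full string keeps the rule stable when a crawler bumps its version, at the cost of also matching any client that happens to include one of these names in its user agent.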