Last modified: 2013-09-27 11:31:26 UTC
Links like http://www.google.com/url?sa=t&rct=j&q=public%20law%20105-298&source=web&cd=1&ved=0CB4QFjAA&url=http%3A%2F%2Fwww.redtube.com&ei=vmahTvikEoib-gadiZGuBQ&usg=AFQjCNH95AzJoEKz83KrtpLkLXENeJ3Njw&sig2=I_64kGBITluwmGNvw619Cg should be matched by the spam blacklist (sbl), since \bredtube\.com\b is blocked at meta, but they are not. See http://en.wikipedia.org/w/index.php?title=Wikipedia_talk:External_links&diff=prev&oldid=456671911 and http://meta.wikimedia.org/wiki/Talk:Spam_blacklist#Google_redirect_spam
The sbl extension searches for /https?:\/\/+[a-z0-9_\-.]*(\bexample\.com\b)/, so sbl entries always start at the domain part of a URL. That in itself is fine, because Google redirect links like the one above contain the full target URL. The problem is that these embedded URLs are percent-encoded (see [[w:en:Percent-encoding]]) and the sbl extension does no decoding, so ...?url=http%3A%2F%2Fwww.example.com is not recognized as ...?url=http://www.example.com. Possible solutions:
1. Start the pattern with /https?(?i::|%3a)(?i:\/|%2f){2,}[a-z0-9_\-.]*(/ instead of /https?:\/\/+[a-z0-9_\-.]*(/, so that percent-encoded colons and slashes are matched as well.
2. Percent-decode URLs before doing the regexp matching.
The second option is preferable because it is more general.
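A minimal Python sketch of the problem and of both proposed fixes (the actual extension is PHP; the pattern names and the \bredtube\.com\b test entry below are illustrative, not the extension's real code):

```python
import re
from urllib.parse import unquote

# Pattern as the extension currently builds it for a hypothetical
# blacklist entry \bredtube\.com\b:
CURRENT = re.compile(r'https?://+[a-z0-9_\-.]*(\bredtube\.com\b)', re.I)

# Solution 1: widen the prefix so ':' and '/' may also appear
# percent-encoded as %3a / %2f:
WIDENED = re.compile(
    r'https?(?::|%3a)(?:/|%2f){2,}[a-z0-9_\-.]*(\bredtube\.com\b)', re.I)

# Google redirect link with a percent-encoded target URL:
spam = 'http://www.google.com/url?url=http%3A%2F%2Fwww.redtube.com'

print(CURRENT.search(spam))                  # None: encoded URL slips through
print(bool(WIDENED.search(spam)))            # True: solution 1 catches it
# Solution 2: decode first, then apply the unchanged pattern:
print(bool(CURRENT.search(unquote(spam))))   # True: solution 2 catches it
```

This also illustrates why solution 2 is more general: decoding normalizes every encoded character in one step, while solution 1 only handles the encoded colon and slashes in the scheme prefix.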
Review the patch here https://gerrit.wikimedia.org/r/57904
Review the patch here https://gerrit.wikimedia.org/r/#/c/57935/
anubhav: Mentioning the bug number in the commit message is highly welcome.