Last modified: 2013-07-16 21:32:44 UTC
At the moment, it's very hard to code a bot that can filter out blacklisted URLs whilst leaving other URLs. A simple thing to help with this would be to have the API pass back all problematic URLs on the page, thus reducing the number of attempts to two (try -> filter -> try again) rather than a loop of changing one domain every time, as at present. Thanks.
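To illustrate the loop described above, here is a hedged sketch of the bot-side retry workflow forced by one-URL-at-a-time error reporting. `attempt_save()` is a stand-in for an API edit call and is simulated against a local blacklist so the example is self-contained; the names and the local `BLACKLIST` set are illustrative assumptions, not the real API surface.

```python
# Sketch of the bot-side retry loop forced by one-URL-at-a-time error
# reporting. attempt_save() stands in for an API edit request; it is
# simulated locally here so the example runs without a wiki.

BLACKLIST = {"spam.example", "junk.example"}  # hypothetical blacklist

def attempt_save(links):
    """Simulated edit: fail with the FIRST blacklisted link found,
    mirroring the current spamdetected error, or succeed otherwise."""
    for link in links:
        if link in BLACKLIST:
            return ("spamdetected", link)  # only one offender reported
    return ("success", None)

def save_with_retries(links):
    """Current workflow: one extra round-trip per blacklisted link."""
    links = list(links)
    attempts = 0
    while True:
        attempts += 1
        status, offender = attempt_save(links)
        if status == "success":
            return links, attempts
        links.remove(offender)

links, attempts = save_with_retries(
    ["good.example", "spam.example", "junk.example"])
print(links, attempts)  # two offenders cost three attempts in total
```

With N blacklisted links on the page, this loop makes N+1 edit attempts, which is exactly the overhead the feature request wants to remove.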
(In reply to comment #0)
> At the moment, it's very hard to code a bot that can filter out blacklisted
> URLs whilst leaving other URLs.
>
> A simple thing to help with this would be to have the API pass back all
> problematic URLs on the page, thus reducing the number of attempts to two
> (try -> filter -> try again) rather than a loop of changing one domain every
> time, as at present.
>
> Thanks.

I don't actually think this is an API bug:

    case EditPage::AS_SPAM_ERROR:
        $this->dieUsageMsg( array( 'spamdetected', $result['spam'] ) );

It's just what gets returned back by the editpage... Might be something to look at fixing as part of bug 29246.
Much as I hate gratuitous spam, I'd completely forgotten about this one. Reedy: any further thoughts? It's still a feature which would be genuinely useful, although there might be an existing pywikipedia workaround, in which case probably less so.
Created attachment 9624 [details] Functional patch v1 (doesn't apply) A patch (I couldn't get it to apply) but at least illustrative of the solution to this problem (the changes have the desired effect; I have tested them). Essentially, the tradeoff is that "spammers" get the full report of what they have tried to add that they cannot, in return for a slightly longer wait. This should be of benefit to any automated tools or content adders in dubious areas, who no longer have to query and requery: they can get a single report, remove all of the offending links, and then be certain that their next attempt will succeed.
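A rough sketch of the client-side effect of the patch described above (function names and the simulated response shape are illustrative assumptions, not the actual API response format): with the full list of offending links reported at once, a bot needs at most two attempts.

```python
# Sketch of the workflow the patch enables: a failed attempt reports ALL
# blacklisted links, so one filtering pass suffices and the second
# attempt is certain to succeed. Simulated locally; the real response
# format is defined by the SpamBlacklist patch, not by this code.

BLACKLIST = {"spam.example", "junk.example"}  # hypothetical blacklist

def attempt_save_full_report(links):
    """Simulated edit that reports every blacklisted link at once."""
    offenders = [link for link in links if link in BLACKLIST]
    if offenders:
        return ("spamdetected", offenders)
    return ("success", [])

def save_with_single_retry(links):
    """Proposed workflow: try -> filter -> try again (two attempts max)."""
    status, offenders = attempt_save_full_report(links)
    if status == "success":
        return list(links), 1
    filtered = [link for link in links if link not in offenders]
    status, _ = attempt_save_full_report(filtered)
    assert status == "success"  # all offenders were removed in one pass
    return filtered, 2

links, attempts = save_with_single_retry(
    ["good.example", "spam.example", "junk.example"])
print(links, attempts)  # two offenders now cost only two attempts
```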
(In reply to comment #4)
> Created attachment 9624 [details]
> Functional patch v1 (doesn't apply)
>
> A patch (couldn't get it to apply) but at least illustrative of the solution
> to this problem (the changes have the desired effect, have tested).
> Essentially, the tradeoff is that "spammers" get the full report of what
> they have tried to add that they cannot, in return for a slightly longer
> wait. This should be of benefit to any automated tools or content adders in
> dubious areas, who no longer have to query and requery: they can get a
> single report, remove all of the offending links, and then be certain that
> their next attempt will succeed.

How did you create it?

Start with a working copy of MediaWiki from svn, make the changes, then create the patch:

    svn diff > bug30332.diff
(In reply to comment #5)
> How did you create it?
>
> Start with a working copy of MediaWiki from svn, make the changes, then
> create the patch (svn diff > bug30332.diff)

Well, I was trying to create a patch against the 'installed' version of the SpamBlacklist extension. I think that was the problem (since the 'installed' version is not versioned).
Created attachment 9797 [details] First part of functional patch (for core)
Created attachment 9798 [details] Second part of functional patch (for /extensions/) Sorry, I have core and extensions separate, so I've had to create two patches. These should apply, and can be tested by copying from the extensions repo into the installed extensions folder (and making sure that SpamBlacklist is included!). Tested on local installation and works perfectly.
Looks like you attached the same patch twice.
Created attachment 9829 [details] First part of functional patch (for core) (corrected) So I did. Think this should be the correct one.
Okay, so again, won't apply cleanly. On my to-do list to update (again).
Created attachment 9933 [details] Second part of functional patch (for /extensions/) (updated for bitrot) Updated patch so it applies cleanly again
Jarry, there's been a bit of a delay in the review of patches here -- as we prepare to get a new version out, we're in a "code slush" during which we concentrate on reviewing code that has already been committed to our source code repository (you might have already seen the details at http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/57950 ). But we'll try to respond to your contribution soon. My apologies for the wait.
Patches resubmitted under the new system: https://gerrit.wikimedia.org/r/3740 https://gerrit.wikimedia.org/r/3747 I hope they can both be reviewed soon.
Reviewed and merged, closing as FIXED.