Last modified: 2013-03-27 16:28:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T44284, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 42284 - More AFTv5 abuse filters


Summary:	More AFTv5 abuse filters

Status:	RESOLVED FIXED

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	ArticleFeedbackv5 (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Highest normal
Target Milestone:	---
Assigned To:	Matthias Mullie

URL:
Whiteboard:
Keywords:

Duplicates:	43417
Depends on:
Blocks:

Reported:	2012-11-20 07:32 UTC by Fabrice Florin
Modified:	2013-03-27 16:28 UTC
CC List:	4 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Fabrice Florin 2012-11-20 07:32:28 UTC

Implement new abuse filters to discourage more questionable comments:

* Repeating characters
Disallow posts where the same character appears 5 or more times in a row.
See edit filter #135, which could be adapted for this feedback filter:
https://en.wikipedia.org/wiki/Special:AbuseFilter/135

* No punctuation 
Disallow comments that have no commas, periods, colons, question marks or exclamation marks, often a telltale sign of irrelevant contributions, according to Stack Overflow co-founder Jeff Atwood.

* Shouting
Give a warning when most of the comment is in ALL CAPS (90% or more of the characters), which is usually a telltale sign of questionable feedback. This filter was implemented in April 2012 as filter #458, but was disabled by King of Hearts and repurposed for short posts by Sole Shoe in August. I recommend we start a new feedback filter for this function, but implement it as a Warning (not Disallow), and only for posts that are at least 90% all-caps. FF

* More bad words
We should be able to filter out more offensive words than the small list we now disallow. I can do more research in coming days and email a larger list to our developer, rather than posting these swear words here.
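The first three rules proposed above are simple text heuristics, so they can be sketched as plain predicates. This is a minimal Python illustration under stated assumptions, not AbuseFilter rule syntax and not the filters actually deployed on en-wiki. The 5-character run and the 90% threshold come from the proposal; counting the caps ratio over letters only (so spaces and digits do not dilute it), treating any of `, . : ? !` as punctuation, and the hypothetical `classify()` helper that maps the rules to Disallow/Warning are assumptions.

```python
import re

# Thresholds from the proposal: 5+ identical characters in a row, 90% capitals.
REPEAT_RUN = 5
CAPS_RATIO = 0.9

# A character followed by itself 4+ more times = a run of 5 or more.
# Note: this also matches runs of spaces or punctuation such as "!!!!!".
_REPEAT_RE = re.compile(r"(.)\1{%d,}" % (REPEAT_RUN - 1), re.DOTALL)
# Assumption: the punctuation marks listed in the proposal.
_PUNCT = set(",.:?!")


def has_repeating_chars(text: str) -> bool:
    return _REPEAT_RE.search(text) is not None


def has_no_punctuation(text: str) -> bool:
    return not any(ch in _PUNCT for ch in text)


def is_shouting(text: str) -> bool:
    # Assumption: ratio is computed over letters only, not all characters.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    upper = sum(1 for ch in letters if ch.isupper())
    return upper / len(letters) >= CAPS_RATIO


def classify(text: str) -> str:
    """Hypothetical helper: return 'disallow', 'warn' or 'allow'."""
    if has_repeating_chars(text) or has_no_punctuation(text):
        return "disallow"
    if is_shouting(text):
        # The proposal asks for a Warning, not a Disallow, for shouting.
        return "warn"
    return "allow"
```

For example, `classify("THIS IS WRONG.")` yields `"warn"`, while `classify("great article")` is disallowed for lacking punctuation. The shouting check runs last so that a post which is both all-caps and unpunctuated is disallowed rather than merely warned.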

As a result, I hope we can filter more irrelevant posts than we do now (about 10% of total feedback is now filtered through this tool, and it may be possible to increase that number with a few more reliable filters).

Read more about abuse filters for article feedback on our feature requirements page:
http://www.mediawiki.org/wiki/Article_feedback/Version_5/Feature_Requirements#Abuse.2FSpam_Filters

See also the proposed filters in the 'Under consideration' section of this abuse filter spreadsheet: https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0AiGAdIp7VYlbdDdKUm9naXhxOXVweWZ5YkU3Wk5lSlE#gid=0

Comment 1 Andre Klapper 2012-11-20 17:56:25 UTC

Fabrice: Should this really be highest priority (urgent to fix within the next few days)? Sounds more like an enhancement (that could have high priority, though).

Comment 2 Fabrice Florin 2012-11-20 23:11:14 UTC

Hi Andre, thanks for your note.

From the standpoint of Article Feedback, this feature is our highest priority and it attempts to solve a major problem, which is that a large number of comments are inappropriate and can be effectively filtered through this tool.

So I continue to view this ticket as 'major'. It is definitely more than an 'enhancement' from our perspective. But as a compromise, I have adjusted the importance to 'normal', to show that I'm not an unreasonable man. ;o)

I understand that you have different labeling standards for tracking other applications through Bugzilla. If our labeling system is a serious issue for you, we could consider moving to another bug tracking system, so we don't interfere with your ongoing processes.

For example, we now use Trello for E2 project management, and could look into migrating out of Bugzilla into Trello over time, if it will make it easier for you. Please let me know if we should consider that option.

Comment 3 Andre Klapper 2012-11-21 09:24:00 UTC

I don't see much difference in labeling standards here, actually. :)
Improving/clarifying the semantics of "highest" vs "high" priority shouldn't be such a blocker that it forces you to migrate to a different system.

Comment 4 Andre Klapper 2012-11-21 09:29:51 UTC

(In reply to comment #2)
> From the standpoint of Article Feedback, this feature is our highest priority
> and it attempts to solve a major problem, which is that a large number of
> comments are inappropriate and can be effectively filtered through this tool.

Bug 37579 and bug 42057 are also highest priority for AFT5. The question boils down to "how many highest priorities can you have at the same time" and "what does highest priority mean in comparison to high priority". Also see http://lists.wikimedia.org/pipermail/wikitech-l/2012-November/064531.html

> So I continue to view this ticket as 'major'. It is definitely more than an
> 'enhancement' from our perspective. But as a compromise, I have adjusted the
> importance to 'normal', to show that I'm not an unreasonable man. ;o)

If it's "major" priority in the sense of "Major loss of function in an important area", feel free to set priority to "major". If it provides a new functionality that was not available before in your project, it's an "enhancement" by definition.

Comment 5 Kunal Mehta (Legoktm) 2012-12-16 09:30:13 UTC

Is the AbuseFilter really the right way to go about this?

Why can't AFTv5 check the spam blacklist rather than [[Special:AbuseFilter/502]]?

Using the AbuseFilter seems ok as an interim solution, but AFTv5 should have its own built-in abuse prevention methods, rather than depending on another extension.

Comment 6 Matthias Mullie 2012-12-17 10:21:16 UTC

Legoktm: why do you feel AbuseFilter should only be an interim solution?

AbuseFilter accepts different "categories", so AFT entries are kept separate from regular edits. Using AbuseFilter to filter spam has 2 distinct advantages over building something into AFT (not to mention the additional work to build it):
- AbuseFilter does not require WMF intervention to add/fix/deploy new rules
- Community is familiar with AbuseFilter already, so more people can contribute

In addition, AFT also checks $wgSpamRegex and SpamBlacklist already.

Comment 7 Kunal Mehta (Legoktm) 2012-12-18 10:48:58 UTC

I think I was mainly concerned about it not having built-in abuse prevention; I wasn't aware that it checks $wgSpamRegex and SpamBlacklist (I probably should have done some more reading). Your point about the community being more familiar with AF also makes a lot of sense.

Now, however, I think using the AbuseFilter for this purpose is definitely an advantage.

Comment 8 Matthias Mullie 2013-01-10 08:36:39 UTC

*** Bug 43417 has been marked as a duplicate of this bug. ***

Comment 9 Matthias Mullie 2013-01-10 08:39:26 UTC

Repeating characters: https://en.wikipedia.org/wiki/Special:AbuseFilter/473
Originally created by rsterbin; re-enabled.

No punctuation or spaces: https://en.wikipedia.org/wiki/Special:AbuseFilter/520
Added, but not yet enabled; I will only enable it after the commit that will
display the name of the filter that rejected the feedback
(https://gerrit.wikimedia.org/r/#/c/32208/) is merged.

Shouting: https://en.wikipedia.org/wiki/Special:AbuseFilter/521
Added & enabled.

More bad words: https://en.wikipedia.org/wiki/Special:AbuseFilter/460
I have merged all "common vandalism" filters containing foul words into this
original filter and renamed it to "foul words" (to provide clearer feedback to
users whose feedback is rejected; that commit has not yet been merged).
Please provide a list of additional foul words if you want to expand upon the
existing list.

Short posts: https://en.wikipedia.org/wiki/Special:AbuseFilter/458
I have re-enabled the filter now that the threshold has been upped.

Extremely long words: https://en.wikipedia.org/wiki/Special:AbuseFilter/502
Just completing the list of currently active filters for AFT ;)

Email address: https://en.wikipedia.org/wiki/Special:AbuseFilter/463
Just completing the list of currently active filters for AFT ;)
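Two of the active filters listed above (extremely long words, email address) and the plan to show users which filter rejected their feedback can be illustrated together. This is a hedged Python sketch, not AbuseFilter's rule language or evaluation model: the 30-character word limit, the email regex and the hypothetical `rejecting_filter()` helper (which returns the name of the first matching check, reusing the filter names from this comment) are assumptions, since the real thresholds of filters #502 and #463 are not stated here.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical threshold; the value used by the real filter is not given here.
MAX_WORD_LEN = 30
# Simplified email pattern (assumption): local part, "@", dotted domain.
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def has_long_word(text: str) -> bool:
    # Split on whitespace and flag any token longer than the limit.
    return any(len(word) > MAX_WORD_LEN for word in text.split())


def has_email(text: str) -> bool:
    return _EMAIL_RE.search(text) is not None


# Named checks, so a rejection can report which filter fired.
FILTERS: List[Tuple[str, Callable[[str], bool]]] = [
    ("Extremely long words", has_long_word),
    ("Email address", has_email),
]


def rejecting_filter(text: str) -> Optional[str]:
    """Return the name of the first filter that matches, or None."""
    for name, matches in FILTERS:
        if matches(text):
            return name
    return None
```

Returning a name rather than a bare boolean is what makes clearer rejection messages possible, which is also the motivation given above for renaming the merged filter to "foul words".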

Comment 10 Fabrice Florin 2013-01-10 17:54:19 UTC

Thank you, Matthias, much appreciated!

Do you need help testing this on en-wiki?

Perhaps Chris McMahon could help us, if he has time.
