Last modified: 2014-01-03 15:51:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T45665, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 43665 - Spam filter not filtering majority of spam to Junk folder
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: OTRS
Version: wmf-deployment
Hardware: All (OS: All)
Importance: Normal normal
Target Milestone: ---
Assigned To: Jeff Green
: ops
Depends on:
Blocks:
Reported: 2013-01-06 01:25 UTC by Ryan (Rjd0060)
Modified: 2014-01-03 15:51 UTC
CC List: 13 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch against our SVN repo of quilt patches to add X-Spam-Score header support (1.52 KB, patch)
2013-01-12 22:08 UTC, Ori Livneh

Description Ryan (Rjd0060) 2013-01-06 01:25:05 UTC
This problem has been happening (again) for some time (years). Judging by the
email headers, some messages are still being assigned a spam score, but nothing
appears to happen beyond that. A lot of spam, even mail scored as "possible
spam", is still being directed to the main queues rather than the junk folder.

OTRS admins can set filters that would normally divert any message with an
X-Spam-Score above a given value to the junk folder, but X-Spam-Score does not
exist on our list of search options. We have X-Spam-Flag, X-Spam-Level and
X-Spam-Status, and none of these even appear in the email headers.
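The filter described above boils down to "read the number at the start of X-Spam-Score and route above a threshold". A minimal Python sketch of that rule, assuming a hypothetical cut-off of 4.0 (borrowed only from the "4.0 required" figure in the SpamAssassin reports quoted later in this bug); `parse_spam_score`, `choose_queue` and the "Junk" queue name are illustrative, not OTRS APIs.

```python
import re
from email import message_from_string
from email.message import Message
from typing import Optional

# Hypothetical cut-off; taken from the "4.0 required" in the quoted reports
# purely as an illustrative value, not from any OTRS setting.
JUNK_THRESHOLD = 4.0

# Leading (optionally signed) number of values such as "1.4 (+)".
_SCORE_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)")


def parse_spam_score(header_value: str) -> Optional[float]:
    """Return the numeric score at the start of an X-Spam-Score value."""
    match = _SCORE_RE.match(header_value)
    return float(match.group(1)) if match else None


def choose_queue(msg: Message, default_queue: str) -> str:
    """Route to "Junk" when X-Spam-Score is higher than the threshold."""
    raw = msg.get("X-Spam-Score")  # header lookup is case-insensitive
    score = parse_spam_score(raw) if raw is not None else None
    if score is not None and score > JUNK_THRESHOLD:
        return "Junk"
    return default_queue


if __name__ == "__main__":
    spam = message_from_string("X-Spam-Score: 5.9 (+++++)\n\nbody\n")
    print(choose_queue(spam, "info-en"))  # Junk
```

Messages without the header fall through to their normal queue, so a missing or unparsable score never junks mail by accident.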
Comment 1 Andrew Gray 2013-01-06 23:41:40 UTC
Some example headers from one such ticket:

***

X-Spam-Score: 1.4 (+)
X-Spam-Report: Spam detection software, running on the system "mchenry.wikimedia.org",
 has identified this incoming email as possible spam. If you have any questions, see
 the administrator of that system for details.
 Content analysis details:   (1.4 points, 4.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.0 HTML_MESSAGE           BODY: HTML included in message
  2.2 TVD_SPACE_RATIO        BODY: TVD_SPACE_RATIO
 -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                             [score: 0.0000]
  1.8 MISSING_SUBJECT        Missing Subject: header

***

Based on past experience, fixing OTRS looks pretty unlikely to happen. Would it be possible to get the WMF spam filter to add a duplicate header when it adds X-Spam-Score, perhaps with the same value but in X-Spam-Level?
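A minimal sketch of the suggested workaround, written as a Python step that rewrites a parsed message before delivery. Where such a step would actually run is an assumption (the bug does not say), and `mirror_spam_score` is a hypothetical name. SpamAssassin's own X-Spam-Level header conventionally holds a row of asterisks, so reusing that name for a numeric value is a choice worth checking against the mail server's configuration.

```python
from email import message_from_string
from email.message import Message


def mirror_spam_score(msg: Message, target: str = "X-Spam-Level") -> Message:
    """Copy the X-Spam-Score value into `target`, replacing any earlier copy."""
    score = msg.get("X-Spam-Score")
    if score is not None:
        del msg[target]  # removes every occurrence; no error if absent
        msg[target] = score
    return msg


if __name__ == "__main__":
    raw = "X-Spam-Score: 1.4 (+)\nSubject: example\n\nbody\n"
    print(mirror_spam_score(message_from_string(raw))["X-Spam-Level"])  # 1.4 (+)
```

Deleting before setting matters: assigning to an `email.message.Message` key appends a header rather than overwriting, so without the `del` a message could end up carrying two conflicting copies.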
Comment 2 Ori Livneh 2013-01-12 22:08:56 UTC
Created attachment 11620
Patch against our SVN repo of quilt patches to add X-Spam-Score header support

Patch against <http://svn.wikimedia.org/svnroot/mediawiki/trunk/otrs>. Adds a patch to our quilt patch series which adds "X-Spam-Score" to the list of e-mail headers that are scannable. Submitted upstream at <http://bugs.otrs.org/show_bug.cgi?id=9042>.
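Why the missing entry mattered can be modelled in a few lines: a filter rule can only fire on a header that is on the scannable list, so a perfectly good X-Spam-Score value is invisible until the list is extended. The Python sketch below is a model under stated assumptions, not the patch itself (the real change is to OTRS configuration and is not reproduced here); `BASE_HEADERS`, `PATCHED_HEADERS`, `FOUR_OR_MORE` and `rule_matches` are hypothetical.

```python
import re
from email import message_from_string
from email.message import Message
from typing import FrozenSet

# Hypothetical stand-in for the header list offered when creating a
# PostMaster filter; the X-Spam-* entries are the ones named in this bug.
BASE_HEADERS: FrozenSet[str] = frozenset(
    {"From", "To", "Subject", "X-Spam-Flag", "X-Spam-Level", "X-Spam-Status"})
# Conceptually, the patch appends one more entry to that list.
PATCHED_HEADERS: FrozenSet[str] = BASE_HEADERS | {"X-Spam-Score"}

# Illustrative pattern for a score of 4.0 or more, assuming the one-decimal
# format seen in this bug ("5.9 (+++++)").
FOUR_OR_MORE = r"^(?:[4-9]|\d{2,})\."


def rule_matches(msg: Message, header: str, pattern: str,
                 scannable: FrozenSet[str]) -> bool:
    """True if a filter rule on `header` with `pattern` would fire for `msg`.

    A header outside `scannable` cannot be chosen, so such a rule never fires.
    """
    if header not in scannable:
        return False
    value = msg.get(header)
    return value is not None and re.search(pattern, value) is not None


if __name__ == "__main__":
    spam = message_from_string("X-Spam-Score: 5.9 (+++++)\n\nbody\n")
    print(rule_matches(spam, "X-Spam-Score", FOUR_OR_MORE, BASE_HEADERS))     # False
    print(rule_matches(spam, "X-Spam-Score", FOUR_OR_MORE, PATCHED_HEADERS))  # True
```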
Comment 3 Ori Livneh 2013-01-12 22:14:21 UTC
Assigning to Tim Starling for review, since the SVN log indicates that he wrote all previous OTRS patches. (Sorry if that's presumptuous.)
Comment 4 Ori Livneh 2013-01-14 15:19:22 UTC
Upstream merged: http://bugs.otrs.org/show_bug.cgi?id=9042.
Comment 5 Ryan (Rjd0060) 2013-02-11 23:40:51 UTC
Any update here? Spam continues to arrive directly in the main queues in extremely heavy volume.
Comment 6 Andre Klapper 2013-02-21 17:04:42 UTC
+CCing Jeff Green, maybe he could deploy that patch?
Comment 7 Sumana Harihareswara 2013-02-22 11:57:07 UTC
Philippe: now that bug 22622 is moving forward (to upgrade our installation of OTRS), please let us know whether you would prefer to simply wait for the upgrade, or to deploy this particular patch immediately to cut down on spam.  Thanks!
Comment 8 Philippe Beaudette 2013-02-22 12:36:33 UTC
OK to deploy, unless Jeff Green has any hesitations.
Comment 9 Andre Klapper 2013-02-25 22:04:55 UTC
Jeff: Any objections? 
Jeff / Tim: If not, could you please deploy this?
Comment 10 Andre Klapper 2013-03-11 18:42:45 UTC
I've asked for patch deployment in RT #4713.
Comment 11 Jeff Green 2013-03-12 18:19:21 UTC
The SVN OTRS repository is deprecated and locked, so I requested that it be moved into git, and manually deployed the patch in the meantime. I will check it in to git once that's available.
Comment 12 Andre Klapper 2013-03-14 10:37:29 UTC
This patch was deployed by Jeff, so hopefully working in OTRS is a bit less noisy now for everybody.

Keeping this open for the SVN -> git codebase migration part, though I cannot see the request listed on http://www.mediawiki.org/wiki/Git/New_repositories/Requests.
Comment 13 Andre Klapper 2013-03-21 18:02:46 UTC
All done: https://gerrit.wikimedia.org/r/gitweb?p=operations/software/otrs.git;a=log;h=refs/heads/master

Closing as FIXED. Thanks everybody!
Comment 14 Ryan (Rjd0060) 2013-04-10 04:35:55 UTC
(In reply to comment #13)
> All done:
> https://gerrit.wikimedia.org/r/gitweb?p=operations/software/otrs.git;a=log;h=refs/heads/master
> 
> Closing as FIXED. Thanks everybody!

The patch implemented (I believe it is RT #4713 according to above) does not appear to have worked.  We (the OTRS admins) still do not have an 'X-Spam-Score' option on the dropdown menu when creating PostMaster Filters.

I know other work was done with regard to spam filtering on OTRS, so I am not sure whether that work supersedes this SpamAssassin scoring and makes this bug moot; re-opening for now.
Comment 15 Andre Klapper 2013-04-15 12:15:45 UTC
Jeff: Any comments on comment 14?
Comment 16 Jeff Green 2013-04-15 15:01:15 UTC
The one additional thing I did regarding spam filtering was to nuke the auto-whitelist database so SpamAssassin would start fresh. I checked the logs, and it looks to me as though SpamAssassin+OTRS is generally working as expected. Beyond that, I'll talk to Martin about how we can improve spam filtering with the upgrade.
Comment 17 Ryan (Rjd0060) 2013-05-28 13:27:49 UTC
We're still getting a lot of spam in the main queues, even though SpamAssassin is recognizing it as likely spam. One recent example is ticket 2013052810003205:

"X-Spam-Score: 2.6 (++)
X-Spam-Report: Spam detection software, running on the system "mchenry.wikimedia.org", has identified this incoming email as possible spam. If you have any questions, see the administrator of that system for details..."

This was delivered to a regular Wikiquote queue, rather than the junk queue.
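The "(+)", "(++)" and "(+++++)" markers after the scores quoted in this bug line up with one "+" per whole point (1.4, 2.6 and 5.9). A small Python sketch under that assumption, which is inferred only from these three examples, plus a hypothetical regex an admin could match against once the header is scannable. `spam_bar`, `AT_LEAST_FOUR` and `bar_says_spam` are illustrative names.

```python
import re


def spam_bar(score: float) -> str:
    """Rebuild the "(+++)" style marker seen after X-Spam-Score values.

    Assumption: one "+" per whole point. int() truncates toward zero and
    "+" * n is "" for n <= 0; how negative scores are rendered is not shown
    anywhere in this bug.
    """
    return "+" * int(score)


# Hypothetical filter pattern: four or more "+" means a score of at least 4.0,
# the "required" value in the quoted reports.
AT_LEAST_FOUR = re.compile(r"\(\+{4,}\)")


def bar_says_spam(header_value: str) -> bool:
    return AT_LEAST_FOUR.search(header_value) is not None
```

Under this assumption, the 2.6 message above would stay below such a threshold even with a working filter, while a 5.9 message would be caught.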
Comment 18 Andre Klapper 2013-08-09 06:48:25 UTC
A new version of OTRS was recently made available (see bug 22622) and SpamAssassin is still available for it ( https://gerrit.wikimedia.org/r/#/c/77391/ ) so an update on this bug report would be very welcome:

* Is this still a problem?
* Is the information provided in this bug report still correct?
Comment 19 jeremyb 2013-08-09 06:52:17 UTC
https://rt.wikimedia.org/Ticket/Display.html?id=5557#txn-125935 says:
>  - junk queue -> mbox -> sa-learn
Comment 20 jeremyb 2013-08-09 06:53:15 UTC
(In reply to comment #19)
> https://rt.wikimedia.org/Ticket/Display.html?id=5557#txn-125935 says:
> >  - junk queue -> mbox -> sa-learn

(err, left out: that's under "left to do")
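The "junk queue -> mbox -> sa-learn" step is relevant here because both SpamAssassin reports in this bug include BAYES_00 at -2.6, meaning the Bayes classifier rated these spam messages as 0 to 1% likely to be spam. A minimal Python sketch of the middle of that pipeline, assuming the raw messages have already been exported from the junk queue (how that export works is not covered, and `export_to_mbox` and `sa_learn_command` are hypothetical names). It writes an mbox with the standard-library `mailbox` module and builds, without running, an `sa-learn --spam --mbox` call.

```python
import mailbox
import os
import tempfile
from email import message_from_string
from typing import List


def export_to_mbox(raw_messages: List[str], path: str) -> int:
    """Append raw RFC 822 messages to an mbox file; return how many were written.

    `raw_messages` stands in for mail exported from the OTRS junk queue.
    """
    box = mailbox.mbox(path)  # creates the file if it does not exist
    box.lock()
    try:
        for raw in raw_messages:
            box.add(message_from_string(raw))
        box.flush()
    finally:
        box.unlock()
        box.close()
    return len(raw_messages)


def sa_learn_command(mbox_path: str) -> List[str]:
    """Build (but do not run) the sa-learn call that trains the file as spam."""
    return ["sa-learn", "--spam", "--mbox", mbox_path]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    junk = os.path.join(workdir, "junk.mbox")
    count = export_to_mbox(["From: a@example.org\nSubject: test\n\nbody\n"], junk)
    print(count, sa_learn_command(junk))
    # The list can be passed to subprocess.run() where SpamAssassin is installed.
```

Building the command as a list rather than a shell string avoids quoting problems with file paths when it is eventually executed.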
Comment 21 Ryan (Rjd0060) 2013-08-12 17:31:34 UTC
(In reply to comment #18)
> A new version of OTRS was recently made available (see bug 22622) and
> SpamAssassin is still available for it
> ( https://gerrit.wikimedia.org/r/#/c/77391/ ) so an update on this bug report
> would be very welcome:
> 
> * Is this still a problem?
> * Is the information provided in this bug report still correct?


This issue still appears to be present. The information I supplied above in the bug description is still correct, as far as I know. To summarize:

The spam filtering software that 'mchenry.wikimedia.org' is using is flagging messages as potential spam. Despite this, the messages are still being delivered to the main queues rather than being routed to the 'Junk' queue. This is happening with messages with high "spam scores" as well as low ones.

Here is an example from a ticket received today, #2013081210006772. Despite receiving a high score, the message was still delivered to the info-en Quality queue (the queue it was addressed to). Its headers:

X-Spam-Score: 5.9 (+++++)
X-Spam-Report: Spam detection software, running on the system "mchenry.wikimedia.org",
 has identified this incoming email as possible spam. If you have any questions, see
 the administrator of that system for details.
 Content analysis details:   (5.9 points, 4.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.0 SPF_PASS               SPF: sender matches SPF record
  1.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                             [URIs: pechkin-mail.ru]
  2.7 URI_UNSUBSCRIBE        URI: URI contains suspicious unsubscribe link
  0.0 HTML_MESSAGE           BODY: HTML included in message
  1.1 MPART_ALT_DIFF_COUNT   BODY: HTML and text parts are different
 -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                             [score: 0.0000]
  1.4 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
  0.2 SARE_SUB_ENC_UTF8      Message uses character set often used in spam
  1.7 SARE_UNSUB13           SARE_UNSUB13
 -0.0 AWL                    AWL: From: address is in the auto white-list
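The report itself already states the verdict: "(5.9 points, 4.0 required)" puts this message over SpamAssassin's threshold, yet it still reached a normal queue. A Python sketch that pulls the total, the threshold and the per-rule rows out of an X-Spam-Report value, which could back a filter or a quick audit of delivered tickets. The regexes are assumptions fitted to the two reports in this bug, `parse_report` and `SpamReport` are hypothetical names, and "at or above required counts as spam" is assumed to mirror SpamAssassin's rule. The rows above add up to 6.0 against a reported 5.9, presumably because each row is rounded to one decimal, so the sketch trusts the summary rather than the sum.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Summary, e.g. "(5.9 points, 4.0 required)".
_SUMMARY_RE = re.compile(
    r"\(\s*(-?\d+(?:\.\d+)?) points,\s*(-?\d+(?:\.\d+)?) required\)")
# One table row: signed points with one decimal, then an upper-case rule name.
_RULE_RE = re.compile(r"(-?\d+\.\d)\s+([A-Z][A-Z0-9_]+)\b")


class SpamReport(NamedTuple):
    points: float
    required: float
    rules: List[Tuple[str, float]]

    @property
    def over_threshold(self) -> bool:
        # Assumed SpamAssassin semantics: at or above "required" is spam.
        return self.points >= self.required


def parse_report(report: str) -> Optional[SpamReport]:
    """Extract total, threshold and per-rule points from an X-Spam-Report value."""
    summary = _SUMMARY_RE.search(report)
    if summary is None:
        return None
    rules = [(name, float(pts)) for pts, name in _RULE_RE.findall(report)]
    return SpamReport(float(summary.group(1)), float(summary.group(2)), rules)
```

The same function handles both the single-line form in which the headers were first pasted and the folded, table-style layout, since the patterns only rely on whitespace between fields.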
Comment 22 Jeff Green 2013-08-26 20:09:28 UTC
I think I've finally fixed this.
