Subject: Re: [courier-users] Automatic SPAM learning
From: mouss (mlis@free.fr)
Date: Jun 24, 2007 11:40:49 am
List: net.sourceforge.lists.courier-users

Soeren D. Schulze wrote:

Hello,

I found the following patch:

http://da.andaka.org/Doku/imapspamfilter.html

To describe it briefly, it automatically trains the SPAM filter when the user moves messages to a SPAM or HAM folder.

First, what do you think about this in principle?

I see two design issues:

1. The user does not have the chance to use his own preferred settings, as everything is controlled by an environment variable.

What would be the benefit of that? Too much flexibility kills usability!

Here is the setup I use:

- maildrop puts spam in the .Junk/ folder, if this folder exists (so the user can opt out by deleting this folder).
- False positives: the user can move a message to .Junk.Error/.
- False negatives: the user can confirm spam or put an undetected spam message in .Junk.Trash/.
- sa-learn is called on FPs and FNs. The messages are then moved to an "invisible" folder (no leading dot) to avoid having them around. They may be deleted or kept for use as a "corpus".

If the user doesn't move messages, sa-learn is never run. If the user complains about errors, he gets a recommendation to move messages.
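The learning step of this setup could be sketched as a small periodic job (run from cron, for example). This is only an illustration under assumptions: the function name `learn_folder`, the `LEARN` variable and the `Learned` directory name are hypothetical, and only the `--ham` and `--spam` options of sa-learn are taken as given.

```shell
# Hypothetical periodic job for one Maildir. LEARN is the learner command;
# sa-learn's --ham/--spam options are real, the other names are assumptions.
LEARN=${LEARN:-sa-learn}

learn_folder() {
    # $1 = Maildir root, $2 = folder (e.g. .Junk.Error), $3 = --ham or --spam
    maildir=$1; folder=$2; class=$3
    done_dir="$maildir/Learned"      # no leading dot, so not listed as a mail folder
    mkdir -p "$done_dir" || return 1
    for msg in "$maildir/$folder"/cur/* "$maildir/$folder"/new/*; do
        [ -f "$msg" ] || continue    # skip unmatched globs and non-files
        # Learn first; move only on success so a failed message is retried later.
        "$LEARN" "$class" "$msg" >/dev/null && mv "$msg" "$done_dir/"
    done
    return 0
}
```

A run over one mailbox might then look like `learn_folder "$HOME/Maildir" .Junk.Error --ham` followed by `learn_folder "$HOME/Maildir" .Junk.Trash --spam`.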

2. The server freezes until the SPAM learner has done its job.

This is bad indeed. He could hard-link the file and run a background process on the new link.
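A rough sketch of that idea, assuming a hook that is handed the path of the message just moved into a training folder. The names `queue_for_learning` and `LEARN` and the queue directory are hypothetical, and the queue must live on the same filesystem as the message for `ln` to work.

```shell
# Hypothetical hook; not part of the patch discussed above.
LEARN=${LEARN:-sa-learn}

queue_for_learning() {
    # $1 = message file, $2 = --ham or --spam, $3 = queue dir (same filesystem)
    msg=$1; class=$2; queue=$3
    link="$queue/$(basename "$msg")"
    mkdir -p "$queue" || return 1
    # The hard link keeps the data reachable even if the user moves or
    # deletes the original while the learner is still running.
    ln "$msg" "$link" || return 1
    # Learn in the background so the caller returns immediately,
    # then drop the extra link.
    ( "$LEARN" "$class" "$link" >/dev/null 2>&1; rm -f "$link" ) &
}
```

The caller does not wait for the learner, so the server-side operation is no longer blocked by it.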

Personally, I would solve it by specifying a new column (or more than one) in the user database which includes the SPAM policy. The learning would be done in the background without the server waiting for the process to finish.

I am ready to do the coding, but as I am quite new to Courier, I would like to hear your opinion.

I'm not convinced of the value of this compared to a periodic job. After all, the user doesn't sort his mail at delivery time, so why hurry? Regarding the "reprocess" issue, it is enough to move the processed messages out of the way. If they must stay in place, then an "ln" may be done on all the files, and if it fails (because the link already exists), the message is not processed again (ln cur/$f domedo/$f && do_process domedo/$f). This still requires reading the whole directory, but so what? "Large" folders cause performance issues anyway...
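Spelled out as a loop, the "ln" trick might look like the sketch below. `do_process` is only a placeholder for the real work (for example a sa-learn call), and the function name `process_once` is hypothetical; `domedo` is the marker directory name from the one-liner above.

```shell
# Placeholder for the real processing step; replace with the actual command.
do_process() {
    echo "processing $1"
}

process_once() {
    # $1 = Maildir folder containing cur/; marker links go into $1/domedo
    folder=$1
    mkdir -p "$folder/domedo" || return 1
    for path in "$folder"/cur/*; do
        [ -f "$path" ] || continue
        f=$(basename "$path")
        # ln fails if domedo/$f already exists, so each file name is handled once.
        ln "$path" "$folder/domedo/$f" 2>/dev/null && do_process "$folder/domedo/$f"
    done
    return 0
}
```

Running `process_once` twice on the same folder processes each message on the first run only; the second run still scans the whole directory but skips every file whose link already exists.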