Soeren D. Schulze wrote:
Hello,
I found the following patch:
http://da.andaka.org/Doku/imapspamfilter.html
To describe it briefly, it automatically trains the SPAM filter when the
user moves messages to a SPAM or HAM folder.
First, what do you think about this in principle?
I see two design issues:
1. The user does not have the chance to use his own preferred settings,
as everything is controlled by an environment variable.
What would be the benefit of that? Too much flexibility kills usability!
Here is the setup I use:
- maildrop puts spam in the .Junk/ folder if this folder exists (so the
user can opt out by deleting this folder).
- False positives: the user can move a message to .Junk.Error/.
- False negatives: the user can move confirmed spam, or an undetected
spam message, to .Junk.Trash/.
- sa-learn is called on FPs and FNs. The messages are then moved to an
"invisible" folder (no leading dot) to avoid having them around. They
may be deleted or kept for use as a "corpus".
If the user doesn't move messages, sa-learn is not run. If a user
complains about errors, he gets a recommendation to move messages.
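
A minimal sketch of such a periodic job, for illustration only. The function names, the LEARN variable and the archive directory names (learned-ham, learned-spam) are hypothetical; the only real interface assumed is that sa-learn accepts --ham or --spam followed by a message file.

```shell
#!/bin/sh
# Hypothetical periodic job (e.g. run from cron) for one user's Maildir.
# LEARN is an assumption: it defaults to sa-learn and can be overridden.
LEARN=${LEARN:-sa-learn}

# learn_folder MODE SRC DEST
#   MODE: "ham" or "spam"
#   SRC:  Maildir folder the user moved messages into
#   DEST: archive directory without a leading dot ("invisible" folder)
learn_folder() {
    mode=$1
    src=$2
    dest=$3
    mkdir -p "$dest" || return 1
    for f in "$src"/cur/* "$src"/new/*; do
        [ -f "$f" ] || continue      # unmatched glob or not a regular file
        # Move the message out of the way only if learning succeeded.
        "$LEARN" "--$mode" "$f" >/dev/null && mv "$f" "$dest"/
    done
    return 0
}

# train_user MAILDIR
#   False positives (.Junk.Error) are learned as ham,
#   false negatives and confirmed spam (.Junk.Trash) as spam.
train_user() {
    md=$1
    if [ -d "$md/.Junk.Error" ]; then
        learn_folder ham "$md/.Junk.Error" "$md/learned-ham"
    fi
    if [ -d "$md/.Junk.Trash" ]; then
        learn_folder spam "$md/.Junk.Trash" "$md/learned-spam"
    fi
    return 0
}
```

Calling train_user on a Maildir path would process both folders. A message that sa-learn fails on stays where it is and is retried on the next run.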
2. The server freezes until the SPAM learner has done its job.
This is bad indeed. He could hard-link the file and run a background
process on the new link.
Personally, I would solve it by specifying a new column (or more than
one) in the user database which includes the SPAM policy. The learning
would be done in the background without the server waiting for the
process to finish.
I am ready to do the coding, but as I am quite new to Courier, I would
like to hear about your opinion.
I'm not convinced of the value of this compared to a periodic job. After
all, the user doesn't sort his mail at delivery time, so why hurry?
Regarding the "reprocess" issue, it is enough to move the processed
messages out of the way. If they must stay in place, then an "ln" may be
done on all the files, and if it fails, the message is not processed (ln
cur/$f domedo/$f && do_process domedo/$f). This still requires reading
the whole directory, but so what? "Large" folders cause performance
issues anyway...
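
The "ln" idea could look roughly like the sketch below. The function name process_once and its arguments are hypothetical. It relies only on ln (without -f) failing when the link name already exists, and it assumes both directories are on the same filesystem.

```shell
#!/bin/sh
# process_once SRC_DIR CLAIM_DIR COMMAND
#   Hard-link each message in SRC_DIR into CLAIM_DIR. ln fails if the link
#   already exists, so COMMAND sees each file name at most once while the
#   message itself stays in place in SRC_DIR.
process_once() {
    src=$1
    claim=$2
    cmd=$3
    mkdir -p "$claim" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # unmatched glob or not a regular file
        name=${f##*/}                # strip the directory part
        ln "$f" "$claim/$name" 2>/dev/null && "$cmd" "$claim/$name"
    done
    return 0
}
```

For example, process_once cur domedo do_process mirrors the one-liner above. Note that the check is by file name only: a Maildir message whose flags change is renamed, so it would be treated as new.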