-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Tue, Aug 07, 2007 at 06:18:18PM -0400, Eric d'Alibut wrote:
> On 8/7/07, Todd Lyons <tly...@ivenue.com> wrote:
>> If you're running spamassassin, you can use the CVS version of FuzzyOcr
>> which extracts and detects these spams.
> Thanks for this steer.
> A quick glance at the package gives me the impression that the focus
> is on image files, gif's, jpeg's, etc. Am I right in thinking that
> these images are the payloads carried by all those spam pdf
> attachments? So that FuzzyOcr works with pdfs?
The spam PDF payload is the PDF itself. The original FuzzyOcr
incarnation ran OCR against the gif/jpeg/etc. to extract text from it,
then scored the extracted text with its detection algorithms. The PDF
spams simply get pdftotext run against them, and the resulting text is
searched with the same detection algorithms.
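
To make the flow concrete, here is a rough Python sketch of the
extract-then-score idea. It is not FuzzyOcr's actual code: the word
list, the function names, and the 0.8 similarity cutoff are all made up
for illustration, and difflib stands in for whatever fuzzy matching the
plugin really uses. It assumes the pdftotext command is on the PATH.

```python
import difflib
import re
import subprocess

# Hypothetical word list, for illustration only (not FuzzyOcr's real list).
SPAM_WORDS = ["viagra", "stock", "investor"]


def pdf_to_text(path):
    """Extract text from a PDF by running pdftotext; '-' sends it to stdout."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_fuzzy_hits(text, words=SPAM_WORDS, cutoff=0.8):
    """Count tokens that approximately match any word in the list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    hits = 0
    for token in tokens:
        # get_close_matches returns the list words similar enough to token.
        if difflib.get_close_matches(token, words, n=1, cutoff=cutoff):
            hits += 1
    return hits
```

Something like count_fuzzy_hits(pdf_to_text("attachment.pdf")) would then
give a hit count to compare against a threshold. The same scoring
function could be fed OCR output from an image, which is the point: only
the extraction step differs between the image and PDF cases.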
- --
Regards... Todd
Exponential problems need logarithmic solutions. --Eddy Dreger
Linux kernel 2.6.17-6mdv 1 user, load average: 0.80, 0.74, 0.57
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
iD8DBQFGuPl+Y2VBGxIDMLwRAoHwAJ4qyHtZD8t/O7SU5LBJSkTleSSSngCdH7Qb
4jbCXJYA1uUm/nVVwmCkziA=
=L6+8
-----END PGP SIGNATURE-----