Tobias DiPasquale on 16 Mar 2004 18:35:03 -0000



Re: [PLUG] repository of sample spam messages?


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jason Costomiris wrote:
| Anyone know of a source to obtain a large mbox of spam messages,
| suitable for training a Bayesian filter?
|
| I've got one, but it's only a couple of hundred messages.  I'd like to
| train with a few thousand to increase accuracy.
|

A bigger spam corpus would not increase accuracy on your own mail unless:

a) You had a commensurate amount of ham to train with

and

b) You vetted every single message from the spam repository to make sure
that it was actually spam

You'd be better off training your filter on your own spam corpus plus a
couple hundred representative ham messages than on a bunch of messages
from a spam archive.
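
To illustrate the point about balance, here is a minimal sketch of training
on equal amounts of spam and ham. It is not how any particular filter
works: the names tokenize, balance and NaiveBayes are made up for this
example, the tokenizer is a plain whitespace split, and the scoring is
textbook naive Bayes with add-one smoothing.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace split; real filters use much richer tokens."""
    return text.lower().split()


def balance(spam, ham, seed=0):
    """Downsample the larger corpus so both classes end up the same size."""
    n = min(len(spam), len(ham))
    rng = random.Random(seed)
    return rng.sample(spam, n), rng.sample(ham, n)


class NaiveBayes:
    """Hypothetical toy classifier, for illustration only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        if not self.docs["spam"] or not self.docs["ham"]:
            raise ValueError("train on both spam and ham first")
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            total_tokens = sum(self.counts[label].values())
            # Class prior: skewed corpora skew this term.
            score = math.log(self.docs[label] / total_docs)
            for tok in tokenize(text):
                # Add-one smoothing, with one extra slot for unseen tokens.
                score += math.log(
                    (self.counts[label][tok] + 1)
                    / (total_tokens + len(vocab) + 1)
                )
            log_score[label] = score
        # Turn the two log scores into a probability without overflow.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])


def train_balanced(spam, ham):
    """Train a fresh classifier on equal-sized spam and ham samples."""
    spam, ham = balance(spam, ham)
    nb = NaiveBayes()
    for msg in spam:
        nb.train("spam", msg)
    for msg in ham:
        nb.train("ham", msg)
    return nb
```

In practice the message bodies would come out of your own mboxes (Python's
standard mailbox module can read them), and each one would be checked by
hand before it goes into either pile.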

If you still want to train on possibly non-spammy spam, check out
http://www.spamarchive.org/.

- --
Tobias DiPasquale
202A 04C4 2CE6 B985 8520  88D6 CD25 1A6C B9B5 1595
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAV0jOzSUabLm1FZURAoIHAKCh2s83gj1u18KpyFKS33KzMPB5AQCggWgc
452liGMukJQ23d+enmP/WM8=
=ViUf
-----END PGP SIGNATURE-----
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug