Tobias DiPasquale on 16 Mar 2004 18:35:03 -0000
Jason Costomiris wrote:
| Anyone know of a source to obtain a large mbox of spam messages,
| suitable for training a Bayesian filter?
|
| I've got one, but it's only a couple of hundred messages. I'd like to
| train with a few thousand to increase accuracy.

That would not increase accuracy for you in particular unless:

a) you had a commensurate amount of ham to train with, and
b) you vetted every single message from the spam repository to make
   sure that it was actually spam.

You'd be better off training your filter on your own spam corpus plus a
couple hundred representative ham messages than on a bunch of messages
from a spam archive.

If you still want to train on possibly non-spammy spam, check out
http://www.spamarchive.org/.

-- 
Tobias DiPasquale
202A 04C4 2CE6 B985 8520 88D6 CD25 1A6C B9B5 1595
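
The point about keeping ham and spam volumes commensurate can be checked mechanically before training. Below is a minimal sketch, assuming both corpora are stored as mbox files and using Python's standard `mailbox` module; the function name `balanced_training_sets` and the file names in the usage note are hypothetical, not something from this thread. It only trims the larger corpus to the size of the smaller one. It does not vet whether each message is really spam, which remains a manual step.

```python
import mailbox
import random


def balanced_training_sets(spam_path, ham_path, seed=0):
    """Return (spam, ham) message lists trimmed to the same length.

    Hypothetical helper: the larger corpus is randomly down-sampled so
    the filter sees a commensurate amount of ham and spam.
    """
    # create=False raises an error instead of silently creating an
    # empty mbox if a path is mistyped.
    spam = list(mailbox.mbox(spam_path, create=False))
    ham = list(mailbox.mbox(ham_path, create=False))

    n = min(len(spam), len(ham))
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    return rng.sample(spam, n), rng.sample(ham, n)
```

Something like `spam, ham = balanced_training_sets("spam.mbox", "ham.mbox")` would then give two equally sized lists to feed to whatever training command your filter provides.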