Tobias DiPasquale on 22 Oct 2004 14:26:02 -0000



Re: [PLUG] Spam programs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 22, 2004, at 9:17 AM, Jeff Abrahamson wrote:
> But I'm curious why this should be so.  It's usually possible to reach
> a decision with more confidence if one has less data.  More data adds
> nuance to decisions.  Why should Bayesian filters (or Markovian or...)
> work worse if there's more data?

Your second sentence here explains it quite well ;-) Basically, less training data gives you a higher signal-to-noise ratio with respect to the tokens in your corpus. A moderate number of solid, reliable tokens is much better than a huge array of who-knows-what tokens.
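To illustrate the signal-to-noise point, here is a minimal Python sketch, assuming a Graham-style naive Bayes token scorer. The thresholds MIN_COUNT and MAX_TOKENS and the function names are hypothetical, not taken from any particular filter. Tokens seen too rarely are discarded as noise, and only the most decisive tokens are combined into the final score.

```python
from collections import Counter

# Hypothetical thresholds (assumptions, not from any specific filter):
MIN_COUNT = 5    # ignore tokens seen too rarely to trust
MAX_TOKENS = 15  # combine only the most decisive tokens


def token_probs(spam_counts, ham_counts, n_spam, n_ham):
    """Per-token P(spam | token), skipping tokens with too little evidence."""
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s, h = spam_counts[tok], ham_counts[tok]  # Counter returns 0 if absent
        if s + h < MIN_COUNT:
            continue  # noisy token: not enough data to say anything useful
        s_freq = s / n_spam
        h_freq = h / n_ham
        p = s_freq / (s_freq + h_freq)
        probs[tok] = min(0.99, max(0.01, p))  # never fully certain
    return probs


def score(message_tokens, probs):
    """Combine the strongest known tokens into one spam probability."""
    known = [probs[t] for t in set(message_tokens) if t in probs]
    # Keep the tokens whose probability is furthest from a neutral 0.5.
    known.sort(key=lambda p: abs(p - 0.5), reverse=True)
    strongest = known[:MAX_TOKENS]
    if not strongest:
        return 0.5  # nothing solid to go on
    prod_spam = 1.0
    prod_ham = 1.0
    for p in strongest:
        prod_spam *= p
        prod_ham *= 1 - p
    return prod_spam / (prod_spam + prod_ham)


# Toy corpus counts (made up purely for illustration).
spam = Counter({"viagra": 20, "free": 30, "meeting": 1})
ham = Counter({"meeting": 25, "free": 5, "linux": 40, "rareword": 1})
probs = token_probs(spam, ham, n_spam=50, n_ham=50)

spam_score = score(["viagra", "free", "rareword"], probs)
ham_score = score(["linux", "meeting"], probs)
unknown_score = score(["rareword"], probs)
print(spam_score, ham_score, unknown_score)
```

Here "rareword" never makes it into the model at all, so a message containing only that token scores a neutral 0.5 instead of being pushed around by one stray occurrence.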


- --
Tobias DiPasquale
202A 04C4 2CE6 B985 8520  88D6 CD25 1A6C B9B5 1595

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (Darwin)

iD8DBQFBeRhBzSUabLm1FZURAsImAKCQvgnlntkYSQTW/cSrEtRMI6Qe9ACfcNKN
kwuSWODHc9sKrkclmh17Lnk=
=VKUC
-----END PGP SIGNATURE-----

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug