Jeff Abrahamson on 22 Oct 2004 13:18:02 -0000
On Fri, Oct 22, 2004 at 08:27:13AM -0400, Tobias DiPasquale wrote:
> On Oct 22, 2004, at 7:53 AM, Jeff Abrahamson wrote:
> > I am running bogofilter with a database of 23,272 spams and 43,034
> > non-spam messages.
>
> I would recommend using a database composed of an order of magnitude
> less spam and ham (on the order of 2000 apiece). This has proven to
> be the most accurate in terms of individual precision. Pick the 2000
> spammiest spams and the 2000 hammiest hams, create a database using
> those, and see if your false negative rate doesn't go down.

The problem is that this requires manual selection. Perhaps I should
use grepmail to select all spam and ham from the past N months, where
N is chosen to make the final numbers work out.

But I'm curious why this should be so. It's usually possible to reach
a decision with more confidence if one has more data. More data adds
nuance to decisions. Why should Bayesian filters (or Markovian or...)
work worse if there's more data?

-- 
Jeff

Jeff Abrahamson <http://www.purple.com/jeff/>          +1 215/837-2287
GPG fingerprint: 1A1A BA95 D082 A558 A276 63C6 16BF 8C4C 0D1D AE4B

A cool book of games, highly worth checking out:
http://www.amazon.com/exec/obidos/ASIN/1931686963/purple-20
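Picking the extremes needn't be manual. Below is a minimal sketch of
automating Tobias's selection, assuming bogofilter's terse mode (-t)
prints a classification letter followed by a spamicity score for a
message read on stdin, and assuming one-message-per-file spam/ and
ham/ directories; both the layout and the output parsing are
assumptions to check against your bogofilter version, not something
from this thread.

    import os
    import subprocess

    def spamicity(path):
        # Terse output is assumed to look like "S 0.995000"; the
        # second field is the score. Verify against your version.
        with open(path, 'rb') as f:
            out = subprocess.run(['bogofilter', '-t'], stdin=f,
                                 capture_output=True, text=True)
        return float(out.stdout.split()[1])

    def extremes(directory, n, spammy):
        # Highest scores first if spammy, lowest first otherwise.
        paths = [os.path.join(directory, f)
                 for f in os.listdir(directory)]
        return sorted(paths, key=spamicity, reverse=spammy)[:n]

    # Point BOGOFILTER_DIR at an empty directory before this step so
    # the retraining builds a fresh database rather than adding to
    # the old one.
    for flag, picks in [('-s', extremes('spam', 2000, True)),
                        ('-n', extremes('ham', 2000, False))]:
        for path in picks:
            with open(path, 'rb') as f:
                subprocess.run(['bogofilter', flag], stdin=f)

One bogofilter process per message is slow over 66,000 messages, but
it only has to run once to build the experimental database.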
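For the date cut, something like grepmail -d 'since 6 months ago'
spam.mbox avoids writing any code (grepmail's -d takes a Date::Manip
style restriction, if memory serves). The same selection in Python's
mailbox module, with hypothetical mbox names and months approximated
as 30 days:

    import mailbox
    import time
    from email.utils import parsedate_tz, mktime_tz

    N_MONTHS = 6  # tune N so the final counts land near 2000 apiece
    cutoff = time.time() - N_MONTHS * 30 * 24 * 3600

    src = mailbox.mbox('spam.mbox')
    dst = mailbox.mbox('spam-recent.mbox')
    for msg in src:
        parsed = parsedate_tz(msg['Date'] or '')
        if parsed and mktime_tz(parsed) >= cutoff:
            dst.add(msg)
    dst.flush()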
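As for the "why", it helps to see what the score actually is. The
sketch below is Graham-style combining from "A Plan for Spam", not
necessarily what bogofilter does by default (it ships Robinson's
method), but the shape of the math is similar: a token at exactly 0.5
leaves the score unchanged, so the score is only as decisive as the
most extreme per-token estimates, and those estimates come straight
from the training counts.

    from math import prod

    def combine(probs):
        # P(spam) = prod(p) / (prod(p) + prod(1 - p))
        p = prod(probs)
        return p / (p + prod(1 - x for x in probs))

    print(combine([0.99, 0.95, 0.90]))  # ~0.9999: extreme tokens decide it
    print(combine([0.60, 0.55, 0.52]))  # ~0.665: middling tokens are indecisive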