Jonathan Tran on 31 Jul 2008 11:04:04 -0700



Re: collective intelligence - bayes theorem help

  • From: "Jonathan Tran" <jonnytran@gmail.com>
  • To: philly-lambda@googlegroups.com
  • Subject: Re: collective intelligence - bayes theorem help
  • Date: Thu, 31 Jul 2008 14:03:57 -0400
  • Mailing-list: list philly-lambda@googlegroups.com; contact philly-lambda+owner@googlegroups.com
  • Reply-to: philly-lambda@googlegroups.com
  • Sender: philly-lambda@googlegroups.com

On Wed, Jul 30, 2008 at 9:24 PM, Steve Eichert <steve.eichert@gmail.com> wrote:
> A = Person X will identify Person Y
> B = Person X is in the Philly Lambda user group
>
> However, in order to take this approach I believe I would need to know the
> probability that person X will identify person Y, which is what I'm trying
> to figure out.

I'm no probability expert either, but have you tried solving Bayes'
theorem for P(A)?  As in...

P(A|B) = P(B|A)*P(A) / P(B)

P(A) = P(A|B)*P(B) / P(B|A)
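
To sanity-check the rearrangement, here is a minimal Python sketch.
The function name and the numbers are made up for illustration; they
are not taken from the actual data.

```python
def prior_from_bayes(p_a_given_b, p_b, p_b_given_a):
    """Solve Bayes' theorem P(A|B) = P(B|A)*P(A) / P(B) for P(A)."""
    if p_b_given_a == 0:
        raise ValueError("P(B|A) must be nonzero to solve for P(A)")
    return p_a_given_b * p_b / p_b_given_a


# Hypothetical values, chosen only to show the arithmetic.
p_a = prior_from_bayes(p_a_given_b=0.6, p_b=0.25, p_b_given_a=0.5)
print(p_a)
```

Note that this only helps if P(A|B), P(B) and P(B|A) can each be
estimated from the data, which is not a given here.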

The plain ASCII notation may be a little misleading.  Your events are
actually parameterized over X and Y.  Normally this would be written
with subscripts, e.g. B_X (TeX), as in B with subscript X, to represent
the event that Person X is in the Philly Lambda user group.

The reason I bring this up is that to figure out P(A|B), we would
take the number of people in Philly Lambda who identify person Y, and
divide by the number of people in Philly Lambda.  And this would be
different for each person Y.
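
The counting described above could be sketched in Python like this.
The data layout (a dict from each person to the set of people they
identify) and all the names are assumptions for illustration only.

```python
def p_identified_given_member(identifies, members, target):
    """Estimate P(A|B) for one target person Y by counting.

    identifies: dict mapping each person X to the set of people X identifies
    members:    set of people in the group (event B)
    target:     the person Y being identified
    """
    if not members:
        raise ValueError("the group has no members")
    # Number of group members who identify the target.
    hits = sum(1 for x in members if target in identifies.get(x, set()))
    return hits / len(members)


# Hypothetical toy data, not from the thread.
identifies = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"ann", "bob"},
    "dan": {"bob"},
}
members = {"ann", "bob", "cat", "dan"}

p_bob = p_identified_given_member(identifies, members, "bob")
p_cat = p_identified_given_member(identifies, members, "cat")
print(p_bob, p_cat)
```

The two results differ, which is the point: there is one P(A|B) per
person Y, not a single number for the whole group.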

The weird thing, which I think may be the cause of your confusion, is
that for some people, we don't know who they identify, so we can't
really compute P(A|B) as I described.  Do we include them in the
total number of people in Philly Lambda?  You see what I mean?
Because we don't know who they identify, we can't really say.

A simplification might be to exclude the people who did not respond
from the dataset and use the remaining data to compute the
probabilities.  Then predict the ones who didn't respond from those
probabilities.  I think this makes sense because it's like spam
filtering.  You use all the emails you've seen before to build the
probability estimates.  Then you use those estimates to classify new
emails, for which you really don't know whether they are spam or not.
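
Here is a rough Python sketch of that simplification.  It assumes a
non-response is recorded as None (meaning "unknown", as opposed to an
empty set, which would mean "identifies nobody"); that encoding and
all the names are my own assumptions.

```python
def estimate_from_respondents(responses, members, target):
    """Estimate P(A|B) for target Y using only members who responded.

    responses: dict mapping person X to the set of people X identifies,
               or None if X did not respond (unknown)
    Returns (number of respondents used, estimated probability).
    """
    # Drop non-respondents from the dataset before counting.
    respondents = [x for x in members if responses.get(x) is not None]
    if not respondents:
        raise ValueError("no respondents to estimate from")
    hits = sum(1 for x in respondents if target in responses[x])
    return len(respondents), hits / len(respondents)


# Hypothetical toy data: "cat" did not respond.
responses = {
    "ann": {"bob"},
    "bob": {"ann"},
    "cat": None,
    "dan": {"bob", "ann"},
}
members = {"ann", "bob", "cat", "dan"}

n_used, p_bob = estimate_from_respondents(responses, members, "bob")
print(n_used, p_bob)
```

The estimate from the respondents would then serve as the prediction
for "cat", the same way a spam filter applies what it learned from
labeled mail to a message it has not seen before.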