Steve Eichert on 31 Jul 2008 11:37:03 -0700



Re: collective intelligence - bayes theorem help

  • From: "Steve Eichert" <steve.eichert@gmail.com>
  • To: philly-lambda@googlegroups.com
  • Subject: Re: collective intelligence - bayes theorem help
  • Date: Thu, 31 Jul 2008 14:36:55 -0400
  • Mailing-list: list philly-lambda@googlegroups.com; contact philly-lambda+owner@googlegroups.com
  • Reply-to: philly-lambda@googlegroups.com
  • Sender: philly-lambda@googlegroups.com

> I'm no probability expert either, but have you tried solving the
> conditional probability formula for P(A)?  As in...
>
> P(A|B) = P(B|A)*P(A) / P(B)
>
> P(A) = P(A|B)*P(B) / P(B|A)

Not yet, but I had someone else make this suggestion as well, so I'll give that a whirl.
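A quick sanity check of the rearranged formula, as a minimal Python sketch. The counts are hypothetical, picked only to make the arithmetic easy (they are not survey data), and `solve_for_prior` is an illustrative name. `Fraction` keeps every step exact, so the answer can be compared with simply counting A directly:

```python
from fractions import Fraction

def solve_for_prior(p_a_given_b, p_b, p_b_given_a):
    """Rearranged Bayes' theorem: P(A) = P(A|B) * P(B) / P(B|A)."""
    if p_b_given_a == 0:
        raise ValueError("P(B|A) must be non-zero to solve for P(A)")
    return p_a_given_b * p_b / p_b_given_a

# Hypothetical counts: 20 people in total, 10 in B, 4 in A, 3 in both A and B.
total, n_b, n_a, n_ab = 20, 10, 4, 3
p_a_given_b = Fraction(n_ab, n_b)   # 3/10
p_b = Fraction(n_b, total)          # 1/2
p_b_given_a = Fraction(n_ab, n_a)   # 3/4

p_a = solve_for_prior(p_a_given_b, p_b, p_b_given_a)
print(p_a)                          # 1/5
assert p_a == Fraction(n_a, total)  # agrees with counting A directly (4/20)
```

The guard matters: if nobody in A is also in B, P(B|A) is 0 and the rearranged formula cannot recover P(A).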
 
> ASCII text may be a little misleading.  Your events are actually
> parameterized over X and Y.  Normally this would be written B_X (TeX),
> as in B with subscript X, to represent the event that Person X is in
> the Philly Lambda user group.

ASCII text is actually probably better for me, since throwing in all the mathematical symbols only tends to confuse the situation further for me.
 
> The reason I bring this up is that to figure out P(A|B), we would
> take the number of people in Philly Lambda who identify person Y, and
> divide by the number of people in Philly Lambda.  And this would be
> different for each person Y.
>
> The weird thing, which I think may be the cause of your confusion, is
> that for some people, we don't know who they identify.  We can't
> really compute P(A|B) as I described.  So do we include them in the
> total number of people in Philly Lambda?  ... You see what I mean?
> Because we don't know whom they identify, we can't really say.

This is definitely the point at which I started to get confused and lose hold of what I was trying to figure out.
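To make the counting and the denominator question concrete, here is a small Python sketch. It reads A_Y as "person X identifies person Y" and B as "person X is in Philly Lambda", which is how the thread seems to use them. The member names, who identifies whom, and the function `p_identifies_given_member` are all hypothetical. dave and erin stand in for members who never responded:

```python
from fractions import Fraction

# Hypothetical data: every Philly Lambda member, and whom each *responder* identifies.
# dave and erin are members who never responded, so they have no entry.
members = {"alice", "bob", "carol", "dave", "erin"}
identifies = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"carol", "dave"},
    "carol": {"dave"},
}

def p_identifies_given_member(y, count_non_responders):
    """Estimate P(A_Y | B) = (# members who identify y) / (# members in the denominator)."""
    responders = [x for x in members if x in identifies]
    hits = sum(1 for x in responders if y in identifies[x])
    # Choice 1: divide by every member, silently treating non-responders as "does not identify y".
    # Choice 2: divide by responders only, i.e. drop the unknowns from the dataset.
    denominator = len(members) if count_non_responders else len(responders)
    return Fraction(hits, denominator)

for y in ["bob", "carol", "dave"]:
    print(y, p_identifies_given_member(y, True), p_identifies_given_member(y, False))
# bob 1/5 1/3
# carol 2/5 2/3
# dave 3/5 1
```

Each Y gets its own estimate, and the two denominators disagree for every Y, which is exactly the ambiguity the non-responders create.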
 
> A simplification might be to exclude the people who did not respond
> from the dataset.  Use that dataset to compute the probabilities.
> Then use those probabilities to predict the people who didn't respond.
> I think this makes sense because it's like spam filtering.  You use all
> the emails you've seen before to create the probability predictors.
> Then you use those predictors to classify new email, where you really
> don't know whether each message is spam or not.

Makes sense.  I'll take a stab at this approach and see how I make out.  Thanks for the thoughts.
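The exclude/estimate/predict approach suggested above could be sketched in Python like this. It is only a baseline: with no other information about a non-responder, each one simply inherits the per-Y probabilities estimated from the responders. The names `train` and `predict`, the data, the 1/2 cut-off, and skipping a person identifying themselves are all assumptions made for illustration, not part of the original suggestion:

```python
from fractions import Fraction

def train(identifies):
    """Estimate P(A_Y | B) for every identified person Y, using responders only."""
    responders = list(identifies)
    people = set().union(*identifies.values())
    return {
        y: Fraction(sum(1 for x in responders if y in identifies[x]), len(responders))
        for y in people
    }

def predict(model, non_responders, threshold=Fraction(1, 2)):
    """Guess whom each non-responder identifies: every Y whose estimated probability
    meets the (assumed) threshold, skipping the person themselves."""
    return {
        x: {y for y, p in model.items() if p >= threshold and y != x}
        for x in non_responders
    }

# Hypothetical data: dave and erin are members who never responded.
members = {"alice", "bob", "carol", "dave", "erin"}
identifies = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"carol", "dave"},
    "carol": {"dave"},
}

model = train(identifies)
guesses = predict(model, members - set(identifies))
for x in sorted(guesses):
    print(x, sorted(guesses[x]))
# dave ['carol']
# erin ['carol', 'dave']
```

A real spam-filter-style classifier would also condition on features of each person; this sketch only shows the exclude-then-predict workflow.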