yegg on 1 Aug 2008 06:14:13 -0700



Re: collective intelligence - bayes theorem help

  • From: yegg <gabriel.weinberg@gmail.com>
  • To: Philly Lambda <philly-lambda@googlegroups.com>
  • Subject: Re: collective intelligence - bayes theorem help
  • Date: Fri, 1 Aug 2008 06:14:09 -0700 (PDT)
  • Mailing-list: list philly-lambda@googlegroups.com; contact philly-lambda+owner@googlegroups.com
  • Reply-to: philly-lambda@googlegroups.com
  • Sender: philly-lambda@googlegroups.com
  • User-agent: G2/1.0

It doesn't impact the probability at all.  In your case, the answer is
100%: everyone in your sample follows Toby, so no other factors
matter.  Take NJ.  Everyone who has both NJ and Lambda in your sample
(Aaron) follows Toby.  Once you know they are in Lambda, that's the
end of it.

I can understand why you don't want to, but I really think it helps to
try to write out the equations and, in so doing, define your universe
explicitly.  In my original email, I defined it as the set of Twitter
users, which may or may not be what you want.  But note that this is
very different from the set of Lambda subscribers, e.g.

P(NJ|lambda) ~ 25%?
P(NJ|twitter) ~ 3%?
P(toby|lambda) = 1
P(toby|twitter) = 144 / (# of twitter users)

It may help you to just write out all the combinations (extending the
above) and see what you know and don't know.  Then you can try to
apply Bayes' theorem and other formulas to get a sense for what is
going on.
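
As a rough sketch of what "writing out the combinations" could look
like, here is a small Python version that estimates each probability
by counting.  The sample is made up (only Aaron comes from the thread;
every other user, and the helper names, are hypothetical), so treat
the numbers as an illustration rather than the real data.

```python
from fractions import Fraction

# Hypothetical sample: apart from Aaron, every row is an invented user.
# Each row is (name, in_lambda, state, follows_toby).
USERS = [
    ("aaron", True, "NJ", True),
    ("user_b", True, "PA", True),
    ("user_c", True, "PA", True),
    ("user_d", True, "PA", True),
    ("user_e", False, "NJ", False),
    ("user_f", False, "NJ", True),
    ("user_g", False, "PA", False),
    ("user_h", False, "CA", False),
]


def in_lambda(user):
    return user[1]


def in_nj(user):
    return user[2] == "NJ"


def follows_toby(user):
    return user[3]


def prob(event, given=lambda user: True):
    """Estimate P(event | given) by counting rows in USERS."""
    pool = [u for u in USERS if given(u)]
    if not pool:
        raise ValueError("conditioning event never occurs in the sample")
    return Fraction(sum(1 for u in pool if event(u)), len(pool))


# The universe matters: the same event has different probabilities
# depending on what you condition on.
p_toby_given_lambda = prob(follows_toby, in_lambda)
p_toby_given_lambda_nj = prob(
    follows_toby, lambda u: in_lambda(u) and in_nj(u)
)
p_toby_given_nj = prob(follows_toby, in_nj)

# Bayes' theorem: P(toby|NJ) = P(NJ|toby) * P(toby) / P(NJ)
bayes = prob(in_nj, follows_toby) * prob(follows_toby) / prob(in_nj)

print("P(toby|lambda)     =", p_toby_given_lambda)
print("P(toby|lambda, NJ) =", p_toby_given_lambda_nj)
print("P(toby|NJ)         =", p_toby_given_nj)
print("via Bayes' theorem =", bayes)
```

In this made-up sample, adding NJ to "in Lambda" changes nothing,
because every Lambda member already follows Toby, while P(toby|NJ) on
its own is lower and matches what Bayes' theorem gives.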


On Jul 31, 5:11 pm, Steve Eichert <steve.eich...@gmail.com> wrote:
> Right, and what if in my example, state doesn't impact the
> probability at all?  If belonging to Philly Lambda is the key
> determining factor then taking state into account only throws us out
> of whack.
>
> I'll have to take a look at what's available in Excel, perhaps it will
> help me understand.
>
> Steve
>
> On Jul 31, 2008, at 4:15 PM, yegg <gabriel.weinb...@gmail.com> wrote:
>
>
>
> >> So there would be a 50% chance that Jonathan follows Toby given that
> >> he's from NJ.  So from what I understand, in order to find the
> >> probability that Jonathan follows Toby given that he's in Philly
> >> Lambda, and he's from NJ I would multiply the probabilities of each
> >> together.
>
> >> 1 * .5 = 50%
>
> >> So I think that I could say that there's a 50% chance that a person
> >> from NJ and in Philly Lambda follows Toby.  Is that correct given
> >> this simplistic approach, or am I doing something wrong?
>
> > This assumes the attributes are completely independent of each other.
> > Take the case of language attributes, e.g. who uses Perl and Lisp.
> > Suppose both were 50% (of the people who follow Toby).  By this logic,
> > the final probability would be 25%.  But what if the exact same people
> > who use Lisp also use Perl, then the real answer would be 50% because
> > the additional attribute tells you nothing.  It would only be 25% if
> > they were completely independent.
>
> >> I don't know anything about the other stuff you mentioned (Bayes
> >> classifier, regression analysis) so I'll have to try and read a bit
> >> about them and see how I may be able to use them.
>
> > You can do this in Excel.  The help is helpful.  Basic linear
> > regression is built in.  To do more advanced stuff, do Tools->Add Ins,
> > add Analysis and Solver.  Then you can do Tools->Data Analysis.
>
>
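
The Perl/Lisp point quoted above can also be checked by counting.
This is only a sketch with four invented Toby followers (the names
"a" through "d" and the function joint_vs_product are hypothetical):
it compares the actual share of people who use both languages with
the naive product of the two individual shares.

```python
from fractions import Fraction

# Hypothetical set of four Toby followers; the names are placeholders.
FOLLOWERS = {"a", "b", "c", "d"}


def joint_vs_product(perl, lisp):
    """Return (actual P(perl and lisp), naive P(perl) * P(lisp))."""
    n = len(FOLLOWERS)
    actual = Fraction(len(perl & lisp), n)
    naive = Fraction(len(perl), n) * Fraction(len(lisp), n)
    return actual, naive


# Case 1: exactly the same people use Perl and Lisp (fully dependent).
same = joint_vs_product(perl={"a", "b"}, lisp={"a", "b"})

# Case 2: independent usage, so half of the Perl users also use Lisp.
indep = joint_vs_product(perl={"a", "b"}, lisp={"b", "c"})

print("same people: actual", same[0], "naive product", same[1])
print("independent: actual", indep[0], "naive product", indep[1])
```

Multiplying the two 50% figures only matches the actual overlap in the
independent case; when the same people use both languages, the product
understates it (25% instead of 50%).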