yegg on 31 Jul 2008 13:15:37 -0700



Re: collective intelligence - bayes theorem help

  • From: yegg <gabriel.weinberg@gmail.com>
  • To: Philly Lambda <philly-lambda@googlegroups.com>
  • Subject: Re: collective intelligence - bayes theorem help
  • Date: Thu, 31 Jul 2008 13:15:30 -0700 (PDT)
  • Mailing-list: list philly-lambda@googlegroups.com; contact philly-lambda+owner@googlegroups.com
  • Reply-to: philly-lambda@googlegroups.com
  • Sender: philly-lambda@googlegroups.com
  • User-agent: G2/1.0

> So there would be a 50% chance that Jonathan follows Toby given that he's
> from NJ.  So from what I understand, in order to find the probability that
> Jonathan follows Toby given that he's in Philly Lambda, and he's from NJ I
> would multiply the probabilities of each together.
>
> 1 * .5 = 50%
>
> So I think that I could say that there's a 50% chance that a person from NJ
> and in Philly Lambda follows Toby.  Is that correct given this simplistic
> approach, or am I doing something wrong?

This assumes the attributes are completely independent of each other.
Take the case of language attributes, e.g. who uses Perl and who uses
Lisp.  Suppose each is used by 50% of the people who follow Toby.  By
this logic, the probability that a follower uses both would be 25%.
But what if the exact same people who use Lisp also use Perl?  Then the
real answer would be 50%, because the second attribute tells you
nothing new.  It would only be 25% if the two were completely
independent.
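A minimal Python sketch of the Perl/Lisp point, using two tiny made-up lists of Toby's followers (the data and the helper names `fraction` and `compare` are hypothetical, for illustration only). It compares the product of the two single-attribute fractions with the fraction actually counted from the data.

```python
# Each tuple is one hypothetical follower of Toby: (uses_perl, uses_lisp).

def fraction(rows, predicate):
    """Fraction of rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)

def compare(followers):
    """Return P(Perl), P(Lisp), their product, and the directly counted P(both)."""
    p_perl = fraction(followers, lambda r: r[0])
    p_lisp = fraction(followers, lambda r: r[1])
    product = p_perl * p_lisp  # only valid if the attributes are independent
    actual = fraction(followers, lambda r: r[0] and r[1])
    return p_perl, p_lisp, product, actual

# The same people use both languages: perfectly correlated attributes.
correlated = [(True, True), (True, True), (False, False), (False, False)]
# Every combination appears once: independent attributes.
independent = [(True, True), (True, False), (False, True), (False, False)]

print(compare(correlated))   # (0.5, 0.5, 0.25, 0.5)
print(compare(independent))  # (0.5, 0.5, 0.25, 0.25)
```

In the correlated list the product says 25% but the direct count says 50%; only in the independent list do the two agree.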


>
> I don't know anything about the other stuff you mentioned (Bayes classifier,
> regression analysis) so I'll have to try and read a bit about them and see
> how I may be able to use them.  

You can do this in Excel.  The built-in help is good.  Basic linear
regression is built in.  For more advanced analysis, go to
Tools->Add-Ins and enable the Analysis ToolPak and Solver add-ins.
Then you can use Tools->Data Analysis.
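For comparison outside Excel, here is a rough Python sketch of one-variable ordinary least squares, the same fit that Excel's SLOPE and INTERCEPT worksheet functions compute. The function name `fit_line` and the sample points are illustrative assumptions, not anything from the thread.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x is the same.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # How x and y move together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

For points lying exactly on y = 2x + 1, the fit recovers slope 2.0 and intercept 1.0.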