Steve Eichert on 30 Jul 2008 18:24:26 -0700



collective intelligence - bayes theorem help

  • From: "Steve Eichert" <steve.eichert@gmail.com>
  • To: philly-lambda@googlegroups.com
  • Subject: collective intelligence - bayes theorem help
  • Date: Wed, 30 Jul 2008 21:24:18 -0400
  • Mailing-list: list philly-lambda@googlegroups.com; contact philly-lambda+owner@googlegroups.com
  • Reply-to: philly-lambda@googlegroups.com
  • Sender: philly-lambda@googlegroups.com

Hey All,

I recently read Collective Intelligence and it sparked a lot of interest for me in machine learning.  I'm having some trouble figuring out how to make the leap from what's discussed in the book to other real world examples.  This is a contrived example but humor me :)  I'd love some help from those in the group in understanding the different methods discussed in CI since I'm not making out that well on my own.

So, on to my contrived example.  Let's say I have a list of people along with some attributes (city, state, UG affiliation) for each of them.  A sample is below in CSV format:

Name, City, State, Primary User Group Affiliation
Steve, Jenkintown, PA, Philly Lambda
Kyle, Somewhere, PA, Philly Lambda
George, Elsewhere, NJ, Philly .NET
Randy, Lansdale, PA, Philly on Rails
Aaron, Collingswood, NJ, Philly Lambda
Toby, Topsecretville, PA, Philly Lambda
Jonathan, PLPatternville, NJ, Philly Lambda

I've asked all these people who they follow on Twitter.  I hear back from some people and not others.  The data I did receive is below:

Person, Follows (pipe separated)
Steve, Toby|Kyle|Aaron
Kyle, Toby|Jonathan|Andrew
George, Blah
Aaron, Toby

Again, please forgive the contrived example.  What I would like to be able to do is figure out the probability that someone who didn't respond follows a person followed by one of the people who did respond.  The theory is that by looking at the common attributes of the people who follow a particular person, you may be able to infer that someone else with the same, or similar, attributes would also follow that person.

For example, in the sample dataset, Steve, Kyle, and Aaron all belong to Philly Lambda and all follow Toby on Twitter.  Given this, how could we calculate the probability that Jonathan follows Toby on Twitter, given that he also listed Philly Lambda as his primary user group?  Taking this a step further, how can we figure out the overall probability given all the attributes that we have (city, state, UG)?  And secondarily, how could we identify the best attribute for predicting whether or not someone would follow someone else on Twitter?
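
To make the question concrete, here is a minimal sketch of the simplest thing I can think of: counting, among the people who responded, how often someone with a given attribute value follows the target.  The function name follow_rate and the optional add-one smoothing are my own assumptions, not something taken from the book, and I'm assuming "George" is the intended spelling in both tables.

```python
from fractions import Fraction

# Attributes of the four people who responded (from the sample data above).
people = {
    "Steve": {"city": "Jenkintown", "state": "PA", "ug": "Philly Lambda"},
    "Kyle": {"city": "Somewhere", "state": "PA", "ug": "Philly Lambda"},
    "George": {"city": "Elsewhere", "state": "NJ", "ug": "Philly .NET"},
    "Aaron": {"city": "Collingswood", "state": "NJ", "ug": "Philly Lambda"},
}

# Who each respondent follows.
follows = {
    "Steve": {"Toby", "Kyle", "Aaron"},
    "Kyle": {"Toby", "Jonathan", "Andrew"},
    "George": {"Blah"},
    "Aaron": {"Toby"},
}


def follow_rate(target, attribute, value, smoothing=0):
    """Estimate P(follows target | attribute == value) from respondents.

    smoothing=1 applies add-one (Laplace) smoothing so that a tiny sample
    does not produce a hard 0 or 1.  Returns None when there is no data.
    """
    matching = [p for p in follows if people[p][attribute] == value]
    hits = sum(1 for p in matching if target in follows[p])
    denominator = len(matching) + 2 * smoothing
    if denominator == 0:
        return None
    return Fraction(hits + smoothing, denominator)


if __name__ == "__main__":
    print(follow_rate("Toby", "ug", "Philly Lambda"))               # raw estimate
    print(follow_rate("Toby", "ug", "Philly Lambda", smoothing=1))  # smoothed
    print(follow_rate("Toby", "state", "NJ"))
```

With only four respondents the raw estimate for Philly Lambda is 3 out of 3, which is why I suspect some form of smoothing is needed before the number means anything for Jonathan.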

I was originally experimenting with Bayes' theorem (http://en.wikipedia.org/wiki/Bayes'_theorem), but after spending a little bit of time on it, either I'm not smart enough to know how it could be applied (very likely), or it's not a good candidate.  How would you go about solving this?

With Bayes I was trying to take the following approach:

formula: P(A|B) = P(B|A)*P(A)/P(B)

What's the probability that Person X would identify Person Y, given that Person X is in the Philly Lambda user group?

A = Person X will identify Person Y
B = Person X is in the Philly Lambda user group

However, in order to take this approach I believe I would need to know the probability that person X will identify person Y, which is what I'm trying to figure out.  I'm pretty clueless, and this whole experience has made me realize that whatever math skills I previously had have gone down the tubes.  It would be noble of you to help me get on the path towards getting them back!
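
For what it's worth, here is a rough sketch of how I tried plugging the sample data into the formula, using "follows Toby" as event A.  It assumes each term can be estimated by simple counting over the four respondents, treating P(A) as the overall fraction of respondents who follow the target regardless of group.  The helper name bayes_terms is made up for illustration, and I'm not sure these are the right estimates to use.

```python
from fractions import Fraction

# Respondents only: name -> (primary user group, set of people they follow).
respondents = {
    "Steve": ("Philly Lambda", {"Toby", "Kyle", "Aaron"}),
    "Kyle": ("Philly Lambda", {"Toby", "Jonathan", "Andrew"}),
    "George": ("Philly .NET", {"Blah"}),
    "Aaron": ("Philly Lambda", {"Toby"}),
}


def bayes_terms(target, group):
    """Return (P(A), P(B), P(B|A), P(A|B)) estimated by counting.

    A = "person follows target", B = "person is in group".
    Assumes at least one respondent follows the target and at least one
    respondent is in the group; otherwise a ZeroDivisionError is raised.
    """
    total = len(respondents)
    followers = [p for p, (_, f) in respondents.items() if target in f]
    in_group = [p for p, (g, _) in respondents.items() if g == group]

    p_a = Fraction(len(followers), total)   # overall follow rate
    p_b = Fraction(len(in_group), total)    # share of respondents in group
    followers_in_group = sum(1 for p in followers if respondents[p][0] == group)
    p_b_given_a = Fraction(followers_in_group, len(followers))

    p_a_given_b = p_b_given_a * p_a / p_b   # P(A|B) = P(B|A)*P(A)/P(B)
    return p_a, p_b, p_b_given_a, p_a_given_b


if __name__ == "__main__":
    for label, value in zip(("P(A)", "P(B)", "P(B|A)", "P(A|B)"),
                            bayes_terms("Toby", "Philly Lambda")):
        print(label, "=", value)
```

On this tiny sample the result is the same number that direct counting within the group gives, so I can't tell whether going through Bayes buys me anything here.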

I realize this is pretty off topic, so if people would prefer the discussion happen off list, let me know.  I wanted to drop a note here because I figured that, of the different groups I have connections with, this would be the most likely to have someone who may be able to assist.  I also realize that this is probably pretty straightforward to some of you; however, for someone (me) who has had their brain melted by monotonous data collection applications, it requires someone with more intellect to help :)

Cheers,
Steve