Steve Eichert on 30 Jul 2008 18:24:26 -0700
Hey All,
I recently read Collective Intelligence, and it sparked my interest in machine learning. I'm having some trouble making the leap from what's discussed in the book to other real-world examples. I'd love some help from those in the group in understanding the different methods discussed in CI, since I'm not making out that well on my own. The example below is contrived, but humor me :)

Let's say I have a list of people along with some attributes (city, state, UG affiliation). A sample is below in CSV format:

Name, City, State, Primary User Group Affiliation
Steve, Jenkintown, PA, Philly Lambda
Kyle, Somewhere, PA, Philly Lambda
George, Elsewhere, NJ, Philly .NET
Randy, Landsdale, PA, Philly on Rails
Aaron, Collingswood, NJ, Philly Lambda
Toby, Topsecretville, PA, Philly Lambda
Jonathan, PLPatternville, NJ, Philly Lambda

I've asked all these people who they follow on Twitter. I heard back from some people and not others. The data I did receive is below:

Person, Follows (pipe separated)
Steve, Toby|Kyle|Aaron
Kyle, Toby|Jonathan|Andrew
George, Blah
Aaron, Toby

Again, please forgive the contrived example. What I would like to be able to do is figure out the probability that someone who didn't respond follows a person followed by one of the people who did respond. The theory is that by looking at the common attributes of the people who follow a particular person, you may be able to infer that someone else with the same, or similar, attributes also follows that person.

For example, in the sample dataset, we see that Steve, Kyle, and Aaron all belong to Philly Lambda and they all follow Toby on Twitter. Given this, how could we calculate the probability/likelihood that Jonathan follows Toby on Twitter, given that he also listed Philly Lambda as his primary user group?
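One simple way to put a number on the question above is to count: among the respondents who share an attribute value, what fraction follow the target? The sketch below does that for the sample data. It is only an illustration under stated assumptions: the function name `follow_rate` and the optional add-one smoothing are hypothetical choices, not something from the book, and it assumes the George in the attribute list and the George in the responses are the same person.

```python
from fractions import Fraction

# Attributes from the sample data: name -> (city, state, user group).
people = {
    "Steve": ("Jenkintown", "PA", "Philly Lambda"),
    "Kyle": ("Somewhere", "PA", "Philly Lambda"),
    "George": ("Elsewhere", "NJ", "Philly .NET"),  # assumed same person as in the responses
    "Randy": ("Landsdale", "PA", "Philly on Rails"),
    "Aaron": ("Collingswood", "NJ", "Philly Lambda"),
    "Toby": ("Topsecretville", "PA", "Philly Lambda"),
    "Jonathan": ("PLPatternville", "NJ", "Philly Lambda"),
}

# Survey responses: respondent -> set of people they follow.
follows = {
    "Steve": {"Toby", "Kyle", "Aaron"},
    "Kyle": {"Toby", "Jonathan", "Andrew"},
    "George": {"Blah"},
    "Aaron": {"Toby"},
}


def follow_rate(target, group, smoothing=0):
    """Estimate P(follows target | user group) from respondents only.

    With smoothing=k, k imaginary observations are added to each of the
    two outcomes (follows / does not follow), which pulls estimates from
    tiny samples away from 0 and 1.
    """
    # Respondents in the group, excluding the target themselves.
    members = [p for p in follows if people[p][2] == group and p != target]
    hits = sum(1 for p in members if target in follows[p])
    denom = len(members) + 2 * smoothing
    if denom == 0:
        return None  # nobody in this group responded
    return Fraction(hits + smoothing, denom)


print(follow_rate("Toby", "Philly Lambda"))               # raw relative frequency
print(follow_rate("Toby", "Philly Lambda", smoothing=1))  # add-one smoothed
```

With only three Philly Lambda respondents, the raw estimate is 3/3, which is almost certainly overconfident; the smoothed variant gives 4/5. Either number says more about how small the sample is than about Jonathan.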
Taking this to the next step, given all the attributes that we have (city, state, UG), how can we figure out the overall probability given all the attributes? And secondarily, how could we identify the best attribute for predicting whether or not someone would follow someone else on Twitter?

I was originally experimenting with Bayes' theorem (http://en.wikipedia.org/wiki/Bayes'_theorem), but after spending a little bit of time on it, either I'm not smart enough to know how it could be applied (very likely), or it's not a good candidate. How would you go about solving this?

With Bayes I was trying to take the following approach:

formula: P(A|B) = P(B|A) * P(A) / P(B)

What's the probability that Person X would identify Person Y, given that Person X is in the Philly Lambda user group?

A = Person X will identify Person Y
B = Person X is in the Philly Lambda user group

However, in order to take this approach I believe I would need to know the probability that Person X will identify Person Y, which is what I'm trying to figure out.

I'm pretty clueless, and this whole experience has made me realize that whatever math skills I previously had have gone down the tubes. It would be noble of you to help me get on the path towards getting them back!

I realize this is pretty off topic, so if people would prefer the discussion happen off list, let me know. I wanted to drop a note here because I figured that, of the different groups I have connections with, this would be the most likely to have someone who may be able to assist. I also realize that this is probably pretty straightforward to some of you; however, for someone (me) who has had their brain melted by monotonous data collection applications, it requires someone with more intellect to help :)

Cheers,
Steve
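For readers following the Bayes setup above, here is a small sketch of how the three terms on the right-hand side could be estimated by counting over the respondents. In this reading, P(A) is the overall fraction of respondents who follow Person Y regardless of group (the prior), which is a different quantity from the P(A|B) being sought. The function names `bayes_posterior` and `direct_estimate` are hypothetical, and the sketch assumes the George in the attribute list and the George in the responses are the same person.

```python
from fractions import Fraction

# Respondents only: name -> (user group, set of people they follow).
respondents = {
    "Steve": ("Philly Lambda", {"Toby", "Kyle", "Aaron"}),
    "Kyle": ("Philly Lambda", {"Toby", "Jonathan", "Andrew"}),
    "George": ("Philly .NET", {"Blah"}),
    "Aaron": ("Philly Lambda", {"Toby"}),
}


def bayes_posterior(target, group):
    """P(A|B) via Bayes' theorem, with each term estimated by counting.

    A = respondent follows `target`; B = respondent is in `group`.
    """
    n = len(respondents)
    a = [p for p, (_, f) in respondents.items() if target in f]
    b = [p for p, (g, _) in respondents.items() if g == group]
    if not a or not b:
        return None  # a term is undefined for this sample
    p_a = Fraction(len(a), n)  # prior: overall follow rate
    p_b = Fraction(len(b), n)  # share of respondents in the group
    # Likelihood: among those who follow the target, the share in the group.
    p_b_given_a = Fraction(sum(1 for p in a if respondents[p][0] == group), len(a))
    return p_b_given_a * p_a / p_b


def direct_estimate(target, group):
    """P(A|B) counted directly: followers of target among group members."""
    b = [p for p, (g, _) in respondents.items() if g == group]
    if not b:
        return None
    return Fraction(sum(1 for p in b if target in respondents[p][1]), len(b))


print(bayes_posterior("Toby", "Philly Lambda"), direct_estimate("Toby", "Philly Lambda"))
print(bayes_posterior("Jonathan", "Philly Lambda"), direct_estimate("Jonathan", "Philly Lambda"))
```

When every term comes from the same counts, the Bayes route and the direct count agree, so the theorem adds nothing for a single attribute. It becomes useful when P(B|A) and P(A) are easier to estimate than P(A|B), or when several attributes are combined under an independence assumption, as in a naive Bayes classifier.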