Steve Eichert on 31 Jul 2008 11:57:06 -0700
I'm sure my terms are quite confusing, since the example is somewhat contrived. Let me restate my goal to see if it clarifies anything.
I want to determine the probability that one person (Jonathan) will follow another person (Toby) on Twitter by examining the attributes of the people we know follow Toby (Steve, Kyle, Aaron). I think the simplest way to go about this, without getting wrapped up in theorems and such, is to figure out the probability that a person with a particular attribute follows Toby. To do this, I would count the people with that attribute who follow Toby and divide that by the total number of people with that attribute who responded (regardless of whether they follow Toby). That gives the probability of someone with that attribute following Toby.

In our sample that would be:

Review of data (with "Responded" column added):

Name, City, State, Primary User Group Affiliation, Responded
Steve, Jenkintown, PA, Philly Lambda, Y
Kyle, Somewhere, PA, Philly Lambda, Y
George, Elsewhere, NJ, Philly .NET, Y
Randy, Landsdale, PA, Philly on Rails, Y
Aaron, Collingswood, NJ, Philly Lambda, Y
Toby, Topsecretville, PA, Philly Lambda, Y
Jonathan, PLPatternville, NJ, Philly Lambda, N

Person, Follows
Steve, Toby|Kyle|Aaron
Kyle, Toby|Jonathan|Andrew
George, Blah
Aaron, Toby

3 people in Lambda who follow Toby / 4 people in Lambda who responded

Since 1 of the 4 is Toby himself, and he can't follow himself, I think we'd have:

3/3 = 1 = 100%

Given this, if we want to know the probability that Jonathan would follow Toby given that he's in Lambda, we'd say that, based on our existing data, the probability is 1 (100%).

I'd also want to figure this out for the other attributes we have (city, state). So if I want to know the probability of Jonathan following Toby given that he's from NJ, I'd take the number of people from NJ who follow Toby and divide it by the total number of people from NJ who responded:

1 person from NJ who follows Toby (Aaron) / 2 people from NJ who responded
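A minimal Python sketch of the per-attribute ratio described above, in case it helps to see the counting spelled out. It assumes the two tables are loaded into plain dictionaries, that anyone missing from the follows table (Randy, Toby, Jonathan) follows nobody, and that Toby is dropped from both the numerator and the denominator as described. `PEOPLE`, `FOLLOWS`, and `follow_rate` are hypothetical names for illustration only. It reproduces the 3/3 Lambda figure and the 1-of-2 NJ figure.

```python
from fractions import Fraction

# Survey data from the tables above; "responded" marks who answered.
PEOPLE = {
    "Steve":    {"city": "Jenkintown",     "state": "PA", "group": "Philly Lambda",   "responded": True},
    "Kyle":     {"city": "Somewhere",      "state": "PA", "group": "Philly Lambda",   "responded": True},
    "George":   {"city": "Elsewhere",      "state": "NJ", "group": "Philly .NET",     "responded": True},
    "Randy":    {"city": "Landsdale",      "state": "PA", "group": "Philly on Rails", "responded": True},
    "Aaron":    {"city": "Collingswood",   "state": "NJ", "group": "Philly Lambda",   "responded": True},
    "Toby":     {"city": "Topsecretville", "state": "PA", "group": "Philly Lambda",   "responded": True},
    "Jonathan": {"city": "PLPatternville", "state": "NJ", "group": "Philly Lambda",   "responded": False},
}

# Whom each person follows. Assumption: anyone not listed follows nobody.
FOLLOWS = {
    "Steve":  {"Toby", "Kyle", "Aaron"},
    "Kyle":   {"Toby", "Jonathan", "Andrew"},
    "George": {"Blah"},
    "Aaron":  {"Toby"},
}


def follow_rate(target, attr, value):
    """Fraction of respondents with PEOPLE[name][attr] == value who follow target.

    The target is excluded from both counts, since nobody can follow themselves.
    """
    pool = [
        name for name, info in PEOPLE.items()
        if info["responded"] and name != target and info[attr] == value
    ]
    if not pool:
        raise ValueError(f"no respondents with {attr} == {value!r}")
    hits = [name for name in pool if target in FOLLOWS.get(name, set())]
    return Fraction(len(hits), len(pool))


lambda_rate = follow_rate("Toby", "group", "Philly Lambda")  # 3 of 3
nj_rate = follow_rate("Toby", "state", "NJ")                 # 1 of 2
print(f"Lambda: {lambda_rate}, NJ: {nj_rate}")  # prints: Lambda: 1, NJ: 1/2
```

Calling `follow_rate("Toby", "state", "PA")` the same way gives the PA figure (2 of 3 respondents). Returning a `Fraction` keeps ratios like 3/3 and 1/2 exact rather than rounding them.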
1/2 = .5

So there would be a 50% chance that Jonathan follows Toby given that he's from NJ.

From what I understand, to find the probability that Jonathan follows Toby given that he's in Philly Lambda and from NJ, I would multiply the individual probabilities together:

1 * .5 = .5 = 50%

So I think I could say there's a 50% chance that a person from NJ who is in Philly Lambda follows Toby. Is that correct given this simplistic approach, or am I doing something wrong?

I don't know anything about the other approaches you mentioned (Bayes classifier, regression analysis), so I'll have to read a bit about them and see how I might be able to use them. I also didn't really answer your questions, but hopefully this gives you some more background on how I was approaching the problem. It's basically at this point that I start getting lost in how to apply Bayes (and other algorithms) to my problem. I think some of what everyone has said so far has sunk in a bit, so hopefully with further reflection and focus I'll find my way.

Thanks!
- Steve

On Thu, Jul 31, 2008 at 11:49 AM, yegg <gabriel.weinberg@gmail.com> wrote: