On Thu, Jul 31, 2008 at 2:03 PM, Jonathan Tran <jonnytran@gmail.com> wrote:
>
> On Wed, Jul 30, 2008 at 9:24 PM, Steve Eichert <steve.eichert@gmail.com> wrote:
>> A = Person X will identify Person Y
>> B = Person X is in the Philly Lambda user group
>>
>> However, in order to take this approach I believe I would need to know the
>> probability that person X will identify person Y, which is what I'm trying
>> to figure out.
>
> I'm no probability expert either, but have you tried solving the
> conditional probability formula for P(A)? As in...
>
> P(A|B) = P(B|A)*P(A) / P(B)
>
> P(A) = P(A|B)*P(B) / P(B|A)
>
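A quick sanity check of the rearranged formula above, as a minimal
sketch. The counts are made up for illustration (20 people, 8 in the
group, 5 who identify person Y, 4 in both), and the function name
prior_from_bayes is hypothetical.

```python
from fractions import Fraction


def prior_from_bayes(p_a_given_b, p_b, p_b_given_a):
    """Solve Bayes' rule for P(A): P(A) = P(A|B) * P(B) / P(B|A)."""
    if p_b_given_a == 0:
        raise ValueError("P(B|A) must be non-zero to solve for P(A)")
    return p_a_given_b * p_b / p_b_given_a


# Made-up counts (assumption): 20 people, 8 in the group (B),
# 5 who identify person Y (A), 4 who are in both.
p_b = Fraction(8, 20)          # P(B)
p_a_given_b = Fraction(4, 8)   # P(A|B): 4 of the 8 group members
p_b_given_a = Fraction(4, 5)   # P(B|A): 4 of the 5 who identify Y

p_a = prior_from_bayes(p_a_given_b, p_b, p_b_given_a)
print(p_a)  # matches the direct count 5/20
```

Note that this only recovers P(A) if the other three quantities are
already known, which is exactly the difficulty raised in the original
question.
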
> ASCII text may be a little misleading. Your events are actually
> parameterized over X and Y. Normally this would be written B_X (TeX),
> as in B with subscript X, to represent the event that Person X is in
> the Philly Lambda user group.
>
> I bring this up because, to figure out P(A|B), we would take the
> number of people in Philly Lambda who identify person Y and divide
> by the number of people in Philly Lambda. That ratio would be
> different for each person Y.
>
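The counting described above could be sketched roughly like this. The
names and data are invented for illustration, and the data layout (a
dict mapping each person to the set of people they identify) is an
assumption, not something from the thread.

```python
def p_identify_given_member(identifies, members, y):
    """Estimate P(A_Y | B): the fraction of group members who identify y."""
    if not members:
        raise ValueError("need at least one group member")
    hits = sum(1 for x in members if y in identifies.get(x, set()))
    return hits / len(members)


# Hypothetical data: everyone here is in the group and has responded.
members = ["ann", "bob", "cat", "dan"]
identifies = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"bob"},
    "dan": {"bob", "ann"},
}

# The estimate differs for each person Y.
estimates = {y: p_identify_given_member(identifies, members, y) for y in members}
print(estimates)
```
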
> The weird thing, which I think may be the cause of your confusion, is
> that for some people we don't know who they identify, so we can't
> really compute P(A|B) as I described. Do we include them in the
> total number of people in Philly Lambda? You see what I mean?
> Since we don't know who they identify, we can't really say.
>
> A simplification might be to exclude the people who did not respond
> from the dataset, use the remaining data to compute the
> probabilities, and then predict the non-respondents from those. I
> think this makes sense because it's like spam filtering: you use all
> the emails you've seen before to build the probability predictors,
> then use those predictors to classify new emails whose spam status
> you don't actually know.
>
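One way to read that simplification, as a rough sketch rather than a
worked-out method: estimate from respondents only, then reuse the
estimate as the prediction for everyone who did not respond. The data
is invented, and marking a non-respondent with None (or leaving them
out of the dict) is an assumption made for this example.

```python
def estimate_from_respondents(responses, members, y):
    """Estimate P(A_Y | B) using only members whose answers are known."""
    respondents = [x for x in members if responses.get(x) is not None]
    if not respondents:
        return None  # nothing to learn from
    hits = sum(1 for x in respondents if y in responses[x])
    return hits / len(respondents)


def predict_for_nonrespondents(responses, members, y):
    """Assign the respondent-based estimate to each non-respondent."""
    p = estimate_from_respondents(responses, members, y)
    return {x: p for x in members if responses.get(x) is None}


# Hypothetical data: dan is marked as not responding, eve is missing.
members = ["ann", "bob", "cat", "dan", "eve"]
responses = {"ann": {"bob"}, "bob": {"cat"}, "cat": {"bob"}, "dan": None}

p_bob = estimate_from_respondents(responses, members, "bob")
predicted = predict_for_nonrespondents(responses, members, "bob")
print(p_bob, predicted)
```

This assumes non-respondents behave like respondents on average, which
is the same assumption a spam filter makes about unseen mail.
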