Bonnie Aumann on 21 May 2009 06:19:02 -0700



Re: Fuzzy Matching resources?

  • From: Bonnie Aumann <aumannb@gmail.com>
  • To: philly-lambda@googlegroups.com
  • Subject: Re: Fuzzy Matching resources?
  • Date: Thu, 21 May 2009 09:18:48 -0400
  • Mailing-list: list philly-lambda@googlegroups.com; contact philly-lambda+owner@googlegroups.com
  • Reply-to: philly-lambda@googlegroups.com
  • Sender: philly-lambda@googlegroups.com

> For Perl, some CPAN modules to look into are:
>
> String::Approx http://search.cpan.org/dist/String-Approx/Approx.pm
> Text::Levenshtein
> http://search.cpan.org/~jgoldberg/Text-Levenshtein-0.05/Levenshtein.pm
> Text::Brew http://search.cpan.org/~kcivey/Text-Brew-0.02/lib/Text/Brew.pm
> String::Nysiis http://search.cpan.org/dist/String-Nysiis/
> Text::Soundex http://search.cpan.org/~markm/Text-Soundex-3.03/Soundex.pm
> Text::DoubleMetaphone
> http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
>
> NYSIIS, Soundex and Double Metaphone are phonetic encodings. They can
> be used both to perform [very] fuzzy comparisons and to build an index
> that serves as a basis for other fuzzy matching.
>
> String::Approx and Text::Levenshtein (which computes edit distance)
> can be used to count the number of edits and, from that, to calculate
> a similarity score: 1 - (#edits / length of the longer string).
>
> String::Approx (its adist function) and Text::Levenshtein may be what
> you're after.
>
> Is this the kind of info you were after?
>
>
> Regards,
>
> Kyle
>
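
A minimal sketch of the similarity calculation described in the quoted
note, written in plain Python rather than with the CPAN modules listed
above. The function names levenshtein and similarity are illustrative
assumptions and are not the API of Text::Levenshtein or String::Approx.
It only shows how an edit count turns into
1 - (#edits / length of the longer string).

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in the range 0.0 to 1.0:
    1 - (#edits / length of the longer string)."""
    longer = max(len(a), len(b))
    if longer == 0:
        # Assumption: two empty strings count as identical.
        return 1.0
    return 1.0 - levenshtein(a, b) / longer


print(levenshtein("kitten", "sitting"))           # 3
print(round(similarity("kitten", "sitting"), 3))  # 0.571
```

Dividing by the longer string keeps the score between 0 and 1, because
the edit count can never exceed that length.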

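The indexing idea in the quoted note (use a phonetic code as a key, then
run a finer comparison inside each bucket) can be sketched roughly as
follows. This is a simplified American Soundex in plain Python, not the
implementation in Text::Soundex. The names soundex and build_index and
the sample names are made up for illustration.

```python
from collections import defaultdict

# Consonant groups that share a Soundex digit.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(name):
    """Return a 4-character Soundex code (ASCII letters only)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        # Vowels have no code, so they reset the duplicate check.
        previous = code
    return (result + "000")[:4]


def build_index(names):
    """Group names by Soundex code so that later, more expensive
    comparisons only run within one bucket."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


index = build_index(["Robert", "Rupert", "Rubin", "Ashcraft", "Ashcroft"])
print(index[soundex("Robert")])    # ['Robert', 'Rupert']
print(index[soundex("Ashcroft")])  # ['Ashcraft', 'Ashcroft']
```

Names that land in the same bucket are only candidates. An edit-distance
check on each candidate would then decide how close the match really is.
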
Thanks, Kyle. I think you mentioned that you gave a talk about fuzzy
matching as well; that would help me figure out how to use the
modules you've suggested.

Thanks,

Bonnie