Kristian Erik Hermansen on 20 Feb 2008 12:59:03 -0800


Re: [PLUG] Lingua::Identify languages


On Wed, Feb 20, 2008 at 8:37 AM, Walt Mankowski <waltman@pobox.com> wrote:
> During my talk at Plug West Monday night, one of the Perl NLP modules
>  I talked about was Lingua::Identify.  This is an interesting module
>  that tries to guess what language a given text string is.
>
>  Lingua::Identify exports a function called langof().  If you call
>  langof() in scalar context it returns the most likely language, but if
>  you call it in list context it returns a list of languages, each paired
>  with the estimated probability of the text being in that language.
>
>  As an example I passed in the text of the GPL (Version 1, if anyone's
>  interested).  Its top 3 guesses were:
>
>   English    26.7%
>   French      6.7%
>   Romanian    4.3%

I actually wrote code to do this for my Artificial Intelligence course
at Harvard, though I did it in Python.  My code wasn't as good as it
could be.  With proper training, supposedly, you can identify a
language with high probability from as little as a few characters.  If
anyone is interested, I can post the code...
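
For anyone curious what this kind of guesser looks like, here is a
minimal Python sketch of one common approach: build a character trigram
profile per language and rank languages by cosine similarity to the
input.  It is an illustration only.  It is not the course code
mentioned above, and it is not a description of how Lingua::Identify
works internally.  The names SAMPLES, rank_languages() and identify()
are hypothetical, and the three one-sentence training samples are toy
data, far too small for real use.

```python
from collections import Counter
from math import sqrt

# Toy training data (an assumption for illustration only); a real
# identifier would be trained on far more text per language.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and this is the "
               "license that you may copy and distribute with the program",
    "french": "le renard brun saute par dessus le chien paresseux et ceci est "
              "la licence que vous pouvez copier et distribuer avec le programme",
    "spanish": "el zorro salta sobre el perro perezoso y esta es la licencia "
               "que usted puede copiar y distribuir con el programa",
}


def trigrams(text):
    """Count character trigrams after lowercasing and collapsing whitespace."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def rank_languages(text):
    """Return (language, share) pairs, best match first.

    The shares are similarity scores rescaled to sum to 1; they are
    not calibrated probabilities.
    """
    target = trigrams(text)
    scores = {lang: cosine(target, profile) for lang, profile in PROFILES.items()}
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(lang, score / total if total else 0.0) for lang, score in ranked]


def identify(text):
    """Return only the best-scoring language name."""
    return rank_languages(text)[0][0]
```

Calling identify("you may copy and distribute the program") gives the
single best guess, while rank_languages() gives the full ranked list,
loosely mirroring the scalar versus list behaviour of langof()
described in the quoted message.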
-- 
Kristian Erik Hermansen
--
"It has been just so in all my inventions. The first step is an
intuition--and comes with a burst, then difficulties arise. This thing
gives out and then that--'Bugs'--as such little faults and
difficulties are called--show themselves and months of anxious
watching, study and labor are requisite before commercial success--or
failure--is certainly reached" -- Thomas Edison in a letter to
Theodore Puskas on November 18, 1878
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug