Re: [PLUG] Lingua::Identify languages

On Wed, Feb 20, 2008 at 8:37 AM, Walt Mankowski wrote:
> During my talk at Plug West Monday night, one of the Perl NLP modules
>  I talked about was Lingua::Identify.  This is an interesting module
>  that tries to guess what language a given text string is written in.
>  Lingua::Identify exports a function called langof().  If you call
>  langof() in scalar context it returns the most likely language, but if
>  you call it in list context it returns a list of languages, each paired
>  with the estimated probability of the text being in that language.
>  As an example I passed in the text of the GPL (Version 1, if anyone's
>  interested).  Its top 3 guesses were:
>   English    26.7%
>   French      6.7%
>   Romanian    4.3%

I actually wrote code to do this for my Artificial Intelligence course
at Harvard, though I did it in Python.  My code wasn't as good as it
could have been.  With proper training, supposedly, you can identify a
language with high probability from as little as a few characters.  If
anyone is interested, I can post the code.
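
For a feel of the general approach, here is a minimal Python sketch
based on character trigram counts with add-one smoothing.  It is not the
course code mentioned above: the function names and the two tiny
training sentences are made up for illustration, and a usable model
would need far more text per language.

```python
import math
from collections import Counter

# Toy training text (hypothetical); real models need much larger samples.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then "
               "the dog runs after the fox through the woods",
    "french": "le renard brun saute par dessus le chien paresseux et puis "
              "le chien court dans les bois avec le renard",
}

def trigrams(text):
    """Return the character trigrams of text, lowercased and space-padded."""
    padded = "  " + text.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(samples):
    """Build one trigram count table per language from {language: text}."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}

def score(text, counts, vocab_size):
    """Log-probability of text under one language's counts (add-one smoothing)."""
    total = sum(counts.values())
    return sum(math.log((counts[g] + 1) / (total + vocab_size))
               for g in trigrams(text))

def identify(text, models):
    """Return the language whose trigram model gives text the highest score."""
    # Shared vocabulary size, plus one slot for trigrams never seen in training.
    vocab_size = len(set().union(*models.values())) + 1
    return max(models, key=lambda lang: score(text, models[lang], vocab_size))

models = train(SAMPLES)
print(identify("the dog and the fox", models))
```

Smoothing keeps an unseen trigram from driving a language's score to
minus infinity, which matters a lot when the input is only a few
characters long.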
Kristian Erik Hermansen
"It has been just so in all my inventions. The first step is an
intuition--and comes with a burst, then difficulties arise. This thing
gives out and then that--'Bugs'--as such little faults and
difficulties are called--show themselves and months of anxious
watching, study and labor are requisite before commercial success--or
failure--is certainly reached" -- Thomas Edison in a letter to
Theodore Puskas on November 18, 1878
