Re: [PLUG] Lingua::Identify languages
On Wed, Feb 20, 2008 at 8:37 AM, Walt Mankowski <waltman@pobox.com> wrote:
> During my talk at Plug West Monday night, one of the Perl NLP modules
> I talked about was Lingua::Identify. This is an interesting module
> that tries to guess what language a given text string is written in.
>
> Lingua::Identify exports a function called langof(). If you call
> langof() in scalar context it returns the most likely language, but if
> you call it in list context it returns a list of languages, each paired
> with its estimated probability of the text being in that language.
>
> As an example I passed in the text of the GPL (Version 1, if anyone's
> interested). Its top 3 guesses were:
>
> English 26.7%
> French 6.7%
> Romanian 4.3%
I actually wrote code to do this for my Artificial Intelligence course
at Harvard, though I did it in Python. My code wasn't as good as it
could be, but with proper training you can supposedly identify a
language with high probability from as little as a few characters. If
anyone is interested, I can post the code.
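
A minimal sketch of the general idea, for illustration only. It is not the
course code mentioned above, and it is not how Lingua::Identify works
internally. It assumes character trigram counts per language, scored with
add-one smoothing and normalized into probabilities. The names SAMPLES,
train(), rank_languages() and most_likely() are made up for this example,
and the two training sentences are toy data, far too small for real use.

```python
import math
from collections import Counter

# Toy training text (an assumption for this sketch; real models need far more data).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and this is the license for the program",
    "french": "le renard brun saute par dessus le chien paresseux "
              "et ceci est la licence pour le programme",
}


def trigrams(text):
    """Count character trigrams in lowercased, whitespace-normalized text."""
    padded = "  " + " ".join(text.lower().split()) + "  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(samples):
    """Build one trigram count table per language."""
    return {lang: trigrams(text) for lang, text in samples.items()}


def rank_languages(text, models):
    """Return (language, probability) pairs, most likely first."""
    observed = trigrams(text)
    # Shared vocabulary size, used for add-one (Laplace) smoothing.
    vocab = set(observed)
    for counts in models.values():
        vocab.update(counts)
    size = len(vocab)

    log_scores = {}
    for lang, counts in models.items():
        total = sum(counts.values())
        log_scores[lang] = sum(
            n * math.log((counts[gram] + 1) / (total + size))
            for gram, n in observed.items()
        )

    # Turn log scores into probabilities that sum to 1 (log-sum-exp trick).
    top = max(log_scores.values())
    weights = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    norm = sum(weights.values())
    ranked = [(lang, w / norm) for lang, w in weights.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def most_likely(text, models):
    """Return only the top-ranked language name."""
    return rank_languages(text, models)[0][0]


MODELS = train(SAMPLES)

if __name__ == "__main__":
    for lang, prob in rank_languages("this is the program", MODELS):
        print(f"{lang} {prob:.1%}")
```

The two entry points loosely mirror the scalar versus list behavior of
langof() described in the quoted message: most_likely() gives one answer,
rank_languages() gives every candidate with a score. Because the scores
here are normalized across only the trained languages, the percentages
will not resemble the ones Lingua::Identify reports.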
--
Kristian Erik Hermansen
--
"It has been just so in all my inventions. The first step is an
intuition--and comes with a burst, then difficulties arise. This thing
gives out and then that--'Bugs'--as such little faults and
difficulties are called--show themselves and months of anxious
watching, study and labor are requisite before commercial success--or
failure--is certainly reached" -- Thomas Edison in a letter to
Theodore Puskas on November 18, 1878
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug