Kristian Erik Hermansen on 20 Feb 2008 12:59:03 -0800
On Wed, Feb 20, 2008 at 8:37 AM, Walt Mankowski <waltman@pobox.com> wrote:
> During my talk at Plug West Monday night, one of the Perl NLP modules
> I talked about was Lingua::Identify. This is an interesting module
> that tries to guess what language a given text string is.
>
> Lingua::Identify exports a function called langof(). If you call
> langof() in scalar context it returns the most likely language, but if
> you call it in list context it returns a list of languages paired with
> its estimated probability of the text being that language.
>
> As an example I passed in the text of the GPL (Version 1, if anyone's
> interested). Its top 3 guesses were:
>
>   English   26.7%
>   French     6.7%
>   Romanian   4.3%

I actually wrote code to do this for my Artificial Intelligence course
at Harvard. I did it in Python, though. My code wasn't as good as it
could be. With proper training, supposedly, you can identify a language
with high probability from as little as a few characters. If anyone is
interested, I can post the code...

--
Kristian Erik Hermansen
--
"It has been just so in all my inventions. The first step is an
intuition--and comes with a burst, then difficulties arise. This thing
gives out and then that--'Bugs'--as such little faults and
difficulties are called--show themselves and months of anxious
watching, study and labor are requisite before commercial success--or
failure--is certainly reached" -- Thomas Edison in a letter to Theodore
Puskas on November 18, 1878
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
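
For anyone curious what this kind of language guessing can look like in Python, here is a minimal sketch. It is not the code mentioned above, and it is not how Lingua::Identify works internally; it only illustrates one common approach, comparing character trigram counts by cosine similarity. The names ngrams, cosine, rank_languages and SAMPLES are hypothetical, and the three one-sentence samples are toy training data, so the scores are similarity values rather than the probabilities langof() reports.

```python
from collections import Counter
import math

def ngrams(text, n=3):
    """Count character n-grams in lowercased, whitespace-normalized text."""
    cleaned = " " + " ".join(text.lower().split()) + " "
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram Counters (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy training data (an assumption for illustration): one sentence per language.
# A usable identifier would be trained on far more text.
SAMPLES = {
    "English": "this is the text of a license that you should have received "
               "with the program and it says that you can share the software "
               "with other people",
    "French": "ceci est le texte d'une licence que vous devriez avoir reçue "
              "avec le programme et elle dit que vous pouvez partager le "
              "logiciel avec d'autres personnes",
    "Romanian": "acesta este textul unei licențe pe care ar fi trebuit să o "
                "primiți împreună cu programul și spune că puteți distribui "
                "programul și altor persoane",
}

# One trigram profile per language, built once up front.
PROFILES = {lang: ngrams(text) for lang, text in SAMPLES.items()}

def rank_languages(text):
    """Return (language, similarity) pairs, most similar language first."""
    query = ngrams(text)
    scores = [(lang, cosine(query, profile)) for lang, profile in PROFILES.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    for lang, score in rank_languages("you should have the right to share this"):
        print(f"{lang:10s} {score:.3f}")
```

With samples this small the ranking is only reliable for text that resembles the training sentences. Larger per-language corpora are what would make very short inputs identifiable.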