Pure statistical, or Natural Language Processing engine?

LingPipe is probably worth a look as a complete NLP tool. However, if all you need to do is find verbs and nouns and stem them, you could just 1) tokenize the text, 2) run a POS tagger, and 3) run a stemmer. The Stanford tools can do this for multiple languages, I believe, and NLTK would be a quick way to try it out. Be careful about going after only bare verbs and nouns, though: what do you do about noun phrases and multiword nouns?

Ideally an NLP package can handle this, but a lot depends on the domain you are working in. Unfortunately, much of NLP comes down to how good your data is.

You're probably looking for the Snowball project, which has developed stemmers for a number of different languages.

If you're looking for Java code, I can recommend Stanford's set of tools. Their POS tagger works for English, German, Chinese, and Arabic (though I have only used it for English) and includes an English-only lemmatizer. The tools are all free, accuracy is pretty high, and the speed is not too bad for a Java-based solution; the main drawbacks are occasionally flaky APIs and high memory use.

I had a good experience with TreeTagger: ims.uni-stuttgart.de/projekte/corplex/Tr... It's easy to use, faster than Stanford's tagger, and is among the better stemmers/taggers out there. It does everything in one pass: tokenization, stemming, and tagging.

Interesting, but it has a commercial license. I was hoping for something free. – Inge Henriksen Jul 10 at 16:07.
