Simple Natural Language Processing Startup for Java?

You say that you need to 'parse' each sentence. You probably already know this, but just to be explicit, in NLP, the term 'parse' usually means to recover some hierarchical syntactic structure. The most common types are constituent structure (e.g., via a context-free grammar) and dependency structure.

If you need hierarchical structure, I'd recommend you consider just starting with a parser. Most parsers I'm aware of include POS tagging during parsing, and may provide higher-accuracy tagging than finite-state POS taggers. (Caveat: I'm much more familiar with constituent parsers than with dependency parsers. It's possible some or most dependency parsers would require POS tags as input.) The big downside to parsing is the time complexity.

Finite-state POS taggers often run at thousands of words per second. Even greedy dependency parsers are considerably slower, and constituent parsers generally run at 1-5 sentences per second. So if you don't need hierarchical structure, you probably want to stick with a finite-state POS tagger for efficiency.
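To make that speed gap concrete, here is a rough back-of-the-envelope sketch. The corpus size (100,000 sentences), the average sentence length (20 words), and the 2,000 words/second tagger rate are assumptions picked for illustration; only the 1-5 sentences/second parser range comes from the figures above. The class and method names are made up for this example.

```java
public class ThroughputEstimate {

    /** Seconds needed to tag a corpus at a given words-per-second rate. */
    public static double taggerSeconds(long sentences, double wordsPerSentence, double wordsPerSecond) {
        return sentences * wordsPerSentence / wordsPerSecond;
    }

    /** Seconds needed to parse a corpus at a given sentences-per-second rate. */
    public static double parserSeconds(long sentences, double sentencesPerSecond) {
        return sentences / sentencesPerSecond;
    }

    public static void main(String[] args) {
        long sentences = 100_000;        // assumed corpus size
        double wordsPerSentence = 20.0;  // assumed average sentence length

        // Assumed tagger rate: "thousands of words per second", taken here as 2,000.
        double tagSeconds = taggerSeconds(sentences, wordsPerSentence, 2_000.0);

        // Constituent parser range quoted above: 1 to 5 sentences per second.
        double parseSecondsSlow = parserSeconds(sentences, 1.0);
        double parseSecondsFast = parserSeconds(sentences, 5.0);

        System.out.println("Tagger seconds: " + tagSeconds);
        System.out.println("Parser seconds (fast end): " + parseSecondsFast);
        System.out.println("Parser seconds (slow end): " + parseSecondsSlow);
    }
}
```

Under those assumptions the tagger needs about 1,000 seconds (roughly 17 minutes), while the constituent parser needs between 20,000 and 100,000 seconds (roughly 5.6 to 27.8 hours). The exact numbers will differ on your data and hardware; the point is the order-of-magnitude difference.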

If you do decide you need parse structure, a few recommendations: I think the Stanford parser suggested by @aab includes both a constituent parser and a dependency parser. The Berkeley Parser (http://code.google.com/p/berkeleyparser/) is a pretty well-known PCFG constituent parser, achieves state-of-the-art accuracy (equal or superior to the Stanford parser, I believe), and is reasonably efficient (~3-5 sentences per second). The BUBS Parser (http://code.google.com/p/bubs-parser/) can also run with the high-accuracy Berkeley grammar, and improves efficiency to around 15-20 sentences per second.

Full disclosure - I'm one of the primary researchers working on BUBS. Warning: both of these parsers are research code, with all the problems that engenders. But I'd love to see people actually using BUBS, so if it's of use to you, give it a try and contact me with problems, comments, suggestions, etc.

And a couple of Wikipedia references for background if needed:

Context-free grammars: http://en.wikipedia.org/wiki/Stochastic_context-free_grammar
Dependency grammars: http://en.wikipedia.org/wiki/Dependency_grammar

I don't need parsing now. I just need to POS tag. – shababhsiddique Apr 29 at 18:16

If you don't need parse structure, then definitely stick with a finite-state tagger. It should be much faster and simpler, and pretty comparable in accuracy (at least if you can find a tagging model trained on comparable text). The Stanford POS Tagger is probably a good bet. – AaronD Apr 29 at 18:24

I am really troubled by so many tools. I don't have a good internet connection, so it would take time to download a new one (Stanford). It would be nice if you could help me do it with OpenNLP. I have gone a little further (stackoverflow.com/questions/5836148/…); I now just need to use it from a Java application. – shababhsiddique Apr 29 at 18:57

Generally you'd do these two tasks in the other order:

1. Do part-of-speech tagging
2. Run a parser using the POS tags as input

OpenNLP's documentation isn't that thorough, and some of it has gotten hard to find due to the switch to Apache. Some (potentially slightly out-of-date) tutorials are available in the old SF wiki. You might want to take a look at the Stanford NLP tools, in particular the Stanford POS Tagger and the Stanford Parser.

Both have downloads that include pre-trained model files and they also have demo files in the top-level directory that show how to get started with the API and short shell scripts that show how to use the tools from the command-line. LingPipe might be another good toolkit to check out. A quick search here will lead you to a number of similar questions with links to other alternatives, too!
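For what it's worth, the tag-first, parse-second order mentioned above can be sketched in plain Java. This is only an illustration of the data flow, not any toolkit's API: the class `TagThenParse`, its methods, and the tiny lexicon are all hypothetical, the "tagger" is a dictionary lookup, and the "parser" is a flat noun-phrase bracketer standing in for a real one.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TagThenParse {

    // Tags that the toy bracketer groups into a noun phrase (NP).
    private static final Set<String> NP_TAGS = Set.of("DT", "JJ", "NN");

    /** Step 1: toy lexicon lookup standing in for a real POS tagger. */
    public static String[] tag(String[] tokens, Map<String, String> lexicon) {
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Words missing from the lexicon fall back to NN (noun).
            tags[i] = lexicon.getOrDefault(tokens[i].toLowerCase(Locale.ROOT), "NN");
        }
        return tags;
    }

    /** Step 2: toy structure builder that takes the tags from step 1 as input. */
    public static String bracket(String[] tokens, String[] tags) {
        StringBuilder sb = new StringBuilder("(S");
        boolean insideNp = false;
        for (int i = 0; i < tokens.length; i++) {
            boolean npTag = NP_TAGS.contains(tags[i]);
            if (npTag && !insideNp) sb.append(" (NP");  // open a new NP run
            if (!npTag && insideNp) sb.append(")");     // close the current NP run
            insideNp = npTag;
            sb.append(" (").append(tags[i]).append(' ').append(tokens[i]).append(')');
        }
        if (insideNp) sb.append(")");
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // Hypothetical mini-lexicon, for illustration only.
        Map<String, String> lexicon = Map.of("the", "DT", "quick", "JJ", "jumps", "VBZ");
        String[] tokens = {"The", "quick", "fox", "jumps"};
        String[] tags = tag(tokens, lexicon);          // tagging happens first
        System.out.println(bracket(tokens, tags));     // structure is built from the tags
    }
}
```

With a real toolkit you would swap both methods for the library's tagger and parser; the shape of the pipeline (tokens in, tags out, then tokens plus tags into the parser) stays the same.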

Stanford CoreNLP or The Stanford Parser or The Stanford POS Tagger? – shababhsiddique Apr 29 at 15:16

It depends on what you want/need to do. CoreNLP includes the other two tools plus other annotators, so if you're just experimenting with different kinds of annotation, CoreNLP would be a good place to start. From this question and your related questions, it sounds like you might benefit from reading more about computational linguistics before you get started with your task. I'd suggest Speech and Language Processing by Jurafsky and Martin: cs.colorado.edu/~martin/slp.html – aab May 2 at 8:00

The most popular are:

- GATE: easy to use and fairly quick to start with
- UIMA: steeper learning curve, but more efficient and more generic

– shababhsiddique Apr 29 at 18:31

I suggest you start by using the GATE GUI and following the user guide. There's also a quick start guide. This will allow you to get a grip on GATE basics. Then you may use the API (there are Javadocs and code examples). – Robert Bossy May 3 at 9:41
