The task of determining the correct part of speech for each word in a text is called part-of-speech tagging. The Brill tagger, for example, uses a mixture of dictionary (vocabulary) words and contextual rules. I believe that some of the important initial dictionary words for this task are the stop words.
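To make the dictionary-plus-contextual-rules idea concrete, here is a minimal sketch in Python. It is not the actual Brill tagger, which learns its rules from a tagged corpus. The tiny `LEXICON`, the hand-written `RULES`, and the function names are all assumptions made up for illustration.

```python
# Toy lexicon: one "most likely" tag per word (hypothetical, illustration only).
LEXICON = {
    "the": "DT", "a": "DT", "to": "TO",
    "can": "MD", "run": "VB", "dog": "NN", "i": "PRP",
}

# Hand-written contextual rules: (from_tag, to_tag, required_previous_tag).
# A real Brill tagger would learn rules like these from training data.
RULES = [
    ("MD", "NN", "DT"),  # "the can": a modal right after a determiner becomes a noun
    ("VB", "NN", "DT"),  # "a run": a verb right after a determiner becomes a noun
]


def initial_tags(tokens):
    """Step 1: dictionary lookup; unknown words default to NN."""
    return [LEXICON.get(tok.lower(), "NN") for tok in tokens]


def apply_rules(tags):
    """Step 2: rewrite a tag when the previous tag matches a rule's context."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


def tag(tokens):
    """Return (token, tag) pairs after lookup and contextual correction."""
    return list(zip(tokens, apply_rules(initial_tags(tokens))))


print(tag("I can run to the can".split()))
```

The first "can" keeps its dictionary tag (MD), while the second one follows a determiner and is retagged as a noun.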
Once you have (mostly correct) parts of speech for your words, you can start building larger structures. This industry-oriented book differentiates between recognizing noun phrases (NPs) and recognizing named entities. About textbooks: Allen's Natural Language Understanding is a good, but a bit dated, book.
Foundations of Statistical Natural Language Processing is a nice introduction to statistical NLP. Speech and Language Processing is a bit more rigorous and maybe more authoritative. The Association for Computational Linguistics is a leading scientific community on computational linguistics.
Thanks for the resources. – VirtuosiMedia Mar 4 '09 at 10:15.
Besides the dictionary-based approach, two others come to mind: pattern-based approaches (in a simple form: anything that is capitalized is a proper noun) and machine learning approaches (mark proper nouns in a training corpus and train a classifier). The field is mostly called named-entity extraction and is often considered a subfield of information extraction. A good starting point for the different fields of NLP is usually the corresponding chapter in the Oxford Handbook of Computational Linguistics.
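As a rough illustration of the simplest pattern-based approach, the sketch below flags capitalized words as proper noun candidates. It skips the first word of each sentence, since that is capitalized anyway. The function name and the naive sentence splitting are my own assumptions, and the heuristic will still misfire on things like the pronoun "I" or title case.

```python
import re


def capitalized_candidates(text):
    """Return capitalized words that do not start a sentence."""
    candidates = []
    # Naive sentence split: whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # The first token is capitalized regardless, so it carries no signal.
        for tok in tokens[1:]:
            if tok[0].isupper():
                candidates.append(tok)
    return candidates


print(capitalized_candidates("Yesterday Alice met Bob in Paris. They liked the city."))
```

A proper noun at the very start of a sentence is missed by this rule, which is one reason people move on to the machine learning approach.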
Ah, thanks for the "named-entity extraction" term. Sometimes figuring out the correct terms is the hardest part when you're just starting to learn about something. – VirtuosiMedia Mar 4 '09 at 1:08.
Try searching for "named entity recognition"--that's the term that's used in the NLP literature for this sort of thing.
It depends on what you mean by dictionary-based. For example, one strategy would be to take things that aren't in a dictionary and try to proceed on the assumption that they're proper nouns. If this leads to a sensible parse, consider the assumption provisionally validated and keep going; otherwise, conclude that they aren't.
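The first half of that strategy might look like the sketch below. It only collects the out-of-dictionary words as candidates; the "does this lead to a sensible parse" check is not implemented here and would need a real parser. `KNOWN_WORDS` is a made-up stand-in for an actual dictionary.

```python
# Hypothetical stand-in for a real dictionary of common (non-proper) words.
KNOWN_WORDS = {"the", "sister", "of", "lives", "in", "a", "small", "town"}


def unknown_word_candidates(tokens, known_words=KNOWN_WORDS):
    """Treat every token missing from the dictionary as a proper noun candidate."""
    return [tok for tok in tokens if tok.lower() not in known_words]


print(unknown_word_candidates("Zelda lives in a small town".split()))
```

Each candidate would then be kept or dropped depending on whether tagging it as a proper noun lets the rest of the sentence parse.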
Other ideas: In subject position, any simple subject without a determiner is a good candidate. Ditto in prepositional phrases. In any position, the basis of a possessive determiner (e.g. Bob in "Bob's sister") is a good candidate. -- MarkusQ
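The possessive heuristic is the easiest of these to try without a parser. Below is a small sketch, assuming plain ASCII apostrophes and a hand-picked list of common contractions to ignore. Both the `CONTRACTION_BASES` set and the function name are assumptions for illustration.

```python
import re

# Word followed by 's and then a lowercase word, e.g. "Bob's sister".
POSSESSIVE = re.compile(r"\b([A-Za-z]+)'s\s+[a-z]+")

# Bases whose 's is usually a contraction ("it's", "that's"), not a possessive.
# Hypothetical, incomplete list.
CONTRACTION_BASES = {"it", "that", "he", "she", "there", "here", "what", "who", "let"}


def possessive_candidates(text):
    """Return the base of each possessive as a proper noun candidate."""
    return [
        m.group(1)
        for m in POSSESSIVE.finditer(text)
        if m.group(1).lower() not in CONTRACTION_BASES
    ]


print(possessive_candidates("Bob's sister said it's late."))
```

This only yields candidates: "the dog's tail" would return "dog", so it works best combined with the dictionary check or the determiner test described above.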
Interesting idea about things that aren't in the dictionary. – VirtuosiMedia Mar 4 '09 at 1:15.