Debugging an implementation of the Baum-Welch algorithm (application: POS tagging)?

Unsupervised POS tagging is a very interesting emerging research topic. If I understand correctly, you are actually asking how to evaluate your tagging accuracy, not how to debug the code. Evaluation is a known issue in unsupervised POS induction.

The short answer to your question is: get this annotated corpus from NLTK, then map each of your states to the corpus tag it most often co-occurs with, and compute the percentage of tokens that are tagged correctly under that mapping. This evaluation procedure is called many-to-one mapping. You should familiarize yourself with the literature, as it will answer your questions and more.
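To make the many-to-one procedure concrete, here is a minimal sketch in Python. It assumes you already have one induced state per token and a token-aligned list of gold tags; the function name `many_to_one_accuracy` and the toy data are made up for illustration and are not taken from NLTK or from any of the papers mentioned here.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(states, gold_tags):
    """Map each induced state to its most frequent gold tag, then score.

    `states` and `gold_tags` are hypothetical inputs: parallel sequences
    holding one HMM state id and one gold POS tag per token.
    """
    if len(states) != len(gold_tags):
        raise ValueError("states and gold_tags must be token-aligned")

    # Count how often each induced state co-occurs with each gold tag.
    cooccurrence = defaultdict(Counter)
    for state, tag in zip(states, gold_tags):
        cooccurrence[state][tag] += 1

    # Many-to-one: several states are allowed to map to the same tag.
    mapping = {
        state: counts.most_common(1)[0][0]
        for state, counts in cooccurrence.items()
    }

    # Accuracy is the fraction of tokens whose mapped tag equals the gold tag.
    correct = sum(mapping[s] == t for s, t in zip(states, gold_tags))
    accuracy = correct / len(gold_tags) if gold_tags else 0.0
    return accuracy, mapping


# Toy data (invented): nine tokens, four induced states, three gold tags.
states = [0, 1, 2, 0, 1, 2, 3, 1, 1]
gold = ["DT", "NN", "VB", "DT", "NN", "VB", "DT", "NN", "VB"]

accuracy, mapping = many_to_one_accuracy(states, gold)
print(mapping)
print(f"many-to-one accuracy: {accuracy:.3f}")
```

In this toy example, states 0 and 3 both map to DT, and eight of the nine tokens are scored as correct. Keep in mind that many-to-one accuracy tends to rise as you allow more states, so compare runs that use the same number of states.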

Here are some places to start:

An early paper: Mark Johnson. 2007. Why doesn’t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 296–305.

A survey paper: Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman. 2010. Two Decades of Unsupervised POS induction: How far have we come? In Proceedings of EMNLP 2010.

When you say "unsupervised", you should ask yourself whether you want to use only raw text, or also want to use a dictionary, for example. There are works on that, too. Also, there is code available out there for the task.

Another place to ask about NLP is: http://metaoptimize.com/qa .

If you have other questions, don't hesitate to ask.

This would also be an excellent question/answer for bit.ly/u4lZUG! – eowl yesterday

Hi, first of all, thank you for your answer and the references. I am not yet at the evaluation stage. Right now I am more concerned about the correctness of my implementation techniques. From my previous experience implementing NLP problems, I learned that the smallest bug may lead to a different output (maybe because of my bad coding style). In this particular case, I do not have any sample checkpoints to compare against. All I have is the Wall Street Journal corpora (labeled and unlabeled), and my experiment goal is to learn some unsupervised labeling with different parameter configurations. – Irtiza yesterday

First, try to achieve 60% accuracy with many-to-one mapping. – cyborg yesterday

Thanks :) .. that's a great idea :) .. – Irtiza yesterday
