Parsing HTML with Python 2.7 - HTMLParser, SGMLParser, or Beautiful Soup?

I use and would recommend lxml and pyquery for parsing HTML. I had to write a web scraping bot a few months ago, and of all the popular alternatives I tried, including HTMLParser and BeautifulSoup, I went with lxml and the syntactic sugar of pyquery. I haven't tried SGMLParser, though.
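For illustration, here is a minimal sketch of what link extraction with lxml might look like. The sample markup and the helper name `extract_links` are made up for this example and assume lxml is installed; this is not the code from the bot mentioned above.

```python
import lxml.html

# Hypothetical sample markup; note the second <li> is never closed.
SAMPLE = """
<html><body>
  <ul id="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a>
  </ul>
</body></html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every <a> element that has an href."""
    root = lxml.html.fromstring(markup)
    # XPath selects all anchors carrying an href attribute, in document order.
    return [
        (a.text_content().strip(), a.get("href"))
        for a in root.xpath("//a[@href]")
    ]


links = extract_links(SAMPLE)
print(links)
```

pyquery sits on top of the same lxml tree and lets you write jQuery-style selectors instead of XPath, which is the "sugar" part.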

BeautifulSoup in particular is for dirty HTML as found in the wild. It will parse any old thing, but is slow.
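As a rough sketch of that tolerance, the snippet below feeds deliberately broken markup to BeautifulSoup. It assumes the current `bs4` package with the standard-library `html.parser` backend, which may differ from the BeautifulSoup version in use when this was written; the `DIRTY` string is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical "dirty" markup: no closing tags at all.
DIRTY = "<p>Unclosed paragraph<b>bold text<p>Second <a href='/x'>link"

# The parser closes whatever is still open when the input ends.
soup = BeautifulSoup(DIRTY, "html.parser")

# Collect the href and visible text of every anchor that has an href.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
link_text = [a.get_text() for a in soup.find_all("a", href=True)]

print(hrefs, link_text)
```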

I can't really give you an answer, but what I can give you is a way to a solution: find the angle that you relate to or that piques your interest. A good paper is one that people get drawn into because it reaches them in some way. As for me, when I think of WWII, I think of the Holocaust and the effect it had on the survivors, their families, and those who stood by and did nothing until it was too late.
