Trouble parsing HTML using BeautifulSoup?

BeautifulSoup 3.1.0 performs significantly worse with real-world HTML (read: invalid HTML) than 3.0.8 does. This code works with 3.0.8:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    # urlopen needs a full URL, scheme included
    page = urllib2.urlopen("http://harvardfml.com/")
    soup = BeautifulSoup(page)

    # each post body sits in a <span class="quote">
    for incident in soup.findAll('span', {"class": "quote"}):
        print incident.contents
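If you can't pin 3.0.8 (for example on Python 3, where urllib2 and the old BeautifulSoup module aren't available), here is a rough sketch of the same span/quote extraction using only the standard library's html.parser. Note that this swaps in a different parser; it is not BeautifulSoup. The names QuoteCollector and extract_quotes and the sample markup are made up for illustration, and I'm assuming the posts sit in `<span class="quote">` elements, as in the snippet above.

```python
from html.parser import HTMLParser


class QuoteCollector(HTMLParser):
    """Collect the text inside every <span class="quote"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a matching span
        self.quotes = []    # finished quote strings
        self._parts = []    # text fragments of the quote being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "quote" in classes:
            # count nested spans so we stop at the right </span>
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.quotes.append("".join(self._parts).strip())
                self._parts = []

    def handle_data(self, data):
        if self.depth:
            self._parts.append(data)


def extract_quotes(html):
    """Return the text of each span.quote found in an HTML string."""
    parser = QuoteCollector()
    parser.feed(html)
    parser.close()
    return parser.quotes


# Made-up markup with an unclosed <p>, standing in for the real page.
sample = (
    '<div><p>intro'
    '<span class="quote">First <b>bold</b> post</span>'
    '<span class="quote">Second post</span></div>'
)
print(extract_quotes(sample))
```

The text is gathered from handle_data rather than from the tag itself, which is the difference between getting the whole posting and getting only the opening tag. A span that is never closed yields nothing in this sketch, so it is more of a starting point than a drop-in replacement.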

That site is powered by Tumblr. Tumblr has an API.

Thank you so much. I just hope using the API doesn't count as cheating... – LBR Feb 9 at 4:27

There's a Python port of the Tumblr API that you can use to read posts:

    from tumblr import Api

    api = Api('harvardfml.com')
    freq = {}
    posts = api.read()
    for post in posts:
        pass  # do something here

As for your bogus findAll, without the actual source code of your program it is hard to see what is wrong.

I'm trying to use BeautifulSoup to parse some HTML in Python. Specifically, I'm trying to create two arrays of soup objects: one for the dates of postings on a website, and one for the postings themselves. However, when I use findAll on the div class that matches the postings, only the initial tag is returned, not the text inside the tag.

