Using lxml, you might do something like this:

```python
import feedparser
import lxml.html as lh
from urllib.request import urlopen  # Python 3 replacement for urllib2

# Import feed for parsing
d = feedparser.parse("http://feeds.boston.com/boston/bigpicture/index")

# Print feed name
print(d['feed']['title'])

# Determine number of posts and set range maximum
posts = len(d['entries'])

# Collect post URLs
for post in d['entries']:
    link = post['link']
    print('Parsing {0}'.format(link))
    doc = lh.parse(urlopen(link))
    # Select every <img> whose class attribute is exactly "bpImage"
    imgs = doc.xpath('//img[@class="bpImage"]')
    for img in imgs:
        print(img.attrib['src'])
```
This is perfect. Thank you very much. – tyebud Nov 23 '10 at 17:28.
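The XPath expression in the lxml answer can be checked offline, without fetching the live feed. The sketch below runs it against an inline HTML string; the markup and the `bp_image_urls` helper are hypothetical, modelled only on the `bpImage` class discussed in this thread, and the real Big Picture pages may be structured differently.

```python
import lxml.html as lh

# Hypothetical markup: two images carry the bpImage class, one does not.
SAMPLE_HTML = """
<html><body>
  <a href="/post/1"><img class="bpImage" src="http://example.com/photo1.jpg"></a>
  <img class="thumb" src="http://example.com/thumb.jpg">
  <img class="bpImage" src="http://example.com/photo2.jpg">
</body></html>
"""


def bp_image_urls(html):
    """Return the src attribute of every <img class="bpImage"> element."""
    doc = lh.fromstring(html)
    # [@class="bpImage"] is an exact string comparison on the attribute value,
    # so an element with class="bpImage wide" would NOT be selected.
    return [img.get("src") for img in doc.xpath('//img[@class="bpImage"]')]


if __name__ == "__main__":
    for url in bp_image_urls(SAMPLE_HTML):
        print(url)
```

If the class attribute may contain several classes, the usual XPath 1.0 idiom is `//img[contains(concat(' ', normalize-space(@class), ' '), ' bpImage ')]`.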
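The code you have posted looks for all `a` elements with the `bpImage` class. But your example has the `bpImage` class on the `img` element, not the `a`. You just need to do:

```python
soup.find("img", {"class": "bpImage"})
```

Note that `find` returns only the first match; use `findAll` (`find_all` in BeautifulSoup 4) to get every one.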
Haha, of course. So that returns the URL along with the tags. Is there some way to strip those down to just the URL?
– tyebud Nov 23 '10 at 17:10.
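The fix in the answer above, together with the follow-up question about getting just the URL, can be sketched as follows. This assumes BeautifulSoup 4 (`bs4`) rather than the BeautifulSoup 3 API current when this thread was written, and uses hypothetical markup: searching for `a` elements with the class finds nothing, searching for `img` does, and indexing a tag with `["src"]` returns only the URL string rather than the whole tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the bpImage class sits on the <img>, not on the <a>.
SAMPLE_HTML = """
<a href="/post/1"><img class="bpImage" src="http://example.com/photo1.jpg"></a>
<img class="bpImage" src="http://example.com/photo2.jpg">
<img class="thumb" src="http://example.com/thumb.jpg">
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Looking for <a class="bpImage"> matches nothing: no <a> has that class.
wrong = soup.find_all("a", class_="bpImage")

# find() returns only the first matching <img> tag (or None if there is none).
first = soup.find("img", class_="bpImage")

# find_all() returns every matching tag; tag["src"] gives just the URL string.
urls = [img["src"] for img in soup.find_all("img", class_="bpImage")]

print(len(wrong))    # 0
print(first["src"])  # http://example.com/photo1.jpg
print(urls)
```

Because bs4 treats `class` as a multi-valued attribute, `class_="bpImage"` also matches an element such as `class="bpImage wide"`.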
Using pyparsing to search for tags is fairly intuitive:

```python
from pyparsing import makeHTMLTags, withAttribute

imgTag, notused = makeHTMLTags('img')

# only retrieve tags with class='bpImage'
imgTag.setParseAction(withAttribute(**{'class': 'bpImage'}))

# html holds the page source as a string
for img in imgTag.searchString(html):
    print(img.src)
```
I have searched high and low for a decent explanation of how BeautifulSoup or lxml works. Granted, their documentation is great, but for someone like myself, a Python/programming novice, it is difficult to decipher what I am looking for. Anyway, as my first project, I am using Python to parse an RSS feed for post links, and I have accomplished this with Feedparser.
My plan is to then scrape each post's images. For the life of me, though, I cannot figure out how to get either BeautifulSoup or lxml to do what I want! I have spent hours reading through the documentation and googling to no avail, so I am here.
The following is a line from the Big Picture (my scrapee). I tried to find all instances with that CSS class, but it doesn't return anything.
I'm sure I'm overlooking something trivial, so I greatly appreciate your patience. Thank you very much for your responses!