Python, BeautifulSoup or LXML - Parsing image URL's from HTML using CSS tags?

Using lxml, you might do something like this:

    import feedparser
    import lxml.html as lh
    import urllib2

    # Import feed for parsing
    d = feedparser.parse("http://feeds.boston.com/boston/bigpicture/index")

    # Print feed name
    print d['feed']['title']

    # Determine number of posts and set range maximum
    posts = len(d['entries'])

    # Collect post URLs
    for post in d['entries']:
        link = post['link']
        print('Parsing {0}'.format(link))
        # Download and parse the post page
        doc = lh.parse(urllib2.urlopen(link))
        # Select every <img> element whose class attribute is "bpImage"
        imgs = doc.xpath('//img[@class="bpImage"]')
        for img in imgs:
            print(img.attrib['src'])
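To see what the attribute predicate in that XPath expression does without fetching anything, here is a minimal offline sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, which only works because the sample markup is well-formed; real pages usually are not, and that is the reason to use lxml.html there. The SAMPLE markup, the example.com URLs and the image_urls helper are made up for illustration, and the sketch targets Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded post page.
SAMPLE = """
<html><body>
  <div><img class="bpImage" src="http://example.com/photo1.jpg" /></div>
  <div><img class="thumb" src="http://example.com/thumb.jpg" /></div>
  <div><img class="bpImage" src="http://example.com/photo2.jpg" /></div>
</body></html>
"""


def image_urls(markup, css_class="bpImage"):
    """Return the src of every <img> whose class attribute equals css_class."""
    root = ET.fromstring(markup)
    # [@class='...'] is an attribute predicate: only elements whose class
    # attribute has exactly this value are kept.
    matches = root.findall(".//img[@class='{0}']".format(css_class))
    return [img.get("src") for img in matches]


urls = image_urls(SAMPLE)
for url in urls:
    print(url)
```

Note that the predicate compares the whole attribute value, so an element with class="bpImage large" would not match.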

This is perfect. Thank you very much. – tyebud Nov 23 '10 at 17:28.

The code you have posted looks for all a elements with the bpImage class. But your example has the bpImage class on the img element, not the a. You just need to do:

    soup.find("img", {"class": "bpImage"})

Haha. Of course. So that returns the URL with tags. Is there some way to strip those down to just the URL? – tyebud Nov 23 '10 at 17:10.
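On the follow-up about reducing a matched tag to just its URL: the URL is the value of the tag's src attribute, so the task is to read that attribute rather than print the whole tag. With BeautifulSoup that is a dictionary-style lookup on the tag (tag["src"]). The sketch below shows the same idea using only the standard library's html.parser as a stand-in, so it runs without BeautifulSoup installed. The ImageSrcCollector class, the SAMPLE markup and the example.com URLs are made up for illustration, and the sketch targets Python 3.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src of each <img> tag carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A class attribute may list several names separated by whitespace.
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes and attributes.get("src"):
            # Keep only the attribute value, not the surrounding tag.
            self.urls.append(attributes["src"])


# Hypothetical markup: one image with the target class, one without.
SAMPLE = (
    '<p><img src="http://example.com/a.jpg" class="bpImage" '
    'style="height:600px;" /></p>'
    '<img src="http://example.com/logo.png" class="logo">'
)

collector = ImageSrcCollector("bpImage")
collector.feed(SAMPLE)
print(collector.urls)
```

Splitting the class attribute on whitespace means a tag with several class names still matches, which a plain string comparison would miss.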

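Using pyparsing to search for tags is fairly intuitive:

    from pyparsing import makeHTMLTags, withAttribute

    # makeHTMLTags returns expressions for the opening and closing tags;
    # only the opening <img> tag is needed here
    imgTag, notused = makeHTMLTags('img')

    # only retrieve tags with class='bpImage'
    imgTag.setParseAction(withAttribute(**{'class': 'bpImage'}))

    # html is assumed to hold the page source as a string
    for img in imgTag.searchString(html):
        print img.src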

I have searched high and low for a decent explanation of how BeautifulSoup or LXML work. Granted, their documentation is great, but for someone like myself, a python/programming novice, it is difficult to decipher what I am looking for. Anyways, as my first project, I am using Python to parse an RSS feed for post links - I have accomplished this with Feedparser.

My plan is to then scrape each post's images. For the life of me, though, I cannot figure out how to get either BeautifulSoup or LXML to do what I want! I have spent hours reading through the documentation and googling to no avail, so I am here.

The following is a line from the Big Picture (my scrapee). I am trying to find all instances with that CSS class, but my attempt doesn't return anything.

I'm sure I'm overlooking something trivial so I greatly appreciate your patience. Thank you very much for your responses!

