Beautiful Soup cannot find a CSS class if the object has other classes, too?

Unfortunately, BeautifulSoup treats this as a single class with a space in it, 'class1 class2', rather than two classes, 'class1' and 'class2'. A workaround is to search for the class with a regular expression instead of a string. This works: soup.findAll(True, {'class': re.compile(r'\bclass1\b')}).
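To see why the regular expression helps, here is a minimal sketch that uses only the standard re module on a few made-up class attribute values (the values are hypothetical, not taken from the question). It also shows one caveat of \b: a hyphen counts as a word boundary, so a class such as 'class1-wide' matches as well. Comparing whitespace-separated tokens is stricter.

```python
import re

# Hypothetical class attribute values, as they might appear in the HTML source.
class_values = ["class1", "class1 class2", "class2 class1", "class10", "class1-wide"]

pattern = re.compile(r"\bclass1\b")

# Exact string comparison: only an attribute that is exactly "class1" matches.
exact = [v for v in class_values if v == "class1"]

# Regex search: "class1" matches wherever it appears between word boundaries.
by_regex = [v for v in class_values if pattern.search(v)]

# Token comparison: split on whitespace and compare whole class names.
by_token = [v for v in class_values if "class1" in v.split()]

print(exact)     # ['class1']
print(by_regex)  # ['class1', 'class1 class2', 'class2 class1', 'class1-wide']
print(by_token)  # ['class1', 'class1 class2', 'class2 class1']
```

If hyphenated class names are in play, a pattern anchored on whitespace, such as r'(?:^|\s)class1(?:\s|$)', may be a safer thing to pass to re.compile.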

bugs.launchpad.net/bugs/410304 – endolith Aug 7 '09 at 14:57

You should use lxml. It works with multiple class values separated by spaces ('class1 class2'). Despite its name, lxml is also for parsing and scraping HTML. It's much, much faster than BeautifulSoup, and it even handles "broken" HTML better than BeautifulSoup (which is BeautifulSoup's claim to fame).

It has a compatibility API for BeautifulSoup too if you don't want to learn the lxml API. Ian Bicking agrees and prefers lxml over BeautifulSoup. There's no reason to use BeautifulSoup anymore, unless you're on Google App Engine or something where anything not purely Python isn't allowed.

You can even use CSS selectors with lxml, so it's far easier to use than BeautifulSoup. Try playing around with it in an interactive Python console.

From lxml's own documentation: "While libxml2 (and thus lxml) can also parse broken HTML, BeautifulSoup is a bit more forgiving and has superiour support for encoding detection." – endolith Aug 10 '09 at 18:41

I've tried it and it is indeed nicer for this sort of thing. – endolith Aug 12 '09 at 19:37

Glad you like it. Hope you'll spread the word too, lxml is an under-appreciated library. I think many overlook it since it has 'XML' in the name and its documentation isn't as nice as BeautifulSoup's. BS has a charm to it with the name and graphics, which makes it a little more attractive for superficial reasons. – aehlke Aug 12 '09 at 20:12

Yes, it isn't marketed as a scraper and I don't see enough examples of this kind of stuff in the docs. – endolith Aug 15 '09 at 18:19


