You can use a regular expression in an lxml XPath by using the EXSLT syntax. For example, given your document, this finds the parent node whose text matches the regexp spe.*al and selects the nodes that follow it:

    import lxml.html

    # EXSLT regular-expressions namespace; the full URI, including http://, is required
    NS = 'http://exslt.org/regular-expressions'
    # DOC is the HTML string from the question
    tree = lxml.html.fromstring(DOC)

    # select sibling table nodes after matching node
    path = "//*[re:test(text(), 'spe.*al')]/following-sibling::table"
    print(tree.xpath(path, namespaces={'re': NS}))

    # select all sibling nodes after matching node
    path = "//*[re:test(text(), 'spe.*al')]/following-sibling::*"
    print(tree.xpath(path, namespaces={'re': NS}))

Each call prints a list of the matching lxml elements.
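If you want to try this without the original page, here is a minimal self-contained sketch. The DOC markup and the ids t1 and t2 are made up for illustration and are not the document from the question; only the namespace URI and the XPath pattern come from the answer.

```python
import lxml.html

# EXSLT regular-expressions namespace, bound to the "re" prefix in the XPath.
NS = "http://exslt.org/regular-expressions"

# Hypothetical stand-in for the question's document (an assumption).
DOC = """
<html><body>
  <h2>Some special title</h2>
  <p>intro</p>
  <table id="t1"><tr><td>1</td></tr></table>
  <h2>Other</h2>
  <table id="t2"><tr><td>2</td></tr></table>
</body></html>
"""

tree = lxml.html.fromstring(DOC)

# Find any element whose first text node matches the regexp,
# then take every <table> that is a later sibling of it.
path = "//*[re:test(text(), 'spe.*al')]/following-sibling::table"
tables = tree.xpath(path, namespaces={"re": NS})

table_ids = [t.get("id") for t in tables]
print(table_ids)
```

Note that following-sibling::table returns every later sibling table, not only the nearest one, so both tables come back here.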
Thanks, but what I'm looking for is not the parent node of the matched text (h2 here); it's (in this example a sibling, but more generally) an element following that node. – dimitrijoe Apr 23 at 15:57

You should be able to use XPath axes to select exactly what you're looking for. I've updated the answer to select the table node, but you can generalize the path however you need. – samplebias Apr 23 at 16:06

samplebias gives a good answer here. XPath is extremely powerful; much more powerful than the tools BS gives (although you can use the BS parser in lxml). BS shines with extremely broken HTML, but for ordinary cases lxml is hands-down the more flexible (with the downside of a largish binary dependency). – Ryan Ginstrom Apr 23 at 17:29
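To make the point about axes concrete, here is a rough sketch comparing following-sibling:: with following::. The markup and ids below are invented for the example (an assumption, not the asker's page); the second heading's table is deliberately nested in a div so that it is not a sibling of the heading.

```python
import lxml.html

NS = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical markup, made up for this illustration.
DOC = """
<html><body>
  <h2>A special section</h2>
  <p>note</p>
  <table id="first"></table>
  <table id="second"></table>
  <h2>Another special one</h2>
  <div><table id="nested"></table></div>
</body></html>
"""

tree = lxml.html.fromstring(DOC)
match = "//h2[re:test(text(), 'spe.*al')]"

# Nearest <table> among the later siblings of each matching heading.
sibling_tables = tree.xpath(match + "/following-sibling::table[1]", namespaces=NS)

# Nearest <table> anywhere later in the document, sibling or not.
later_tables = tree.xpath(match + "/following::table[1]", namespaces=NS)

sibling_ids = [t.get("id") for t in sibling_tables]
later_ids = [t.get("id") for t in later_tables]
print(sibling_ids)
print(later_ids)
```

The [1] predicate picks the closest match along the axis for each heading, so following:: also finds the nested table that following-sibling:: misses.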
I'm scraping an HTML document using lxml.html; there's one thing I can do in BeautifulSoup but can't manage to do with lxml.html. I tried this with cssselect, but with no success.
Any ideas on how I could locate this using the methods in lxml?
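For what it's worth, one way to do this with plain lxml element methods and no XPath is to walk the tree, test each element's text with Python's re module, and then step through its later siblings with itersiblings(). This is only a sketch: the DOC markup and the id "wanted" are assumptions made up for the example, and the pattern is the one used in the answer.

```python
import re

import lxml.html

# Hypothetical markup standing in for the real page.
DOC = """
<html><body>
  <h2>Some special title</h2>
  <table id="wanted"></table>
  <p>unrelated</p>
</body></html>
"""

tree = lxml.html.fromstring(DOC)
pattern = re.compile(r"spe.*al")

found = []
for el in tree.iter():
    # Skip comments and processing instructions, whose tag is not a string.
    if not isinstance(el.tag, str):
        continue
    # el.text is the text before the first child; it may be None.
    if el.text and pattern.search(el.text):
        # Collect the <table> elements that are later siblings of the match.
        found.extend(el.itersiblings("table"))

found_ids = [t.get("id") for t in found]
print(found_ids)
```

This is slower than a single XPath query on large documents, but it keeps the matching logic in ordinary Python.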
I can't really give you an answer, but what I can give you is a way to a solution: you have to find the angle that you relate to or that piques your interest. A good paper is one that people get drawn into because it reaches them in some way. As for me, when I think of WWII, I think of the Holocaust and the effect it had on the survivors, their families, and those who stood by and did nothing until it was too late.