Web scraping with BeautifulSoup or lxml.html?


I know you said you can't use lxml.html, but here is how to do it using that library, because it is a very good library. I provide the code using it for completeness, since I don't use BeautifulSoup anymore -- it's unmaintained, slow, and has an ugly API.

The code below parses the page and writes the results to a csv file:

    import lxml.html
    import csv

    doc = lxml.html.parse('http://finance.yahoo.com/q/os?s=lly&m=2011-04-15')
    # find the first table containing any tr with a td with class yfnc_tabledata1
    table = doc.xpath("//table[tr/td[@class='yfnc_tabledata1']]")[0]
    with open('results.csv', 'wb') as f:
        cf = csv.writer(f)
        # find all trs inside that table:
        for tr in table.xpath('./tr'):
            # add the text of all tds inside each tr to a list
            row = [td.text_content().strip() for td in tr.xpath('./td')]
            # write the list to the csv file:
            cf.writerow(row)

That's it! lxml.html is so simple and nice! Too bad you can't use it.

Here are some lines from the results.csv file that was generated:

    LLY110416C00017500,N/A,0.00,17.05,18.45,0,0,17.50,LLY110416P00017500,0.01,0.00,N/A,0.03,0,182
    LLY110416C00020000,15.70,0.00,14.55,15.85,0,0,20.00,LLY110416P00020000,0.06,0.00,N/A,0.03,0,439
    LLY110416C00022500,N/A,0.00,12.15,12.80,0,0,22.50,LLY110416P00022500,0.01,0.00,N/A,0.03,2,50
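Note that opening the file in 'wb' mode for csv.writer is Python 2 only. A minimal sketch of the Python 3 equivalent of the writing step (using an in-memory buffer in place of a real file, and made-up rows standing in for the scraped data): in Python 3 you open in text mode with newline='' so the csv module controls line endings itself.

```python
import csv
import io

# made-up rows standing in for the lists built from tr.xpath('./td')
rows = [
    ["LLY110416C00017500", "N/A", "0.00"],
    ["LLY110416C00020000", "15.70", "0.00"],
]

# stands in for open('results.csv', 'w', newline='') on Python 3
buf = io.StringIO()
cf = csv.writer(buf)
for row in rows:
    cf.writerow(row)

print(buf.getvalue().splitlines()[0])
```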

And instead of CSV use a list as container. – Merlin Mar 31 at 0:38

+1 thanks for the example, I'd only used BeautifulSoup before and hadn't realized even its own maintainer recommends moving on to other libraries – A Lee Mar 31 at 1:02

Can you explain the last 2 lines of code? – Merlin Mar 31 at 2:43

@user428862: I have added comments to the code explaining how it works. – nosklo Mar 31 at 3:56
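Merlin's suggestion of a list as container amounts to replacing cf.writerow(row) with an append. A short sketch of that variant, run against a made-up inline HTML fragment (lxml.html.fromstring is used here so no network fetch is needed; the sample cells are invented):

```python
import lxml.html

# made-up fragment mimicking the structure of the Yahoo options table
html = """<html><body><table>
<tr><td class="yfnc_tabledata1">LLY110416C00017500</td><td>N/A</td></tr>
<tr><td class="yfnc_tabledata1">LLY110416C00020000</td><td>15.70</td></tr>
</table></body></html>"""

doc = lxml.html.fromstring(html)
table = doc.xpath("//table[tr/td[@class='yfnc_tabledata1']]")[0]
# accumulate rows in a list of lists instead of writing a csv file
data = [[td.text_content().strip() for td in tr.xpath('./td')]
        for tr in table.xpath('./tr')]

print(data)
```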

BeautifulSoup is not maintained anymore, and lxml is fantastic but relies on compiled C libraries, which isn't always practical/possible. Another option is this webscraping library.
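The link to that library is not preserved here, but for reference, a table extractor can also be sketched with nothing beyond the standard library's html.parser (pure Python, no compiled dependencies). This is a minimal illustration of the idea, not the library the answer refers to:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of each td into per-row lists."""
    def __init__(self):
        super().__init__()
        self.rows = []    # all completed rows
        self.row = []     # cells of the row being built
        self.in_td = False
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.row = []
        elif tag == 'td':
            self.in_td = True
    def handle_endtag(self, tag):
        if tag == 'tr' and self.row:
            self.rows.append(self.row)
        elif tag == 'td':
            self.in_td = False
    def handle_data(self, data):
        if self.in_td and data.strip():
            self.row.append(data.strip())

p = TableParser()
p.feed("<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>")
print(p.rows)
```

This trades lxml's speed and XPath expressiveness for zero install burden, which is the point raised above.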


