Web scraping with BeautifulSoup or lxml.html?


I know you said you can't use lxml.html, but here is how to do it using that library, because it is a very good library. I provide the code using it for completeness, since I don't use BeautifulSoup anymore -- it's unmaintained, slow, and has an ugly API.

The code below parses the page and writes the results to a csv file:

    import lxml.html
    import csv

    doc = lxml.html.parse('http://finance.yahoo.com/q/os?s=lly&m=2011-04-15')
    # find the first table containing any tr with a td with class yfnc_tabledata1
    table = doc.xpath("//table[tr/td[@class='yfnc_tabledata1']]")[0]
    with open('results.csv', 'wb') as f:
        cf = csv.writer(f)
        # find all trs inside that table:
        for tr in table.xpath('./tr'):
            # add the text of all tds inside each tr to a list
            row = [td.text_content().strip() for td in tr.xpath('./td')]
            # write the list to the csv file:
            cf.writerow(row)

That's it! lxml.html is so simple and nice! Too bad you can't use it.

Here are some lines from the results.csv file that was generated:

    LLY110416C00017500,N/A,0.00,17.05,18.45,0,0,17.50,LLY110416P00017500,0.01,0.00,N/A,0.03,0,182
    LLY110416C00020000,15.70,0.00,14.55,15.85,0,0,20.00,LLY110416P00020000,0.06,0.00,N/A,0.03,0,439
    LLY110416C00022500,N/A,0.00,12.15,12.80,0,0,22.50,LLY110416P00022500,0.01,0.00,N/A,0.03,2,50
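Note that opening the file in 'wb' mode for csv.writer is Python 2 only. A minimal sketch of the Python 3 equivalent of the writing step (using an in-memory buffer in place of a real file, and made-up rows standing in for the scraped data): in Python 3 you open in text mode with newline='' so the csv module controls line endings itself.

```python
import csv
import io

# made-up rows standing in for the lists built from tr.xpath('./td')
rows = [
    ["LLY110416C00017500", "N/A", "0.00"],
    ["LLY110416C00020000", "15.70", "0.00"],
]

# stands in for open('results.csv', 'w', newline='') on Python 3
buf = io.StringIO()
cf = csv.writer(buf)
for row in rows:
    cf.writerow(row)

print(buf.getvalue().splitlines()[0])
```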

And instead of CSV use a list as container. – Merlin Mar 31 at 0:38

+1 thanks for the example, I'd only used BeautifulSoup before and hadn't realized even its own maintainer recommends moving on to other libraries – A Lee Mar 31 at 1:02

Can you explain the last 2 lines of code? – Merlin Mar 31 at 2:43

@user428862: I have added comments to the code explaining how it works. – nosklo Mar 31 at 3:56
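Merlin's suggestion of a list as container amounts to replacing cf.writerow(row) with an append. A short sketch of that variant, run against a made-up inline HTML fragment (lxml.html.fromstring is used here so no network fetch is needed; the sample cells are invented):

```python
import lxml.html

# made-up fragment mimicking the structure of the Yahoo options table
html = """<html><body><table>
<tr><td class="yfnc_tabledata1">LLY110416C00017500</td><td>N/A</td></tr>
<tr><td class="yfnc_tabledata1">LLY110416C00020000</td><td>15.70</td></tr>
</table></body></html>"""

doc = lxml.html.fromstring(html)
table = doc.xpath("//table[tr/td[@class='yfnc_tabledata1']]")[0]
# accumulate rows in a list of lists instead of writing a csv file
data = [[td.text_content().strip() for td in tr.xpath('./td')]
        for tr in table.xpath('./tr')]

print(data)
```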

BeautifulSoup is not maintained anymore, and lxml is fantastic but relies on compiled C libraries, which isn't always practical/possible. Another option is this webscraping library.
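The link to that library is not preserved here, but for reference, a table extractor can also be sketched with nothing beyond the standard library's html.parser (pure Python, no compiled dependencies). This is a minimal illustration of the idea, not the library the answer refers to:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of each td into per-row lists."""
    def __init__(self):
        super().__init__()
        self.rows = []    # all completed rows
        self.row = []     # cells of the row being built
        self.in_td = False
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.row = []
        elif tag == 'td':
            self.in_td = True
    def handle_endtag(self, tag):
        if tag == 'tr' and self.row:
            self.rows.append(self.row)
        elif tag == 'td':
            self.in_td = False
    def handle_data(self, data):
        if self.in_td and data.strip():
            self.row.append(data.strip())

p = TableParser()
p.feed("<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>")
print(p.rows)
```

This trades lxml's speed and XPath expressiveness for zero install burden, which is the point raised above.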


