You should provide a string to BeautifulSoup:

    # parse bookmarks.html
    with open(sys.argv[1]) as bookmark_file:
        soup = BeautifulSoup(bookmark_file.read())

    # extract youtube video urls
    video_url_regex = re.compile('youtube.com/watch')
    urls = [link['href'] for link in soup('a', href=video_url_regex)]
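To see how the href filter behaves, here's a tiny self-contained sketch using the same BeautifulSoup 3 call; the sample HTML is made up:

    import re
    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

    html = ('<a href="http://youtube.com/watch?v=abc123">clip</a>'
            '<a href="http://example.com/">other</a>')
    soup = BeautifulSoup(html)
    # calling soup(...) is shorthand for soup.findAll(...)
    print [a['href'] for a in soup('a', href=re.compile('youtube.com/watch'))]
    # ['http://youtube.com/watch?v=abc123']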
Separate the very fast url parsing from the much slower downloading of the stats:

    # extract video ids from the urls
    ids = []  # you could use `set()` and `ids.add()` to avoid duplicates
    for video_url in urls:
        url = urlparse.urlparse(video_url)
        video_id = urlparse.parse_qs(url.query).get('v')
        if not video_id:
            continue  # no video_id in the url
        ids.append(video_id[0])
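As a quick illustration of what the urlparse calls return (the feature parameter here is made up):

    import urlparse  # Python 2; urllib.parse in Python 3

    url = urlparse.urlparse('http://www.youtube.com/watch?v=Gg81zi0pheg&feature=related')
    print urlparse.parse_qs(url.query)['v']         # ['Gg81zi0pheg']
    print urlparse.parse_qs(url.query).get('v')[0]  # Gg81zi0pheg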
You don't need to authenticate for read-only requests:

    # get some statistics for the videos
    yt_service = YouTubeService()
    yt_service.ssl = True    # NOTE: it works for read-only requests
    yt_service.debug = True  # show requests
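For reference, the snippets above and below assume imports along these lines (BeautifulSoup 3 and the old gdata-python-client; the exact versions are my assumption, not stated in the answer):

    import csv
    import re
    import sys
    import urlparse

    from BeautifulSoup import BeautifulSoup           # BeautifulSoup 3
    from gdata.youtube.service import YouTubeService  # gdata-python-client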
Save some statistics to a csv file provided on the command line. Don't stop if some video causes an error:

    writer = csv.writer(open(sys.argv[2], 'wb'))  # save to csv file
    for video_id in ids:
        try:
            entry = yt_service.GetYouTubeVideoEntry(video_id=video_id)
        except Exception, e:
            print >>sys.stderr, "Failed to retrieve entry video_id=%s: %s" % (
                video_id, e)
        else:
            title = entry.media.title.text
            print "Title:", title
            view_count = entry.statistics.view_count
            print "View count:", view_count
            writer.writerow((video_id, title, view_count))  # write it

Here's a full script; press playback to watch how it was written.

Output

    $ python download-video-stats.py neudorfer.html out.csv
    send: u'GET https://gdata.youtube.com/feeds/api/videos/Gg81zi0pheg HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: gdata.youtube.com\r\nContent-Type: application/atom+xml\r\nUser-Agent: None GData-Python/2.0.15\r\n\r\n'
    reply: 'HTTP/1.1 200 OK\r\n'
    header: X-GData-User-Country: RU
    header: Content-Type: application/atom+xml; charset=UTF-8
    header: Expires: Thu, 10 Nov 2011 19:31:23 GMT
    header: Date: Thu, 10 Nov 2011 19:31:23 GMT
    header: Cache-Control: private, max-age=300, no-transform
    header: Vary: *
    header: GData-Version: 1.0
    header: Last-Modified: Wed, 02 Nov 2011 08:58:11 GMT
    header: Transfer-Encoding: chunked
    header: X-Content-Type-Options: nosniff
    header: X-Frame-Options: SAMEORIGIN
    header: X-XSS-Protection: 1; mode=block
    header: Server: GSE
    Title: Paramore - Let The Flames Begin Wal-Mart Soundcheck
    View count: 27807

out.csv

    Gg81zi0pheg,Paramore - Let The Flames Begin Wal-Mart Soundcheck,27807
    pP9VjGmmhfo,Paramore: Wal-Mart Soundcheck,1363078
    yTA1u6D1fyE,Paramore-Walmart Soundcheck 7-CrushCrushCrush(HQ),843
    4v8HvQf4fgE,Paramore-Walmart Soundcheck 4-That's What You Get(HQ),1429
    e9zG20wQQ1U,Paramore-Walmart Soundcheck 8-Interview(HQ),1306
    khL4s2bvn-8,Paramore-Walmart Soundcheck 3-Emergency(HQ),796
    XTndQ7bYV0A,Paramore-Walmart Soundcheck 6-For a pessimist(HQ),599
    xTT2MqgWRRc,Paramore-Walmart Soundcheck 5-Pressure(HQ),963
    J2ZYQngwSUw,Paramore - Wal-Mart Soundcheck Interview,10261
    9RZwvg7unrU,Paramore - 08 - Interview Wal-Mart Soundcheck,1674
    vz3qOYWwm10,Paramore - 04 - That's What You Get Wal-Mart Soundcheck,1268
    yarv52QX_Yw,Paramore - 05 - Pressure Wal-Mart Soundcheck,1296
    LRREY1H3GCI,Paramore - Walmart Promo,523
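One caveat the answer doesn't mention: the csv module in Python 2 writes bytes, so a title with non-ASCII characters can raise UnicodeEncodeError in writerow(). A minimal guard (encode_utf8 is a hypothetical helper, not part of the original script):

    def encode_utf8(value):
        # Python 2's csv module can't handle unicode objects directly
        if isinstance(value, unicode):
            return value.encode('utf-8')
        return value

    writer.writerow([encode_utf8(v) for v in (video_id, title, view_count)])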
You are a scholar and a gentleman. Thank you Sir Sebastian. – David Neudorfer Nov 10 at 21:53

code.google.com/p/python-youtube-datascraper/source/browse/… – David Neudorfer Nov 14 at 18:30

@David Neudorfer: Don't copy-paste mindlessly. Comments that make sense in the context of the question, for a one-off script, are not appropriate as part of a project's source. Try to read the code and understand what every line does; e.g., dir() is usually used at the interactive prompt or for quick debugging; it should not be in the code. – J.F. Sebastian Nov 14 at 18:42

Thanks for the heads up, Sebastian. I'll see what I can do to correct that. – David Neudorfer Nov 14 at 18:50

Sebastian, do you do for-hire work? I can't find a website or email on your profile. – David Neudorfer Nov 11 at 0:46
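(As an aside, this is what dir() at the interactive prompt looks like; the entry object is the one returned by GetYouTubeVideoEntry above, and the session is just an illustration:)

    >>> entry = yt_service.GetYouTubeVideoEntry(video_id='Gg81zi0pheg')
    >>> [name for name in dir(entry.statistics) if not name.startswith('_')]  # public attributes only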
Also, you may want to break it up into methods to make it easier to understand later. Also, for csv you're going to want to accumulate your data, so maybe have a list and each time through the loop append a csv line of youtube info:

    myyoutubes = []
    ...
    myyoutubes.append(", ".join([youtubeid, entry.media.title.text, entry.statistics.view_count]))
    ...
    "\n".join(myyoutubes)

For duplicates, I normally do this: list(set(my_list)). Sets only have unique elements.
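Note that list(set(my_list)) does not preserve the original order of the list; if order matters, an order-preserving variant looks like this (a sketch, not from the answer):

    urls = ['a', 'b', 'a', 'c']
    print list(set(urls))  # unique, but in arbitrary order

    # order-preserving deduplication: set.add() returns None, so the
    # `not seen.add(u)` clause is always true and only records u as seen
    seen = set()
    print [u for u in urls if u not in seen and not seen.add(u)]  # ['a', 'b', 'c']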
Oneporter, what I have so far took me about 12 hours. Any help beyond what you've already done would be amazing. (Obviously, thank you for pointing me in the right direction.) :) – David Neudorfer Nov 10 at 17:39

David, I mean you can combine bookmarks.py and video_info.py. Use the code from bookmarks.py to grab all the urls. Then loop over the urls, with the body of the loop being video_info.py. So: for youtubeurl in urls: – oneporter Nov 10 at 18:38

Before I can pass the cleaned urls from bookmarks.py to video_info.py they need to be cleaned up slightly more. I've asked that question here: stackoverflow.com/questions/8084935/… – David Neudorfer Nov 10 at 19:10

I have no idea how to loop :( – David Neudorfer Nov 10 at 19:13

No worries, you're shown how to loop in the link you provided. Check this out: docs.python.org/tutorial/controlflow.html#for-statements – oneporter Nov 10 at 19:17
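Putting oneporter's suggestion together, a rough sketch of the combined script (get_urls and print_video_info are hypothetical wrappers around the code in bookmarks.py and video_info.py):

    # hypothetical glue: bookmarks.py supplies the urls,
    # video_info.py's body becomes the body of the loop
    urls = get_urls('bookmarks.html')   # logic from bookmarks.py
    for youtubeurl in urls:
        print_video_info(youtubeurl)    # logic from video_info.py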