Python Process blocked by urllib2?

I set up a process that reads a queue of incoming urls to download, but when urllib2 opens a connection the system hangs.

    import urllib2, multiprocessing
    from threading import Thread
    from Queue import Queue
    from multiprocessing import Queue as ProcessQueue, Process

    def download(url):
        """Download a page from an url.

        url str: url to get.
        Return unicode: page downloaded.
        """
        if settings.DEBUG:
            print u'Downloading %s' % url
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        encoding = response.headers['content-type'].split('charset=')[-1]
        content = unicode(response.read(), encoding)
        return content

    def downloader(url_queue, page_queue):
        def _downloader(url_queue, page_queue):
            while True:
                try:
                    url = url_queue.get()
                    page_queue.put_nowait({'url': url, 'page': download(url)})
                except Exception, err:
                    print u'Error downloading %s' % url
                    raise err
                finally:
                    url_queue.task_done()

        # Init internal workers
        internal_url_queue = Queue()
        internal_page_queue = Queue()
        for num in range(multiprocessing.cpu_count()):
            worker = Thread(target=_downloader, args=(internal_url_queue, internal_page_queue))
            worker.setDaemon(True)
            worker.start()

        # Loop waiting closing
        for url in iter(url_queue.get, 'STOP'):
            internal_url_queue.put(url)

        # Wait for closing
        internal_url_queue.join()

    # Init the queues
    url_queue = ProcessQueue()
    page_queue = ProcessQueue()

    # Init the process
    download_worker = Process(target=downloader, args=(url_queue, page_queue))
    download_worker.start()

From another module I can add urls and, when I want, I can stop the process and wait for it to close:

    import module

    module.url_queue.put('http://foobar1')
    module.url_queue.put('http://foobar2')
    module.url_queue.put('http://foobar3')
    module.url_queue.put('STOP')
    downloader.download_worker.join()

The problem is that when I call urlopen ("response = urllib2.urlopen(request)") everything stays blocked. There is no problem if I call the download() function directly, or when I use only threads without a Process.

python multithreading urllib2 multiprocess

asked Jan 26 '10 at 2:31 (edited Jan 26 '10 at 2:37) by Davmuz

The issue here is not urllib2 but the use of the multiprocessing module. When using the multiprocessing module under Windows, you must not use code that runs immediately when your module is imported; instead, put the things in the main module inside an if __name__ == '__main__' block. See the section "Safe importing of main module" here.
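As a minimal, added illustration of the behaviour the answer describes (the work() function is just a placeholder, not part of the original code): on Windows, a spawned child re-imports the main module, so anything at module level runs again in the child.

    from multiprocessing import Process

    def work():
        print 'running in the child'

    # Unsafe: at module level this line would run again when the child
    # re-imports the module, starting processes recursively.
    # Process(target=work).start()

    # Safe: only the parent process run as a script executes this block.
    if __name__ == '__main__':
        Process(target=work).start()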

For your code, make the following change in the downloader module:

    #....
    def start():
        global download_worker
        download_worker = Process(target=downloader, args=(url_queue, page_queue))
        download_worker.start()

And in the main module:

    import module

    if __name__ == '__main__':
        module.start()
        module.url_queue.put('http://foobar1')
        #....

Because you didn't do this, each time the subprocess was started it would run the main code again and start another process, causing the hang.
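For reference, a single-file sketch of the same pattern in the Python 2 style of the question; the downloader body is simplified and http://example.com/ is just a placeholder url, so treat this as an illustration of the structure rather than the exact fix above.

    import urllib2
    from multiprocessing import Process, Queue as ProcessQueue

    # Module-level queues are fine: creating them has no side effects.
    url_queue = ProcessQueue()
    page_queue = ProcessQueue()

    def downloader(url_queue, page_queue):
        # Consume urls until the 'STOP' sentinel arrives.
        for url in iter(url_queue.get, 'STOP'):
            try:
                page_queue.put({'url': url, 'page': urllib2.urlopen(url).read()})
            except Exception, err:
                page_queue.put({'url': url, 'page': None, 'error': str(err)})

    def start():
        global download_worker
        download_worker = Process(target=downloader, args=(url_queue, page_queue))
        download_worker.start()

    # Everything that actually starts a process is guarded, so a Windows
    # child can re-import this file without spawning another worker.
    if __name__ == '__main__':
        start()
        url_queue.put('http://example.com/')
        result = page_queue.get()   # one result per url, even on error
        print result['url'], 'failed' if result.get('error') else 'ok'
        url_queue.put('STOP')
        download_worker.join()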

I do not use Windows, but your suggestion to use a start() function fixed the problem. Thanks! – Davmuz Jan 26 '10 at 16:07
