Use less threads: ThreadPoolExecutor example. Install futures on Python 2. X try asynchronous approach: gevent example twisted example.
This. If resource management is an issue, just have a thread pool and tune the pool limit. – Giacomo Lacava yesterday.
Using "one thread per request" is OK and easy for many use-cases. However, it will require a lot of ressources (as you experienced). A better approach is to use an asynchronuous one, but unfortunately it is a lot more complex.
Some hints into this direction: asyncore Twisted.
Thanks, much appreciated. I read about Twisted before, but sadly I don't know much about it and by the looks of it I wouldn't be able to use mechanize with it. I'll take a look if I could make mechanize work with asyncore.
– Andrew yesterday After all, a "perfect" solution would be a mix of thread pools with one thread per CPU core (to utilize them for processing tasks) and asynchronuous IO. A practical solution will depend on your actual application code. Maybe, even a simple solution based on select will do it for you.
– Frunsi yesterday 1 This means: in your thread: send a bunch of requests, then enter a loop which will select on the appropriate sockets, and handle any incoming data one by one... and so on. After all, the OS cares about socket IO anyway, your task is to interface with the OS in the most efficient way possible. – Frunsi yesterday Thing is, the code I have is quite simple really.
Each subclass is rather same, with just different URLs, different names, values, etc and occasionally some different way of processing of the data. They do not depend on each other at all. All I want is to run them concurrently, wait for them to complete the work and then exit.
All solutions I read are for more complex things I think. I can't believe someone haven't developed a module for simple asynchronous/threaded execution of classes or functions that do not depend on each other at all. – Andrew yesterday @Andrew: All the required code & framework exists, you just have to use it now ;) – Frunsi yesterday.
I'm no expert on Python, but maybe have a few thread pools which control the total number of active threads, and hands off a 'request' to a thread once it's done with the previous thread. The request doesn't have to be the full thread object, just enough data to complete whatever the request is. You could also structure it so you have thread pool A with N threads pinging the website, once the data is retrieved, hand it off the data to thread pool B with Y threads crunching the data.
The solution is to replace code like this: 1) Do something. 2) Wait for something to happen. 3) Do something else.
With code like this: 1) Do something. 2) Arrange it so that when something happens, something else gets done. 3) Done.
Somewhere else, you have a few thread that do this: 1) Wait for anything to happen. 2) Handle whatever happened. 3) Go to step 1.In the first case, if you're waiting for 50 things to happen, you have 50 threads sitting around waiting for 50 things to happen.
In the second case, you have one thread waiting around that will do whichever of those 50 things need to get done. So, don't use a thread to wait for a single thing to happen. Instead, arrange it so that when that thing happens, some other thread will do whatever needs to get done next.
I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.