That looks reasonable. I've found BlockingCollection to be quite fast. I use it to process tens of thousands of requests per second.

If your application is processor bound, then you probably don't want to create more workers than you have cores. Certainly you don't want to create a lot more workers than cores. On a quad core machine, if you expect most of the time to be spent doing the FFTs, then four workers will eat all the CPU. More workers just means more thread context switches to deal with. The TPL will typically balance that for you, but there's no reason to create, say, 100 workers when you can't handle more than a handful.

I would suggest that you run tests with 3, 4, 5, 6, 7, and 8 workers. See which one gives you the best throughput.
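One way to run that comparison is a small timing harness. The sketch below is only an illustration: SimulatedFft is a hypothetical CPU-bound stand-in for the real FFT work, and the names WorkerCountBenchmark and MeasureThroughput are made up for this example. Numbers from it will vary by machine and should be treated as rough guidance.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WorkerCountBenchmark
{
    // Hypothetical CPU-bound stand-in for one FFT; replace with the real work.
    public static double SimulatedFft(int seed)
    {
        double acc = 0;
        for (int i = 1; i <= 20000; i++)
            acc += Math.Sin(seed + i) * Math.Cos(i);
        return acc;
    }

    // Pushes itemCount items through a BlockingCollection drained by
    // workerCount dedicated tasks and returns items processed per second.
    public static double MeasureThroughput(int workerCount, int itemCount)
    {
        using (var queue = new BlockingCollection<int>())
        {
            var stopwatch = Stopwatch.StartNew();

            var workers = Enumerable.Range(0, workerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (var item in queue.GetConsumingEnumerable())
                        SimulatedFft(item);
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            for (int i = 0; i < itemCount; i++)
                queue.Add(i);
            queue.CompleteAdding(); // lets the consumer loops end once the queue is empty

            Task.WaitAll(workers);
            stopwatch.Stop();
            return itemCount / stopwatch.Elapsed.TotalSeconds;
        }
    }

    public static void Main()
    {
        foreach (var workers in new[] { 3, 4, 5, 6, 7, 8 })
            Console.WriteLine("{0} workers: {1:F0} items/s",
                workers, MeasureThroughput(workers, 2000));
    }
}
```

Run each worker count several times and compare the averages, since a single run can be skewed by whatever else the machine is doing.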
4 gives me the best, although there's not much in it. I'm going to stick with this and make the number configurable, defaulting to Env.ProcessorCount if not set. – Matt Roberts Jun 1 at 20:36
I agree with Jim. Your approach looks really good. You are not going to get much better than this.
I am not an FFT expert, but I am assuming these operations are nearly 100% CPU bound. If that is indeed the case, then a good first guess at the number of workers would be a direct 1-to-1 correlation with the number of cores in the machine. You can use Environment.ProcessorCount to get this value. You could experiment with a multiplier of, say, 2x or 4x, but again, if these operations are CPU bound then anything higher than 1x might just cause more overhead. Using Environment.ProcessorCount would make your code more portable.

Another suggestion: let the TPL know that these are dedicated threads. You can do this by specifying the LongRunning option.
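public IncomingPacketQueue()
{
    // The loop bound and the worker method name (Consume) are reconstructed assumptions.
    for (int i = 0; i < Environment.ProcessorCount; i++)
    {
        Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
    }
}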
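For a fuller picture, here is a minimal self-contained sketch of dedicated LongRunning consumers draining a BlockingCollection. The PacketQueue type, the double[] item type, and the member names are assumptions for illustration, not the original poster's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical queue type; names and item type are assumptions for illustration.
public sealed class PacketQueue : IDisposable
{
    private readonly BlockingCollection<double[]> _packets = new BlockingCollection<double[]>();
    private readonly Task[] _workers;
    private readonly Action<double[]> _process;
    private int _processed;

    public PacketQueue(Action<double[]> process, int workerCount)
    {
        _process = process;
        _workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            // LongRunning hints to the scheduler that each consumer should get its
            // own thread instead of occupying a thread-pool thread indefinitely.
            _workers[i] = Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
        }
    }

    public int ProcessedCount { get { return Volatile.Read(ref _processed); } }

    public void Enqueue(double[] packet) { _packets.Add(packet); }

    // Stops accepting packets and waits until the queued ones are processed.
    public void Complete()
    {
        _packets.CompleteAdding();
        Task.WaitAll(_workers);
    }

    private void Consume()
    {
        // Blocks while the queue is empty; ends after CompleteAdding once drained.
        foreach (var packet in _packets.GetConsumingEnumerable())
        {
            _process(packet);
            Interlocked.Increment(ref _processed);
        }
    }

    public void Dispose() { _packets.Dispose(); }
}

public static class PacketQueueDemo
{
    public static void Main()
    {
        using (var queue = new PacketQueue(p => Array.Reverse(p), Environment.ProcessorCount))
        {
            for (int i = 0; i < 100; i++)
                queue.Enqueue(new double[] { i, i + 1, i + 2 });
            queue.Complete();
            Console.WriteLine("Processed {0} packets", queue.ProcessedCount);
        }
    }
}
```

Passing the worker count into the constructor keeps the 1x, 2x, or 4x experiment a one-line change at the call site.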
I agree, but you might also want to ignore cores from HyperThreading, just consider real cores. – Chris O Jun 1 at 17:16

Good tip with the Env.ProcessorCount... that should work well for me. – Matt Roberts Jun 1 at 20:26
Why not use Parallel.ForEach and let the TPL handle the number of threads created?

Parallel.ForEach(BlockingCollectionExtensions.GetConsumingPartitioner(_packetQ), sweep =>
{
    // do stuff
    var worker = new IfftWorker();
    Trace.WriteLine("Thread {0} picking up a pending ifft".With(Thread.CurrentThread.ManagedThreadId));
    worker.DoIfft(sweep);
});

(GetConsumingPartitioner is part of ParallelExtensionsExtras.)
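If pulling in ParallelExtensionsExtras is not an option, a similar effect can probably be had with the framework's own Partitioner.Create and EnumerablePartitionerOptions.NoBuffering (available from .NET 4.5). This is a swapped-in technique, not the GetConsumingPartitioner helper itself, and the ParallelConsumer and Drain names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelConsumer
{
    // Drains the queue with Parallel.ForEach until CompleteAdding has been called
    // and the queue is empty; returns how many items were handled.
    public static int Drain<T>(BlockingCollection<T> queue, Action<T> handle)
    {
        int handled = 0;

        // NoBuffering makes the partitioner hand out one item at a time instead of
        // pulling chunks of items ahead of the workers.
        var partitioner = Partitioner.Create(
            queue.GetConsumingEnumerable(),
            EnumerablePartitionerOptions.NoBuffering);

        Parallel.ForEach(partitioner, item =>
        {
            handle(item);
            Interlocked.Increment(ref handled);
        });

        return handled;
    }
}

public static class ParallelConsumerDemo
{
    public static void Main()
    {
        using (var queue = new BlockingCollection<int>())
        {
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 1000; i++) queue.Add(i);
                queue.CompleteAdding();
            });

            long sum = 0;
            int count = ParallelConsumer.Drain(queue, x => Interlocked.Add(ref sum, x));
            producer.Wait();
            Console.WriteLine("Handled {0} items, sum {1}", count, sum);
        }
    }
}
```

Note that Drain blocks the calling thread until the producer calls CompleteAdding, so in a long-lived service it would need to run on its own task or thread.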
This looks like a nice solution too - I'll play with that and see what I get :) – Matt Roberts Jun 1 at 20:22.
Make the number of workers configurable. Also, with too many workers it will get slower (as indicated by another poster), so you need to find the sweet spot. A configurable value would allow test runs to find the optimal value, or would allow your program to adapt to different types of hardware. You could certainly place this value in App.config and read it on startup.
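A minimal sketch of the fallback logic, assuming the setting arrives as a raw string. The WorkerSettings class and ResolveWorkerCount method are hypothetical names, and the choice to fall back to Environment.ProcessorCount follows the comments above.

```csharp
using System;

public static class WorkerSettings
{
    // Parses a raw setting value (for example, a string read from App.config).
    // Falls back to Environment.ProcessorCount when the value is missing,
    // not a number, or less than 1.
    public static int ResolveWorkerCount(string rawValue)
    {
        int parsed;
        if (int.TryParse(rawValue, out parsed) && parsed >= 1)
            return parsed;
        return Environment.ProcessorCount;
    }
}

public static class WorkerSettingsDemo
{
    public static void Main()
    {
        Console.WriteLine(WorkerSettings.ResolveWorkerCount("6"));  // explicit value
        Console.WriteLine(WorkerSettings.ResolveWorkerCount(null)); // falls back to core count
    }
}
```

In a classic .NET Framework application the raw string could come from ConfigurationManager.AppSettings, under a key of your choosing such as "WorkerCount" (an assumed name).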
You could also try using PLINQ to parallelize the processing to see how it compares to the approach you're currently using. It has some tricks up its sleeve that can make it very efficient under certain circumstances.

_packetQ.GetConsumingEnumerable().AsParallel().ForAll(sweep => new IfftWorker().DoIfft(sweep));
You can't use PLINQ with a BlockingCollection. The default partitioner can either miss items or deadlock. Always use the BlockingCollectionPartitioner from the ParallelExtensionsExtras. – adrianm Jun 1 at 17:56