If there is significant contention on shared resources among the different threads, it could be that locking and unlocking objects requires a large number of inter-processor interrupts (IPIs), and the processors may spend more time discarding their L1 and L2 caches and re-fetching data from other CPUs than they actually spend making progress on the problem at hand. This can happen if the application uses excessively fine-grained locking. (I once heard it summed up as "there is no point having more than one lock per CPU cache line", which is definitely true, and perhaps still too fine-grained.) Java's "every object is a mutex" design can lead to a very large number of locks in the running system if too many of them are live and contended.
I have no doubt someone could intentionally write such an application, but it probably isn't very common. Most developers would write their applications to reduce resource contention where they can.
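To make the contention point above concrete, here is a minimal sketch (the class and constant names are illustrative, not from any referenced article) of several threads hammering a single lock. The result is correct, but on a multi-core machine every increment can bounce the lock word and the counter's cache line between cores:

```java
// Hypothetical micro-benchmark sketch: four threads contending on one lock.
public class SharedCounter {
    private long value;

    public synchronized void increment() { value++; }
    public synchronized long get() { return value; }

    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 4;
        final int ITERS = 1_000_000;
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < ITERS; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // Correct answer, but each increment may have forced the counter's
        // cache line to migrate between cores.
        System.out.println(counter.get()); // 4000000
    }
}
```

Giving each thread a private counter and merging at the end would remove the contention entirely, which is the kind of restructuring most developers would reach for.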
I doubt the "Much" part. My guess would be that the expense of moving state from one CPU to another is high enough to be noticeable. Generally you want jobs to stay on the same CPU so that their data stays in the local cache as much as possible.
This is entirely speculation without the article/data in question, but some types of programs are not well suited to parallelization. Perhaps the application is never CPU-bound (meaning the CPU is not the bottleneck; some sort of I/O is). Without more details, though, this question/conversation is fairly baseless.
There is no Java-specific reason for this, but moving state from core to core or even from CPU to CPU takes time. This time can be used better if the process stays on a single core. Also, caching can be improved in such cases.
This only matters, though, if the program does not use multiple threads and thus cannot distribute its work across multiple cores/CPUs effectively.
The application could make very poor use of blocking inter-thread communication. However, that would be purely down to the application being programmed exceptionally poorly. There is no reason at all why even a mediocre multi-core application with a moderately parallelisable workload should run slower on multiple cores.
From a pure performance perspective, the challenge is often the memory subsystem. So while more CPUs are often good, having CPUs that aren't near the memory that the Java objects are sitting in is very, very expensive. It is VERY machine specific, and depends greatly on the exact path between each CPU and memory.
Both Intel and AMD have had various shapes / speeds here, and the results vary greatly. See NUMA for reasons why multi-core might hinder. We have seen performance deltas in the 30% range or more depending on how JVMs are pinned to processors.
SPECjbb2005 is now mostly run in "multi-JVM" mode with each JVM associated with a given CPU / memory for this reason.
The JIT will not include memory barriers if it thinks it is running on a single core. I suspect that is what is happening in the referenced article. Here is a very concise explanation of memory barriers, which also shows a neat technique for inspecting the JIT'd code: infoq.com/articles/memory_barriers_jvm_c... This isn't to say all applications would benefit from being placed on a single core.
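The visibility hazard that memory barriers address can be sketched in a few lines (the class and field names here are illustrative). The `volatile` write of `stop` forces a barrier; if the field were plain and the program ran on multiple cores, the reader thread might never observe the write, while on a single core the JIT could safely elide the barriers:

```java
// Sketch of the visibility problem memory barriers solve.
public class VisibilityDemo {
    // Without 'volatile', the reader's spin loop could run forever
    // on a multi-core machine.
    private static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) { /* spin until the write becomes visible */ }
            System.out.println("stopped");
        });
        reader.start();
        Thread.sleep(100);   // let the reader start spinning
        stop = true;         // volatile write publishes across cores
        reader.join();
    }
}
```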
Recent Intel CPUs have Turbo Boost: en.wikipedia.org/wiki/Intel_Turbo_Boost.
That would, at worst, mean that a truly absurd (if not malicious) task scheduler could try to arrange things so that Turbo Boost doesn't kick in -- and I'm not even sure it could actually be manipulated that way. In any case it would never amount to a 6x performance difference. – Nicholas Knight Jul 13 '10 at 20:23
I agree. Turbo Boost doesn't even come close to 6x. – DeadMG Jul 14 '10 at 20:03
This will depend on the number of threads the application spawns. If you spawn, say, four worker threads doing heavy number-crunching, the app will be almost four times faster on a quad-core machine, depending on how much book-keeping and merging you must do.
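The four-worker-thread case can be sketched with a fixed thread pool (the names `chunkSum`, `ParallelSum`, and the workload are illustrative assumptions). Each worker crunches an independent range, and the merge step at the end is the "book-keeping" that caps the speedup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Sum of the half-open range [from, to) -- the per-worker crunching.
    static long chunkSum(long from, long to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i;
        return s;
    }

    public static void main(String[] args) throws Exception {
        final int WORKERS = 4;
        final long N = 1_000_000;
        ExecutorService pool = Executors.newFixedThreadPool(WORKERS);
        List<Future<Long>> parts = new ArrayList<>();
        long chunk = N / WORKERS;
        for (int w = 0; w < WORKERS; w++) {
            final long from = w * chunk;
            final long to = (w == WORKERS - 1) ? N : from + chunk;
            parts.add(pool.submit(() -> chunkSum(from, to)));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get(); // merge step
        pool.shutdown();
        System.out.println(total); // sum of 0..999999 = 499999500000
    }
}
```

On a quad-core machine the four `chunkSum` calls can run truly in parallel; the sequential merge is small here, so the speedup approaches 4x.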
CPUs often have a limit to how much heat they can dissipate. This means a chip with fewer cores can run at a higher frequency, which can result in a program running faster if it doesn't use the extra cores effectively. Today the choice is between 4, 6 and 8 cores, where the chips with more cores have individually slower cores.
I don't know of any single-core systems that are faster than the fastest 4-core system.
In the article the author made his server faster by allocating it to just a single CPU instead of 6 – IttayD Jul 14 '10 at 3:54
You are right that he is allocating all his processes to one core. If this works, he almost certainly has a tuning problem, though what it is is unclear. – Peter Lawrey Jul 14 '10 at 5:54