Java System.nanoTime is really slow. Is it possible to implement a high performance java profiler?

I've spent 10 years working on commercial Java performance profilers, for use in both development and production.

The short answer is - yes, you're right. You can't pull that off.

And even if you could, putting anything but trivial instrumentation into a method that is called so frequently can change the way the JIT treats the code, skewing your performance numbers in hard-to-predict (and, from a performance-tuning standpoint, generally not useful) ways. And let's not get started on how making a system call in what is basically a tight assembly loop, after the JIT is done with it, defeats all the fancy optimizations the CPU might otherwise be able to do: prefetches go to waste, you cause an otherwise unnecessary context switch, you flush your L1 cache, and so on. It's OK to instrument slow (or maybe 'infrequently called' would be better?) methods.

You can get away with instrumenting, for example, a lot of the JDBC API to catch database issues. But for actual performance tuning of actual Java code (as opposed to the things Java calls into, like the network, filesystem, database, ...), instrumentation just isn't the way to go. It gives more understandable results, but no one has done line-level instrumentation for performance tuning in probably 7 years now - for the same reasons.

Instead, commercial profilers use "sampling" technology - they periodically take a stack trace. JVMTI has some nice calls that make it pretty cheap to do so every few ms. Then you assume all the time between stack traces was spent on the new stack (which, obviously, isn't true, but statistically, it produces accurate results over a not-stupidly-short measurement period) - and you've got yourself some actionable performance numbers without crazy overhead or any kind of observer effect.
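To make the sampling idea concrete, here is a minimal in-process sketch. Commercial profilers do this through JVMTI as described above; this sketch instead uses the pure-Java `Thread.getAllStackTraces()` as a stand-in, and the class and method names are illustrative, not from any real profiler.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sampling "profiler" sketch: periodically grab every thread's
// stack trace and attribute the elapsed interval to the top frame.
// Statistically, hot methods accumulate the most attributed time.
public class SamplingSketch {
    private final Map<String, Long> samples = new ConcurrentHashMap<>();
    private volatile boolean running = true;

    public void start(final long intervalMillis) {
        Thread sampler = new Thread(() -> {
            while (running) {
                for (Map.Entry<Thread, StackTraceElement[]> e
                        : Thread.getAllStackTraces().entrySet()) {
                    StackTraceElement[] stack = e.getValue();
                    if (stack.length == 0) continue;
                    String top = stack[0].getClassName() + "."
                               + stack[0].getMethodName();
                    // Assume all time since the last sample was spent here.
                    samples.merge(top, intervalMillis, Long::sum);
                }
                try { Thread.sleep(intervalMillis); }
                catch (InterruptedException ie) { return; }
            }
        }, "sampler");
        sampler.setDaemon(true);
        sampler.start();
    }

    public void stop() { running = false; }

    public Map<String, Long> results() { return samples; }

    public static void main(String[] args) {
        SamplingSketch p = new SamplingSketch();
        p.start(5);                                   // sample every 5 ms
        long x = 0;
        for (int i = 0; i < 50_000_000; i++) x += i;  // busy work to observe
        p.stop();
        System.out.println("frames sampled: " + p.results());
        if (x == 0) throw new AssertionError();       // keep the loop live
    }
}
```

Note the trade-off the answer describes: the profiled code itself contains no timing calls at all, so there is essentially no observer effect in the hot path.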

Very nice explanation. Thank you. – willpowerforever Mar 23 '10 at 2:44.

A practical suggestion: instead of putting the calls to System.nanoTime() inside the method, put them outside the loop that calls this method. However, there is a deeper point here: you are saying that you have a method that is called many times, and that adding two System.nanoTime() calls to it makes it incredibly slow. From the data you provided, your method is about 35,000 times faster than a pair of System.nanoTime() calls (12,500,000 / 350 = ~35,000).
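A quick sketch of that suggestion - time the whole loop once rather than wrapping every call. The method and class names here are made up for illustration:

```java
// Time the loop, not the call: two System.nanoTime() calls total,
// instead of two per invocation.
public class LoopTiming {
    // Hypothetical stand-in for the very cheap hot method being profiled.
    static long cheapMethod(long v) { return v * 31 + 7; }

    public static void main(String[] args) {
        final int iterations = 10_000_000;
        long acc = 0;

        long start = System.nanoTime();           // one timestamp before
        for (int i = 0; i < iterations; i++) {
            acc = cheapMethod(acc);               // no per-call instrumentation
        }
        long elapsed = System.nanoTime() - start; // one timestamp after

        System.out.println("ns/call ~= " + (double) elapsed / iterations);
        if (acc == 42) System.out.println(acc);   // keep the JIT from eliding the loop
    }
}
```

The timer overhead is now amortized over all the iterations, so it vanishes from the per-call figure.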

This is a pretty fast method - it runs in less than a nanosecond. I don't think you'll be able to make it any faster.

The only performance gains waiting for you are those based on reducing the number of times this method is called, not on making the individual call faster. Is it possible that the data is not accurate?

Single or multiple CPU? System.nanoTime() is just about the lowest-overhead timer you're going to get. Its speed depends on what OS you're running: on Windows it calls QueryPerformanceCounter(), on Linux it uses gettimeofday(), and on Solaris it uses gethrtime(). The fastest of these is probably Solaris's gethrtime() - it doesn't have the overhead of a regular OS system call.

Even so, it is reckoned to take "a few hundred nanoseconds" on a 300MHz UltraSPARC box, so 500ns sounds about the right range. You might get faster timings using DTrace, but that's not something I've used. Sadly, using profiling tools incurs an overhead.
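If you want to check that range on your own machine, a rough way to gauge System.nanoTime()'s own cost is to call it back-to-back many times and average the deltas (a crude estimate, not a rigorous benchmark):

```java
// Crude estimate of System.nanoTime()'s per-call overhead:
// each loop iteration does essentially nothing but one timer call,
// so (last - start) / n approximates the cost of a single call.
public class NanoTimeOverhead {
    public static void main(String[] args) {
        final int n = 1_000_000;
        long start = System.nanoTime();
        long last = start;
        for (int i = 0; i < n; i++) {
            last = System.nanoTime();
        }
        long perCall = (last - start) / n;
        System.out.println("approx ns per nanoTime() call: " + perCall);
    }
}
```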

But do you really need to run through 12.5 BILLION method call invocations (according to your numbers) to profile your code?

The concern about nanoTime() is not totally unfounded; it takes about 1500 ns on my fast, modern Linux machine. – Kevin Bourrillion Mar 19 '10 at 19:16

@caerphilly (and Kevin Bourrillion): I'm pretty sure the source (i.e. the timer) that Linux uses for gettimeofday() is configurable (e.g. the good old PIC or the newest HPET timers, etc.), and that there are at least three completely different timers that can be used. Maybe trying another source for gettimeofday() could help take the 1500 ns down to something more usable (it would be very unlikely that all the timers have exactly the same overhead)?

I'm willing to bet one of my fingers that it's at least configurable when you compile a Linux kernel. – SyntaxT3rr0r Mar 19 '10 at 19:35.
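For what it's worth, on modern Linux kernels the clock source is selectable at runtime through sysfs, not just at kernel compile time. The paths below are the standard sysfs locations on a typical Linux box (the available sources will vary by hardware):

```shell
# List the timer hardware the kernel can use, and which one is active.
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# Switch to a different source, e.g. hpet (requires root):
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
```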

If you want profiling, use a profiler, which will hook into the JVM quasi-magically using JVMTI, without needing to jam nanoTime() calls into code. If you want microbenchmarking, just loop many many times so that when you divide the result by that number, the nanoTime() overhead all but disappears.

