I think there are good GC-related reasons to avoid this sort of allocation behaviour. Depending on the size of the heap and the free space in eden at the time of allocation, simply allocating a 30,000-element byte[] could be a serious performance hit: it could easily be bigger than the TLAB (so allocation is not a bump-the-pointer event), and there may not even be enough space available in eden, in which case the array is allocated directly into tenured - which itself is likely to cause another hit down the line due to increased full GC activity (particularly if using CMS, due to fragmentation). Having said that, the comments from fdreger are completely valid too.
A multithreaded object pool is a bit of a grim thing that is likely to cause headaches. You mention they handle a single request only; if each request is serviced by a single thread, then a ThreadLocal byte[] that is wiped at the end of the request could be a good option. If the request is short-lived relative to your typical young GC period, then the young->old reference issue may not be a big problem (the probability of any given request being handled during a GC is small, even if you're guaranteed to hit one periodically).
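A minimal sketch of that ThreadLocal approach, assuming each request really is handled start to finish by one thread (the class, the method names, and the 30000 size constant are illustrative, not taken from the original code):

    // Hypothetical per-thread reusable buffer, sized for the worst-case request.
    final class RequestBuffer {
        private static final int BUFFER_SIZE = 30000;

        private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<byte[]>() {
            @Override
            protected byte[] initialValue() {
                return new byte[BUFFER_SIZE];
            }
        };

        // Called at the start of a request: returns the same array every
        // time on a given thread, so no per-request allocation happens.
        static byte[] acquire() {
            return BUFFER.get();
        }

        // "Wiped at the end of the request": zero the contents so no stale
        // data from the previous request leaks into the next one.
        static void release(byte[] buffer) {
            java.util.Arrays.fill(buffer, (byte) 0);
        }
    }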
There are two threads that work with the buffer - one for writing (after having read from the network), one for reading (calls coming from higher levels of the application). You make a valid point regarding allocating 30,000 bytes; do you have other ideas that might help? – Yon Mar 15 at 7:06 How long does it take to process a request?
How does it pass the byte[] from one thread to the other? If you are experiencing a performance impact from the frequent allocation of large arrays, then a simple fix would be to pass the byte[] back to the request handler via some j.u.c queue.
Each offer to that queue costs an object, as it will be wrapped in some internal Node instance, but that cost really will be tiny relative to the cost of frequently allocating 30k byte[]s. So seed the queue with some number of byte[]s to begin with, then poll from the queue on each new request (creating a new array if necessary). – Matt Mar 15 at 8:58 The byte[] is accessed from both threads directly (they sync on the client's object) and there are two pointers - one for the current write position and one for the read position (basically, the read position "chases" the write position).
Do I understand correctly that you're basically suggesting a pool implemented using a queue (such as ConcurrentLinkedQueue)? – Yon Mar 15 at 14:44 Yes; you could do it with some array of byte[]s and reserve pointers into that structure, but there's not much point to my mind. Just pop one off the queue to use it and push it back on when you're done.
Ideally you'd reuse all the byte[] instances within each eden collection, so that no Node instance survives more than a single collection, in order to prevent any leakage (of nodes) to tenured. – Matt Mar 15 at 15:16 Alternatively, do something a bit cruftier, like copying a j.u.c class and replacing the internal Node with your own byte[] wrapper class that acts as a Node.
This sort of behaviour is explicitly supported in the javolution collection classes, and it's a shame the j.u.c classes don't explicitly allow it too. – Matt Mar 15 at 15:19
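A minimal sketch of that queue-backed pool, assuming a pre-seeded ConcurrentLinkedQueue of fixed-size arrays (the class name, seed count, and acquire/release methods are illustrative):

    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical pool of reusable byte[] buffers backed by a j.u.c queue.
    final class BufferPool {
        private static final int BUFFER_SIZE = 30000;

        private final ConcurrentLinkedQueue<byte[]> pool =
                new ConcurrentLinkedQueue<byte[]>();

        // Seed the queue up front so steady-state requests rarely allocate.
        BufferPool(int initialBuffers) {
            for (int i = 0; i < initialBuffers; i++) {
                pool.offer(new byte[BUFFER_SIZE]);
            }
        }

        // Poll a buffer off the queue, creating one only if the pool is empty.
        byte[] acquire() {
            byte[] buffer = pool.poll();
            return (buffer != null) ? buffer : new byte[BUFFER_SIZE];
        }

        // Push the buffer back when the request is done with it; each offer
        // allocates a small internal Node, as noted in the comments above.
        void release(byte[] buffer) {
            pool.offer(buffer);
        }
    }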
Probably pooling will not help you much, if at all - possibly it will make things worse, although it depends on a number of factors (which GC you are using, how long the objects live, how much memory is available, etc.). The time a GC takes depends mostly on the number of live objects. The collector (I assume you run a vanilla Java JRE) does not visit dead objects and does not deallocate them one by one; it frees whole areas of memory after copying the live objects away (this keeps memory neat and compacted). 100 dead objects can be collected as fast as 100,000.
On the other hand, all the live objects must be copied - so if you, say, have a pool of 100 objects and only 50 are in use at a given time, keeping the unused objects is going to cost you. If your arrays currently tend to live for less time than it takes to get tenured (copied to the old generation space), there is another problem: your pooled arrays will certainly live long enough. This will produce a situation where there are a lot of references from the old generation to the young - and GCs are optimized with the reverse situation in mind.
Actually it is quite possible that pooling arrays will make your GC SLOWER than creating new ones; this is usually the case with cheap objects. Another cost of pooling comes from synchronizing objects across threads and cleaning them up after use. Both are trickier than they sound.
Summing up: unless you are well aware of the internals of your GC and understand how it works under the hood, AND have results from a profiler that show that managing all the arrays is a bottleneck - DO NOT POOL. In most cases it is a bad idea.
Thank you for the comprehensive response. We'll try to get the profiling done soon and come back with an answer. Can you think of other solutions for decreasing the impact of allocating these byte arrays?
– Yon Mar 15 at 7:02 I don't know what you mean by 'impact'. What is wrong with the application? What bad symptoms should be cured?
Do you have a problem with the overall speed? Frequent major collections? Frequent minor collections?
Depending on what your problem is, the answers will be different. First hint: if you want your application to use less memory, just give it less memory. You might get lower throughput, but you will get rid of the long GC pauses.
– fdreger Mar 16 at 19:24 The long GC pauses are our biggest problem right now, because during them the entire system is unresponsive. Also, the GC pauses wreak havoc with clocks (such as timeout-related functions). The question is: do we have a method of decreasing GC pauses without decreasing throughput related to these byte arrays?
– Yon Mar 21 at 14:18 Probably not; it is a tradeoff (you would face the same problem in C++ - either clean often and use less memory, or clean big blocks and use a lot). Before you try something more radical, try giving your program as little memory as possible. This would be something like 'the amount of memory that all the active clients need at once at the peak of memory usage, plus 20%'.
This will force very frequent minor collections, but they should be very fast. This may be good enough. – fdreger Mar 21 at 16:51
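For illustration, that sizing advice translates into the standard heap flags on the JVM command line; the numbers below are placeholders to be replaced with your measured peak usage plus 20%, not a recommendation:

    java -Xms512m -Xmx512m com.example.Server

Pinning -Xms equal to -Xmx avoids heap resizing, and a deliberately small heap keeps each minor collection short at the cost of running them more often, which is exactly the tradeoff described above.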
If garbage collection in your case is really a performance hit (often cleaning up the eden space does not take much time if not many objects survive), and it is easy to plug in an object pool, try it and measure it. This certainly depends on your application's needs.
1 for "try it and measure it". Optimization is tricky, and an ounce of experimental data is worth a pound of internet opinions. – Mark Tozzi Mar 14 at 21:30 This is quite a complex application.Do you think setting up the following would serve the "try it and measure it" goal: creating an app which only uses the clients, running it using something like YourKit (counting GC's, etc. ) using current solution and the proposed one?
– Yon Mar 15 at 6:34 You don't need a profiler to count STW pauses; just use -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log, then parse the log file to see how frequently it pauses. – Matt Mar 15 at 9:03
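Assembled into a full invocation (the main class name is a placeholder), that suggestion looks like the following; with -XX:+PrintGCApplicationStoppedTime, each stop-the-world pause shows up in gc.log as a "Total time for which application threads were stopped" line that is easy to grep for:

    java -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime \
         -Xloggc:gc.log com.example.Server

    grep "Total time for which application threads were stopped" gc.log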
The pool would work out much better as long as you always keep a reference to it; that way the garbage collector never reclaims it, and the pool only needs to be declared once (you could always declare it static to be on the safe side). It does mean the memory is held permanently, but I doubt that will be a problem for your application.
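For example, a hypothetical static holder along those lines, reusing the BufferPool sketch from earlier (the seed count is a guess):

    // The static field keeps the pool strongly reachable for the lifetime
    // of the class, so the GC never reclaims the pooled buffers.
    final class Buffers {
        static final BufferPool POOL = new BufferPool(64);
    }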