MATLAB's Garbage Collector?


This is the list of facts I have collected. Instead of "GC", the term memory (de)allocation seems more appropriate in this context. My principal information sources are Loren's blog (especially its comments) and this article from MATLAB Digest.

Because of its orientation toward numeric computing with possibly large data sets, MATLAB does a really good job of optimizing stack-object performance, for example with in-place operations on data and call-by-reference-like passing of function arguments (copy-on-write). Also because of this orientation, its memory model is fundamentally different from that of OO languages such as Java. Officially, MATLAB had no user-defined heap memory until version 7 (in version 6 there was undocumented reference functionality in schema.m files).
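
A minimal sketch of the in-place pattern I mean (the function names here are mine, not from the linked sources); as far as I understand, MATLAB only applies the optimization when the same variable appears on both sides of the assignment and the call comes from inside another function:

    function demoInPlace
        x = rand(5000);        % about 200 MB of doubles, owned by this workspace
        x = scaleInPlace(x);   % same variable on both sides: MATLAB may update
                               % x in place instead of making a copy
    end

    function x = scaleInPlace(x)
        % Input and output share a name; this is the pattern that allows
        % the in-place optimization to kick in.
        x = 2*x + 1;
    end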

MATLAB 7 has a heap, both in the form of nested functions (closures) and in the form of handle objects; their implementations share the same underpinnings. As a side note, OO could be emulated with closures in MATLAB (interesting for pre-2008a releases).
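
Here is a minimal sketch of that emulation (makeCounter and its field names are made up for illustration); the workspace of makeCounter effectively becomes heap data that stays alive as long as any of the returned handles does:

    function counter = makeCounter(start)
        % Closure-based "object": both nested functions share the workspace
        % of makeCounter, which survives after makeCounter returns.
        value = start;
        counter.increment = @increment;
        counter.get       = @get;

        function increment
            value = value + 1;      % mutates the captured workspace
        end
        function v = get
            v = value;
        end
    end

Usage, calling the "methods" through the captured handles:

    c = makeCounter(0);
    feval(c.increment)
    disp(feval(c.get))              % prints 1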

Surprisingly, it is possible to examine the entire workspace of the enclosing function captured by a function handle (closure); see the function functions(fhandle) in the MATLAB Help. It means that the enclosing workspace is frozen in memory. This is why cellfun/arrayfun are sometimes very slow when used inside nested functions.
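
A small sketch of what I mean, with made-up names; at least in the releases where I see this, the workspace reported by functions() contains variables the nested function never even touches:

    function demoFrozenWorkspace
        bigData = rand(1000);      % ~8 MB, lives in the enclosing workspace
        fh = @nested;              % handle to a nested function captures it

        info = functions(fh);      % inspect the internals of the handle
        disp(info.workspace{1})    % the captured workspace, bigData included

        function nested
            disp('hello');         % never touches bigData
        end
    end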

There are also interesting posts by Loren and Brad Phelan on object cleanup. The most interesting fact about heap deallocation in MATLAB is, in my opinion, that MATLAB tries to do it each time the stack is deallocated, i.e. on leaving every function.

This has advantages but also incurs a huge CPU penalty if heap deallocation is slow. And it is actually very slow in MATLAB in some scenarios! The performance problems that MATLAB memory deallocation can inflict on code are pretty bad.
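
To make the deterministic cleanup visible, here is a sketch using a post-R2008a classdef handle class (the class and function names are hypothetical):

    % Noisy.m -- a handle class that announces its own cleanup
    classdef Noisy < handle
        methods
            function delete(obj)
                disp('delete called');   % runs during cleanup on function exit,
            end                          % not at some later collector-chosen time
        end
    end

And a caller that shows when that cleanup happens:

    function demoDeterministicCleanup
        makeAndDrop();                   % 'delete called' is printed here ...
        disp('back in the caller');      % ... before this line runs
    end

    function makeAndDrop
        obj = Noisy();                   % last reference dies when this returns
    end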

I always notice that I have unintentionally introduced cyclic references into my code when it suddenly runs 20x slower and sometimes needs several seconds between leaving a function and returning to its caller (time spent on cleanup). It is a known problem; see Dave Foti and this older forum post, whose code is used to make this picture visualizing performance (the tests were made on different machines, so an absolute timing comparison across MATLAB versions is meaningless): a linear increase of the pool size for reference objects means a polynomial (or exponential) decrease of MATLAB performance! For value objects the performance is, as expected, linear.
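
I cannot reproduce the forum post's code here, but the following sketch (class and function names are mine) shows the kind of measurement I mean: build a pool of handle objects linked into a cycle inside a function, then time how long it takes to get back from that function:

    % Node.m -- hypothetical handle class whose 'next' property allows cycles
    classdef Node < handle
        properties
            next = [];
        end
    end

    function t = timeCleanup(n)
        tic;
        buildCyclicPool(n);        % construction plus cleanup on return
        t = toc;                   % are both charged to this call
    end

    function buildCyclicPool(n)
        pool = cell(1, n);
        for k = 1:n
            pool{k} = Node();
        end
        for k = 1:n-1
            pool{k}.next = pool{k+1};   % chain the nodes ...
        end
        pool{n}.next = pool{1};         % ... and close the cycle
    end                                 % pool is destroyed here, on return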

Considering these facts, I can only speculate that MATLAB uses a not-yet-very-efficient form of reference counting for heap deallocation. EDIT: Until now I had encountered the performance problem with many small nested functions, but recently I noticed that, at least with 2006a, the cleanup of a single nested scope holding some megabytes of data is also terrible; it takes 1.5 seconds just to set a nested-scope variable to empty! EDIT 2: I finally got the answer, from Dave Foti himself.

He acknowledges the flaws but says that MATLAB is going to retain its present deterministic cleanup approach. Legend: Shorter execution time is better.

MATLAB makes the workspace very visible in the Workspace browser or with the "whos" command. This shows you all the objects created by your commands and how much memory they take up. feature('memstats') will show you the largest contiguous block of memory available to MATLAB, which is therefore the largest matrix you can create.
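
For example (the output of feature('memstats') is undocumented and varies by platform and release):

    A = rand(2000);          % about 32 MB of doubles
    whos A                   % name, size, bytes and class of the variable
    feature('memstats')      % memory statistics, including the largest
                             % contiguous free block available to MATLAB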

Using the "clear" command will synchronously remove those objects from memory and free up the space to be used again. The JVM handles the garbage collection only of Java items. So if you open a file in the editor and close it, Java takes care of removing the window and text, etc from memory.

If you create a Java object in the MATLAB workspace, it first has to be cleared, and only then can it be cleaned up by the JVM. There's lots of information about managing program memory in our technote: mathworks.com/support/tech-notes/1100/11... And I recently wrote about handling Java memory on the MATLAB Desktop blog: http://blogs.mathworks.com/desktop/2009/08/17/calling-java-from-matlab-memory-issues/ If you're academically interested in what happens to memory allocated when a function exits or when you resize a variable... I'm pretty sure that's a trade secret, and it changes every release. You should never notice it, and if you run into performance problems that you suspect are related to object management, please file a help ticket with technical support: http://www.mathworks.com/support.
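
For example, for a Java object created from MATLAB:

    list = java.util.ArrayList;    % Java object referenced from the workspace
    list.add('hello');             % a MATLAB char array becomes a java.lang.String

    clear list                     % step 1: drop MATLAB's reference
    java.lang.System.gc();         % step 2: ask the JVM to collect it
                                   % (gc() is a request, not a guarantee)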

Mike: I think we can speak about three kinds of objects in MATLAB: Java, stack, and heap. Your answer is about the first two, but my question is about the third. – Mikhail Sep 19 '09 at 12:24
This answer deserves more upvotes, since it provides useful information as opposed to rlbond's post.

– Cecil Has a Name Sep 19 '09 at 13:18
Sorry Cecil, but this answer IS NOT the answer to my question. It is just some stuff unrelated to my question. This answer looks suggestive but is just a purposeful distraction from the problem I asked about.

– Mikhail Nov 7 '09 at 19:44
I am running into performance problems that I suspect are related to object management: stackoverflow.com/questions/4268113/… – Marc Nov 24 '10 at 17:55

What I think about MATLAB's GC is that if you have to ask, MATLAB is not the right language for your problem. The language is high level and offers no direct memory access for a reason; problems that suit the language take advantage of that abstraction.

Mikhail: MATLAB is a programming language for scientists and engineers, made to perform data processing. None of the implementation details of the MATLAB language are actually specified. The MATLAB environment handles all memory automatically; the user never explicitly allocates memory. So in that sense there is no GC that is visible to the user.

However, the environment runs on Java, so the environment itself uses Java's GC. – rlbond Sep 18 '09 at 20:05
No matter the programming environment, understanding its memory management characteristics is extremely valuable for using it more efficiently. Even though a system can be designed to be idiot-proof, there's always some way to abuse or harness it, so Mikhail's question is very relevant.

– Cecil Has a Name Sep 19 '09 at 13:22.

It seems like you're trying to construct some sort of Python vs. MATLAB argument. I'm not that interested in that argument. A meta-answer to your meta-question: it's actually fairly important that you don't care.

When I say that, I don't mean to limit it to MATLAB memory management. This extends to Python, Java, .NET, and any other language that does dynamic memory allocation and is still under active development.

The more you know about the current mechanism of memory management, the more likely you are to code defensively against that specific implementation, and the more likely it becomes that you won't benefit from future performance improvements. A number of good examples of this can be found in Java's GC, capably written up by Brian Goetz over at developerWorks: ibm.com/developerworks/library/j-jtp0127... You can say it's important to know. I counter that it's all about the requirements.

The more appropriate question is: do the languages I am considering for my project meet my needs in terms of performance, development effort, maintainability, portability, expertise of my developers, etc.? I've never seen a project with a requirement for using a generational GC over mark-sweep over reference counting. I don't expect to see one soon.

