Memory management myths: promptness
People often assert that scope-based reference counting such as shared_ptr in C++ collects garbage “promptly” and some people define this as “collects at the earliest possible point”. For example, at the time of writing the Wikipedia page about garbage collection says:
“Compared to tracing garbage collection, reference counting guarantees that objects are destroyed as soon as they become unreachable…” – Wikipedia
Similar claims can even be seen in published research such as the paper “Down for the Count? Getting Reference Counting Back in the Ring”:
“Of the two fundamental algorithms on which the garbage collection literature is built, reference counting has lived in the shadow of tracing. It has a niche among language developers for whom either performance or completeness is not essential, and is unused by mature high performance systems, despite a number of intrinsic advantages such as promptness of recovery and dependence on local rather than global state.” – Blackburn et al.
On the other hand you can see statements by experts like Richard Jones, co-author of the excellent Garbage Collection Handbook, make statements like:
“More importantly, note also that even an immediate (i.e. non deferred) reference counter cannot reclaim objects as soon as they are no longer referenced as finalisation must be asynchronous (see Hans Boehm's POPL03 paper "Destructors, finalizers and synchronization").” – a post on the gc-list by Richard Jones.
Let’s have a closer look at the thinking behind this belief and test it with a simple program. The mental model that underpins this belief is that any function’s local variables are stored in separate slots in the function’s stack frame for the entire duration of a function’s body and, therefore, will be reachable from the point of view of the garbage collector for the duration of the call to the function. This mental model underpins exam and interview questions such as Is object eligible for garbage collection after “obj = null”? and When Is The Object Eligible For Garbage Collection?.
In reality, this mental model is simple, obvious and wrong. Why? Firstly, the garbage collector sees the run-time representation of a program after it has been subjected to transforms such as inlining, instruction reordering and code block reordering by the compiler that can mutilate the structure of a program beyond recognition and, consequently, concepts like scope that exist only in the source code and not in the compiled form are not visible to the garbage collector. Secondly, the register allocator does everything possible to keep references in registers and avoid spilling them to the stack and when they must be spilled it uses the results of liveness analysis to overwrite any dead references in the stack frame whenever possible. In fact, some compilers don’t even use stack frames, such as our own x86 JIT in F# and the HLVM project, and other compilers like SML/NJ convert every call into continuation style and put stack frames on the heap, splitting every segment of code between a pair of function calls in the source into its own separate function in the compiled form.
Enough theory, let’s take a look at some working code. Here is a simple example using tracing garbage collection in OCaml/F# where an argument tmp to a function dies in the middle of the function body and, in particular, before a recursive call:
let rec loop tmp i =
if i<=0 then tmp else
let tmp2 = loop (Array.copy tmp) (i-1)
tmp2.[0] <- tmp2.[0] + 1
tmp2
When run using loop (Array.init m id) n, this code clearly uses less than mn space and keeps on running indefinitely. This can only be because the argument tmp is no longer reachable via the stack when the recursive call is made and, consequently, gets garbage collected.
Here is the equivalent using scope-based reference counting in C++:
shared_ptr<vector<double> > loop(shared_ptr<vector<double> > tmp, int i) {
if (i<=0) {
return tmp;
} else {
shared_ptr<vector<double> > tmp1(new vector<double>(*tmp));
shared_ptr<vector<double> > tmp2 = loop(tmp1, i-1);
++(*tmp2)[0];
return tmp2;
}
}
In contrast, this code clearly requires at least mn space when run, goes to swap and (on Windows) dies from out of memory. Unlike the OCaml/F# code, the scope-based reference counting using shared_ptr in C++ keeps the tmp array allocated for longer than necessary, right until the end of the function call.
This observation also destroys another popular memory management myth: that tracing garbage collection always requires more memory than reference counting.
If there is any advantage to the C++ then it is the presence of guarantees. The semantics of C++ guarantee that after the end of scope the object has been deleted. However, it is worth noting that this guarantee of determinism does not apply to objects shared between threads because in that situation the threads race to decrement the reference counter to zero and the winner of the race condition is burdened with executing the destructor.
Comments
Post a Comment