Are multicore-capable garbage collectors hard to write?

In this era of multicore computing, garbage collectors need to allow user threads (aka mutators) to run in parallel on separate cores simultaneously in order to facilitate efficient shared memory parallel programming.

There are two relevant phrases from garbage collection terminology here:

  • Parallel GC means the garbage collector itself has been parallelised in order to speed up garbage collections. For example, a stop-the-world collector might traverse the heap in parallel when marking in an attempt to reduce pause times.

  • Concurrent GC means the garbage collector runs at the same time as the user threads (aka mutators). For example, Dijkstra's algorithm and derivatives like Doligez-Leroy use fine-grained synchronization to keep a concurrently-running collector apprised of the constantly-changing heap topology.

However, we are talking about neither parallel GC nor concurrent GC but, rather, the simpler challenge of just allowing mutators to run in parallel. In the absence of any established terminology, we call any GC that allows mutator threads to run in parallel with each other a multicore-friendly garbage collector.

Frustratingly, many people are perpetuating the myth that it is difficult to write parallel or concurrent or even just multicore-friendly garbage collectors. In particular, this is happening around the OCaml language as a result of Xavier Leroy's (in)famous "standard lecture on threads" from 2002 where he explained that they were not creating a multicore-friendly garbage collector for OCaml because multicores would never become popular and he described Intel's hyperthreading as "the last convulsive movement of SMP's corpse". Xavier is a great guy and had done a lot of great work but, of course, he was completely wrong about this. Not only are multicores ubiquitous and hyperthreading is very common but shared memory parallelism is here to stay: even if distributed parallelism becomes essential in the future when cache coherence breaks down we will be doing distributed parallel programming between multicores because shared memory parallelism is so much more efficient than distributed parallelism most of the time.

The JVM and .NET CLR obviously provide multicore-friendly garbage collectors so people sometimes assert that creating such a garbage collector requires huge resources and is beyond a small group such as the OCaml team at INRIA. This is simply not true. The simplest multicore-friendly garbage collector design is a stop-the-world collector that pauses all user threads while the entire heap is marked and swept safe in the knowledge that the heap topology is static. Our own HLVM project currently uses exactly this design and it took just a few days to write and is under 100 lines of code! And we are not alone. Simon Marlow has written several far more sophisticated multicore-friendly garbage collectors for the Glasgow Haskell Compiler by himself. The PolyML project developed a multicore-friendly garbage collector without benefit of funding from a multinational corporation. Same for Manticore.

Even some of the more sophisticated mostly-concurrent garbage collector designs are remarkably simple. For example, the Very Concurrent Garbage Collector (VCGC) uses a breathtakingly-elegant approach based around three epochs (instead of the usual tricolor marking scheme) that completely avoids the error-prone and inefficient fine-grained synchronization originally proposed by Dijkstra and followed by almost everyone else. The entire algorithm for this mostly concurrent garbage collector can be expressed in a single page of code!

So please do not let these myths put you off trying to write your own multicore-friendly garbage collector: this is an interesting, rewarding and entirely tractable challenge!

Comments

Popular posts from this blog

Bjarne Stroustrup is catching up

Does reference counting really use less memory than tracing garbage collection? Mathematica vs Swift vs OCaml vs F# on .NET and Mono

Does reference counting really use less memory than tracing garbage collection? Swift vs OCaml