Posts

Showing posts from November, 2011

What is your garbage collector collecting?

Image
As a garbage collector runs it recycles unreachable values. A well known result in GC theory is that values tend to die in "clumps" rather than individually. This post visualizes these clumps. We instrumented a C++ program equipped with our own mark-region collector (using 64kB regions) that is equivalent to the following F# implementation of a list-based n-queens solver in order to gather information about the structures being collected: let safe x1 y1 x2 y2 = x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 -...

The LMAX disruptor and Baker's Treadmill

Image
A new retail financial trading platform called LMAX recently published their work on new software technology for low-latency servers that they call the "disruptor": We have noticed a striking similiarity between their disruptor and an old garbage collection algorithm from 1992 called Baker's Treadmill : The LMAX disruptor and the story behind it are very interesting in their own right and well worth reading up on. In their system, binary messages come from the "receiver" and are distributed to the "journaller", "replicator" and "unmarshaller" in parallel and the "unmarshaller" then passes on the deserialized messages to the "consumer". The core of their idea is to accomplish all of this message passing using a single shared data structure, the disruptor, rather than using several separate concurrent queues. Baker's Treadmill is a low-latency garbage collection algorithm. Allocated blocks in the heap are li...

Real garbage collector characteristics

Image
The trade-offs between tracing and reference counting garbage collectors are nicely demonstrated by systems like F# and Mathematica. F# inherits garbage collection from .NET which uses a conventional generational tracing garbage collector with three generations and a Large Object Heap (LOH). Mathematica uses reference counting with language semantics that make it impossible to create cycles in the heap. The following Mathematica program creates a balanced binary tree 25 levels deep that contains 2 25 branches stores it in a variable and then mutates the variable back to the empty list: Because Mathematica uses reference counting, the act of resetting the variable back to the empty list decrements the reference count for the root of our tree back to zero, causing an avalanche of reference counts in the branches down to the leaves also being decremented back to zero, reclaiming all of the space that was consumed by the tree that is now unreachable. Moreover, Mathematica is serial so all...

Classifying garbage collection algorithms

Richard Jones' excellent new book about garbage collection has rekindled my fascination with this important subject. The Wikipedia page about GC is disappointingly poor quality so it is productive to review the main points here. GC algorithms are categorized into four main families with production garbage collectors often combining algorithms from different families. The families are copying, mark-sweep, mark-compact and reference counting. The first three are all tracing collectors that work by tracing all reachable values by starting from the global roots (global variables and locals held on the stack). Copying collectors allocate into one space and then "evacuate" the reachable heap blocks (called "survivors") into another space before clearing the space they came from. The Cheney semi-space algorithm is the simplest copying collector: it uses two spaces and copies from one to the other and back again. The advantage of copying collection is that many heap-...

Applying optimization algorithms to profits

As a technology company, we like to apply technical solutions to problems at all levels. Our board of directors even apply technical solutions to the problem of company direction. Business can be thought of as an optimization algorithm: tweaking stuff and things in order to maximize company profits. Interestingly, we use a number of different kinds of optimization algorithm when dictating the direction of the company. We begin new product lines based on experience but continue to optimize our products based on customer feedback, trying to solve the problems that are most important to our customers. For example, our F# for Numerics library started life as our second attempt at selling libraries to F# users (our first attempt was F# for Visualization ) and we provided the features we thought would be most useful. Customers inevitably requested more features including both technical features like parallel matrix inversion with arbitrary-precision rational arithmetic but also non-technica...