Posts from September 2013

How do reference counting and tracing garbage collection compare?

Reference counting works by counting the number of references to each object. When the reference count drops to zero, the object is definitely unreachable and can be recycled. The first advantage is simplicity. The second advantage is that decrements can occur as locals fall out of scope, so collection can be deterministic and predictable. The third advantage is that the asymptotic complexity of reference counting does not depend upon the total size of the heap. The first problem is that cycles keep reference counts above zero even when the objects are unreachable, so reference counting alone cannot handle the general case. The second problem is that incrementing and decrementing a counter in an object every time a reference to it is copied or dropped is computationally expensive, because it incurs a cache miss even when nothing else in the object is read or written. The third problem is that multithreaded reference counting is non-deterministic because increments and ...
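
To make the mechanism concrete, here is a minimal sketch of manual reference counting in F# (the RefCounted type and its members are illustrative names, not a real library):

    // Minimal sketch of manual reference counting (illustrative only).
    type RefCounted<'T>(value: 'T, finalize: 'T -> unit) =
        let mutable count = 1                  // one reference exists at creation
        member this.Value = value
        member this.AddRef() =                 // a reference is copied: increment
            count <- count + 1                 // touches the object: potential cache miss
        member this.Release() =                // a reference is dropped: decrement
            count <- count - 1
            if count = 0 then finalize value   // definitely unreachable: recycle now

    let handle = RefCounted("resource", printfn "freeing %s")
    handle.AddRef()                            // a second reference is copied
    handle.Release()                           // one reference dropped
    handle.Release()                           // count hits zero: deterministic reclamation

Note that two such objects holding references to each other would keep both counts at one forever, which is exactly the cycle problem described above.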

Why is garbage collection an important feature?

Garbage collection eliminates a class of bugs caused by erroneous manual memory management (forgetting to free, freeing too soon, freeing more than once). Garbage collection removes the need for APIs to describe contracts about memory management. Garbage collection also facilitates programming styles such as first-class lexical closures in functional programming (see the Funarg problem).
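
As a small illustration of the last point, returning a closure that captures a local (the "upwards funarg" case) is only safe because the GC keeps the captured environment alive; a minimal F# sketch:

    // The closure returned here captures the local ref cell 'total',
    // which must outlive the call to makeCounter. The GC keeps the
    // environment alive for as long as the closure is reachable; with
    // manual memory management the caller would have to decide when
    // to free it.
    let makeCounter () =
        let total = ref 0
        fun () ->
            total.Value <- total.Value + 1
            total.Value

    let next = makeCounter ()
    next () |> ignore   // returns 1
    next () |> ignore   // returns 2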

What's new in garbage collection?

Since the 1950s there have been three main families of collectors: semi-space, mark-sweep and mark-compact. Almost all production GCs have been generational mark-sweep, even though these exhibit pathological performance when the nursery is full of survivors: the survivors are marked, (physically) copied and all references to them updated, which is ~3x slower than necessary, and this case arises in practice when filling a hash table with heap-allocated keys and/or values (see the sketch below). The Beltway (2002) and Immix (2008) garbage collectors introduced a new family, the mark-region GCs. With a mark-region GC the entire heap is a collection of regions, including the nursery generation, so a nursery full of survivors can be logically aged simply by replacing it with another region. Sun's HotSpot JVM introduced the first mainstream mark-region GC with its G1 collector. The advent of multicore in 2005 has placed more emphasis on parallel and concurrent garbage collectors. The Staccato (2008) garbage collector ...
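
As a concrete instance of the pathological case mentioned above, here is a minimal F# sketch (names and sizes are illustrative) in which essentially every nursery allocation survives into a hash table:

    open System.Collections.Generic

    // Pathological workload for a generational copying nursery: every
    // freshly allocated key and value survives into the table, so each
    // nursery collection marks them, physically copies them and updates
    // all references to them, instead of simply aging the nursery.
    let fillTable n =
        let table = Dictionary<string, string>()
        for i in 1 .. n do
            table.[string i] <- string (i * i)   // heap-allocated key and value
        table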

A look at the IronJS source code

The author of IronJS has made some claims (e.g. that F# can feel “like a slightly crappier version of C#”) that I find surprising, so I'm reading his code. I find it to be quite odd for a variety of reasons:

- Surprisingly OOP (dozens of interfaces and classes and hundreds of augmentations, static members and overloads) given how well suited ML is to writing compilers.
- Lots of low-level bit-twiddling.
- Uses List.empty instead of [].
- Uses large tuples.
- Uses arrays of closures, e.g. Stmt : (Lexer.Token -> State -> Ast.Tree) array (illustrated in the sketch after this list).
- Doesn't use a symbol table.
- Choice of data structures appears to incur lots of unnecessary boxing.
- Odd choice of data structures for memory representation, e.g. null, left and stmt fields for a token are scattered.
- Doesn't seem to use concurrency or parallelism.
- Parser contains only 42 match expressions in 1,400 lines of code.
- Massive amounts of code duplication.
- Hand-rolled parser is unusual.
- Ha...
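
To show why the arrays-of-closures style stands out, here is a hypothetical side-by-side sketch (simplified stand-in types, not IronJS's actual code) of dispatch through an array of closures versus direct pattern matching:

    type Token = If | While | Return   // simplified stand-ins
    type State = { Pos: int }
    type Tree = Node of string

    // Array-of-closures style: one closure per token tag, indexed by an
    // integer tag computed from the token.
    let tag = function If -> 0 | While -> 1 | Return -> 2
    let stmt : (Token -> State -> Tree) array =
        [| (fun _ _ -> Node "if")
           (fun _ _ -> Node "while")
           (fun _ _ -> Node "return") |]
    let dispatch token state = stmt.[tag token] token state

    // Direct pattern matching, more idiomatic in ML-family languages.
    let parseStmt token state =
        match token with
        | If -> Node "if"
        | While -> Node "while"
        | Return -> Node "return"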

Efficient Parallel Stencil Convolution in Haskell

Until recently, all research on parallel Haskell programs neglected CPU caches and existing high-performance software solutions like BLAS, LAPACK, FFTW and OpenCV. We criticized the paper Regular, shape-polymorphic, parallel arrays in Haskell for comparing parallel Haskell programs only with poorly-written C code, particularly because this buries the real performance differences in unnecessary cache misses. In 2011, Ben Lippmeier, Gabriele Keller and Simon Peyton Jones published a paper entitled Efficient Parallel Stencil Convolution in Haskell that is the first to address caches and locality in the context of parallel Haskell programs and to compare parallel Haskell programs with existing high-performance solutions. The paper discusses an algorithm for edge detection in computer vision that was 10x slower using the old REPA approach and walks through the design and implementation of a more advanced solution using new approaches that take CPU caches into account and, ultimately, a...
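
For readers unfamiliar with stencils, a scalar reference version of a 3x3 convolution (plain F# arrays, not the paper's Haskell/REPA code; the kernel and clamping policy are illustrative choices) looks like this:

    // Horizontal Sobel kernel, commonly used in edge detection.
    let sobelX = array2D [ [ -1.0; 0.0; 1.0 ]
                           [ -2.0; 0.0; 2.0 ]
                           [ -1.0; 0.0; 1.0 ] ]

    // Each output pixel is the weighted sum of its 3x3 neighbourhood.
    // Borders are clamped here; real implementations pick a boundary
    // policy and traverse in a cache-friendly order, which is precisely
    // what the paper addresses.
    let convolve3x3 (kernel: float[,]) (image: float[,]) =
        let h, w = Array2D.length1 image, Array2D.length2 image
        Array2D.init h w (fun y x ->
            let mutable acc = 0.0
            for dy in -1 .. 1 do
                for dx in -1 .. 1 do
                    let yy = max 0 (min (h - 1) (y + dy))
                    let xx = max 0 (min (w - 1) (x + dx))
                    acc <- acc + kernel.[dy + 1, dx + 1] * image.[yy, xx]
            acc)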

Hash table insertion performance: F# vs OCaml vs Haskell

Time taken to insert ten million int→T bindings, for T as each of int, float, complex and string, using F# on .NET 4.5, OCaml 3.12 and GHC 7.4.2, measured on a 3.4GHz Intel Core i7-3770 running Windows 7. Value types and reified generics facilitate .NET's very efficient Dictionary implementation when handling value types like int, float and complex (unmatched by any other language/VM, including C, C++ and Java). F# is 50% faster than OCaml and Haskell with the reference type string, perhaps because the .NET write barrier is significantly more efficient. Note: we use "int" to refer to ordinary 32-bit ints in F# and Haskell but 31-bit ints in OCaml; using 32-bit ints in OCaml would be much slower because they are boxed (i.e. heap allocated).
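
For reference, the timed loop has the following shape; a minimal F# sketch, assuming a simple Stopwatch harness (warm-up, GC settings and the exact key/value types varied per measurement):

    open System.Collections.Generic
    open System.Diagnostics

    // Insert ten million int -> float bindings and report the time taken.
    let time () =
        let n = 10000000
        let d = Dictionary<int, float>()
        let sw = Stopwatch.StartNew()
        for i in 1 .. n do
            d.[i] <- float i
        sw.Stop()
        printfn "%d insertions took %.2fs" n sw.Elapsed.TotalSeconds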