Posts

Showing posts from January, 2011

Sweeping 700× faster

Image
Our initial experiments using the new garbage collector design indicate that it dramatically improves the performance of the slowest GC phase (sweeping) as expected, to the extent that it brings the overall performance of our C++ prototype within 10% of OCaml without introducing any of the overheads of generational garbage collection: Moreover the algorithm is very simple and, in particular, parallel and concurrent variants should be much easier to design with this style of collector than with generational collectors because regions are a suitable granularity. The sweep phase of the GC accounted for around a third of the total running time of the entire program using the traditional algorithm with allocated and free lists, with each sweep taking around 70µs. Using the new bitwise algorithm, the sweep phase takes just 100ns and accounts for just 0.7% of the total running time of the program. Our prototype mark region collector using a fake marking phase is now between 2 and 10% slower ...

The importance of locality and sparsity in memory management

Image
Our previous articles describing the disadvantages of generational garbage collection and our prototype mark-region memory management system designed for HLVM originally showed that region-based allocation and deallocation has the potential to be only 4-20% slower than OCaml's generational collector. However, our more recent work that was designed to be more realistic by deferring deallocations to bulk GC cycles was significantly slower, around twice as slow as OCaml. There are several differences between the stack-based deallocation scheme used in the first benchmark and the GC-like deallocation scheme used in the second benchmark that have the potential to account for this performance degradation: The mock GC introduced mark bytes into the allocated values and marked them as unreachable when they fell out of scope in the mutator. The mock GC allocated by popping a reference of the top of a region's free list. The mock GC deallocated by pushing a reference onto the free lis...

Paul Graham's accumulator generator

Paul Graham once published an article about what he called "accumulator generators". This problem requires the existence of an unspecified numeric tower. Lisp happens to have one and it happens to be adequate for Paul Graham's examples. You can implement a numeric tower in F# either using a union type (like type number = Int of int | Float of float ) or by boxing everything. The following solution uses the latter approach: let add ( x : obj ) ( y : obj ) = match x , y with | (:? int as m ), (:? int as n ) -> box ( m + n ) | (:? int as n ), (:? float as x ) | (:? float as x ), (:? int as n ) -> box ( x + float n ) | (:? float as x ), (:? float as y ) -> box ( x + y ) | _ -> failwith "Run-time type error" let acc x = let x = ref x fun ( y : obj ) -> x := add ! x y ! x let x : obj -> _ = acc ( box 1 ) do x ( box 5 ) do acc ( box 3 ) do printfn "%A" ( x ( box 2.3 )) However, nume...