More on Haskell's hash table problems
We recently proved that Haskell's defacto-standard hash table implementation is extremely slow. So slow that optimized native-code compiled Haskell is even slower than interpreted Python and 32× slower than a decent hash table implementation! If that result were not remarkable enough, we have now discovered that the history of this issue is even more surprising...
Haskell programmers have been complaining about the poor performance of its hash table implementation for at least 3 years. The cause of the problem, a bug in the garbage collector, was identified a long time ago but that bug was never fixed. Instead, it appears that the Haskell developers went on a crusade to persuade the world that decent hash table implementations are not advantageous. For example, the book "Real World Haskell" by Don Stewart et al. explicitly states:
"Compared to a hash table, a well-implemented purely functional tree data structure will perform competitively. You should not approach trees with the assumption that your code will pay a performance penalty."
Perhaps more surprising, given Haskell's uniquely bad support for hash tables, is the fact that "Real World Haskell" is one of only three books cited from the Wikipedia article on this subject (!).
Suffice to say, there is overwhelming evidence that trees are not competitively performant when compared to hash tables. Indeed, that is precisely why hash tables are such a ubiquitous data structure. In general, we have found that trees are typically 10× slower than hash tables in practice for construction and lookup with around a million elements. That is supported by our recent benchmark where a hash table in F# took 0.7 and the fastest tree-based implementation written in Haskell (which is extremely heavily optimized for trees) was still 11× slower, taking over 7s to complete the same task.
Also, the Haskell entry for the k-nucleotide benchmark on the shootout contains the comment:
-- Note: Hash tables are not generally used in functional languages, so there
-- are no available library implementations for Haskell. This benchmark
-- requires a hash table. This is why I have implemented the hash table here.
And, again, this is just completely factually incorrect. Hash tables are just as popular in functional languages (like OCaml, F#, Scala, Clojure, Mathematica) as they are in imperative and object oriented languages for the simple reason that they are a great data structure. For example, the hash table implementation bundled with OCaml is so efficient that even the uniprocessor OCaml implementation of knucleotide thrashes the parallel Haskell implementation.
Comments
Post a Comment