A look at the IronJS source code

The author of IronJS has made some claims (e.g. F# can feel “like a slightly crappier version of C#”) that I find surprising, so I'm reading his code. I find it quite odd for a variety of reasons:
  1. Surprisingly OOP (dozens of interfaces and classes, hundreds of augmentations, static members and overloads), given how well suited ML is to writing compilers.
  2. Lots of low-level bit-twiddling.
  3. Uses List.empty instead of [].
  4. Uses large tuples.
  5. Uses arrays of closures, e.g. Stmt : (Lexer.Token -> State -> Ast.Tree) array.
  6. Doesn't use a symbol table.
  7. Choice of data structures appears to incur lots of unnecessary boxing.
  8. Odd choice of data structures for memory representation, e.g. the null, left and stmt fields for a token are scattered.
  9. Doesn't seem to use concurrency or parallelism.
  10. Parser contains only 42 match expressions in 1,400 lines of code.
  11. Massive amounts of code duplication.
  12. Hand-rolled parser is unusual. Hand-rolled lexer without tables is even more unusual.
  13. Strange choice of syntax, e.g. lots of p |> csymbol instead of just csymbol p (see the sketch below).
  14. Allegedly optimized but no structs.
  15. Uses HashSet.Add repeatedly instead of the declarative constructor from a seq.
  16. Doesn't use HashIdentity.Structural.
Furthermore, I don't recall ever seeing that author ask anyone for help with it and it looks like he could do with quite a bit of advice. So I'd take the "rewrite in C#" with a pinch of salt...
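
To illustrate point 13, here is a minimal sketch; the State type and csymbol accessor are hypothetical stand-ins for the real ones, but they show that the pipeline buys nothing for a single application:

// Hypothetical stand-ins for the parser state and its accessor.
type State = { Symbol : int }
let csymbol (p : State) = p.Symbol

let p = { Symbol = 42 }
let name1 = p |> csymbol   // pipelined, as in IronJS
let name2 = csymbol p      // equivalent direct application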
Here's an example of some code from the hand-rolled lexer that uses many comparisons of Unicode characters where single table lookups would suffice:

let inline isAlpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let inline isDecimal c = c >= '0' && c <= '9'
let inline isOctal c = c >= '0' && c <= '7'
let inline isNonOctalDigit c = c = '8' || c = '9'
let inline isHex c = isDecimal c || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')
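
For comparison, here is a minimal sketch (my own, not taken from IronJS) of how a single table of bit flags can replace the chained comparisons for ASCII input:

// Character-class bit flags, combined so one lookup answers several questions.
let [<Literal>] AlphaFlag = 1
let [<Literal>] DecimalFlag = 2
let [<Literal>] OctalFlag = 4
let [<Literal>] HexFlag = 8

// A 128-entry table built once at startup.
let classTable =
  let t : int array = Array.zeroCreate 128
  for c in 'a'..'z' do t.[int c] <- t.[int c] ||| AlphaFlag
  for c in 'A'..'Z' do t.[int c] <- t.[int c] ||| AlphaFlag
  for c in '0'..'9' do t.[int c] <- t.[int c] ||| DecimalFlag ||| HexFlag
  for c in '0'..'7' do t.[int c] <- t.[int c] ||| OctalFlag
  for c in 'a'..'f' do t.[int c] <- t.[int c] ||| HexFlag
  for c in 'A'..'F' do t.[int c] <- t.[int c] ||| HexFlag
  t

let inline isAlpha c = int c < 128 && (classTable.[int c] &&& AlphaFlag) <> 0
let inline isDecimal c = int c < 128 && (classTable.[int c] &&& DecimalFlag) <> 0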

Here's an example of some odd code that uses OOP methods to give names to the elements of a tuple when a record type should have been used instead:

member x.TokenName =
  let s, _, _, _ = x.Token in s |> Lexer.Symbol.getName

member x.TokenValue =
  let _, v, _, _ = x.Token in v

member x.TokenLine =
  let _, _, l, _ = x.Token in l

member x.TokenColumn =
  let _, _, _, c = x.Token in c

I believe this is the offending type definition:

type Token = int * string * int * int
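
With a record, the element names come for free and all four accessor methods above become redundant, e.g. a minimal sketch:

type Token =
  { Symbol : int
    Value  : string
    Line   : int
    Column : int }

Then x.Token.Line replaces the pattern-matching boilerplate and the field names document themselves.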

Here's another example of some strange code from the lexer where the programmer has manually assigned consecutive integers to symbols:

let [<Literal>] Break = 0
let [<Literal>] Case = 1
let [<Literal>] Catch = 2
let [<Literal>] Continue = 3
let [<Literal>] Default = 4
...

Note the complete absence of any metaprogramming here, which is surprising because it is written in a MetaLanguage (ML).
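
One alternative (a sketch, not the author's code) is a plain union, letting the compiler assign consecutive tags and recovering the names by reflection instead of maintaining them by hand:

open Microsoft.FSharp.Reflection

type Symbol =
  | Break
  | Case
  | Catch
  | Continue
  | Default

// Tag gives the compiler-assigned integer, Name the case name.
let symbolInfo (s : Symbol) =
  let case, _ = FSharpValue.GetUnionFields(s, typeof<Symbol>)
  case.Tag, case.Name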

Here's another oddity: the use of new when constructing non-IDisposable objects:

new Dictionary<string, int>(
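
F# only asks for new (and then only via a warning when it is omitted) on types that implement IDisposable; for everything else a plain call is idiomatic. A sketch, which also shows the declarative HashSet construction and HashIdentity.Structural mentioned in points 15 and 16:

open System.Collections.Generic

// No `new` needed for a non-IDisposable type.
let table = Dictionary<string, int>()

// Built declaratively from a seq instead of repeated Add calls.
let keywords = HashSet<string>(["break"; "case"; "catch"], HashIdentity.Structural)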

Yet another oddity: pretending that all type definitions are mutually recursive (chaining them with and) when they are actually completely independent:

and ScopeType
  = GlobalScope
  | FunctionScope
and EvalMode
  = Clean
  | Contains
  | Effected
and LookupMode
  = Static
  | Dynamic
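
Written independently, these are simply:

type ScopeType =
  | GlobalScope
  | FunctionScope

type EvalMode =
  | Clean
  | Contains
  | Effected

type LookupMode =
  | Static
  | Dynamic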

Very odd to say that a Dictionary being faster than a Map is "sad" when the dictionary isn't even being mutated: a table that is built once and then only read is exactly the case a Dictionary handles well:

// Sadly a normal Dictionary is so
// much faster then a F# Map that
// it's worth using it
new Dictionary<string, int>(
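
Indeed, when the table really is write-once, F#'s dict function builds a read-only IDictionary directly, with no mutation in sight. A sketch (the entries here are made up):

// Built once, never mutated.
let symbolIds = dict ["break", 0; "case", 1; "catch", 2]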

In summary, although IronJS is written in F#, it is not idiomatic F# code and there is a lot of room for improvement. I see no reason to believe that the problems with the IronJS code base have anything to do with the F# programming language itself.
