A look at the IronJS source code

The author of IronJS has made some claims (e.g. F# can feel “like a slightly crappier version of C#”) that I find surprising, so I'm reading his code. I find it quite odd for a variety of reasons:
  1. Surprisingly OOP (dozens of interfaces and classes, hundreds of augmentations, static members and overloads), given how well suited ML is to writing compilers.
  2. Lots of low-level bit-twiddling.
  3. Uses List.empty instead of [].
  4. Uses large tuples.
  5. Uses arrays of closures, e.g. Stmt : (Lexer.Token -> State -> Ast.Tree) array.
  6. Doesn't use a symbol table.
  7. Choice of data structures appears to incur lots of unnecessary boxing.
  8. Odd choice of data structures for memory representation, e.g. the null, left and stmt fields for a token are scattered.
  9. Doesn't seem to use concurrency or parallelism.
  10. Parser contains only 42 match expressions in 1,400 lines of code.
  11. Massive amounts of code duplication.
  12. Hand-rolled parser is unusual. Hand-rolled lexer without tables is even more unusual.
  13. Strange choice of syntax, e.g. lots of p |> csymbol instead of just csymbol p (see the sketch below).
  14. Allegedly optimized but no structs.
  15. Uses HashSet.Add repeatedly instead of the declarative constructor from a seq.
  16. Doesn't use HashIdentity.Structural.
Furthermore, I don't recall ever seeing that author ask anyone for help with it and it looks like he could do with quite a bit of advice. So I'd take the "rewrite in C#" with a pinch of salt...
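
To illustrate point 13, here is a minimal sketch; the State type and csymbol accessor are hypothetical stand-ins for the real ones, but they show that the pipeline buys nothing for a single application:

// Hypothetical stand-ins for the parser state and its accessor.
type State = { Symbol : int }
let csymbol (p : State) = p.Symbol

let p = { Symbol = 42 }
let name1 = p |> csymbol   // pipelined, as in IronJS
let name2 = csymbol p      // equivalent direct application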
Here's an example of some code from the hand-rolled lexer that uses many comparisons of Unicode characters where single table lookups would suffice:

let inline isAlpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let inline isDecimal c = c >= '0' && c <= '9'
let inline isOctal c = c >= '0' && c <= '7'
let inline isNonOctalDigit c = c = '8' || c = '9'
let inline isHex c = isDecimal c || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')
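
For comparison, here is a minimal sketch (my own, not taken from IronJS) of how a single table of bit flags can replace the chained comparisons for ASCII input:

// Character-class bit flags, combined so one lookup answers several questions.
let [<Literal>] AlphaFlag = 1
let [<Literal>] DecimalFlag = 2
let [<Literal>] OctalFlag = 4
let [<Literal>] HexFlag = 8

// A 128-entry table built once at startup.
let classTable =
  let t : int array = Array.zeroCreate 128
  for c in 'a'..'z' do t.[int c] <- t.[int c] ||| AlphaFlag
  for c in 'A'..'Z' do t.[int c] <- t.[int c] ||| AlphaFlag
  for c in '0'..'9' do t.[int c] <- t.[int c] ||| DecimalFlag ||| HexFlag
  for c in '0'..'7' do t.[int c] <- t.[int c] ||| OctalFlag
  for c in 'a'..'f' do t.[int c] <- t.[int c] ||| HexFlag
  for c in 'A'..'F' do t.[int c] <- t.[int c] ||| HexFlag
  t

let inline isAlpha c = int c < 128 && (classTable.[int c] &&& AlphaFlag) <> 0
let inline isDecimal c = int c < 128 && (classTable.[int c] &&& DecimalFlag) <> 0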

Here's an example of some odd code that uses OOP methods to give names to the elements of a tuple when a record type should have been used instead:

member x.TokenName =
  let s, _, _, _ = x.Token in s |> Lexer.Symbol.getName

member x.TokenValue =
  let _, v, _, _ = x.Token in v

member x.TokenLine =
  let _, _, l, _ = x.Token in l

member x.TokenColumn =
  let _, _, _, c = x.Token in c

I believe this is the offending type definition:

type Token = int * string * int * int
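
With a record, the element names come for free and all four accessor methods above become redundant, e.g. a minimal sketch:

type Token =
  { Symbol : int
    Value  : string
    Line   : int
    Column : int }

Then x.Token.Line replaces the pattern-matching boilerplate and the field names document themselves.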

Here's another example of some strange code from the lexer where the programmer has manually assigned consecutive integers to symbols:

let [<Literal>] Break = 0
let [<Literal>] Case = 1
let [<Literal>] Catch = 2
let [<Literal>] Continue = 3
let [<Literal>] Default = 4
...

Note the complete absence of any metaprogramming here, which is surprising because it is written in a MetaLanguage (ML).
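
One alternative (a sketch, not the author's code) is a plain union, letting the compiler assign consecutive tags and recovering the names by reflection instead of maintaining them by hand:

open Microsoft.FSharp.Reflection

type Symbol =
  | Break
  | Case
  | Catch
  | Continue
  | Default

// Tag gives the compiler-assigned integer, Name the case name.
let symbolInfo (s : Symbol) =
  let case, _ = FSharpValue.GetUnionFields(s, typeof<Symbol>)
  case.Tag, case.Name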

Here's another oddity: the use of new when constructing non-IDisposable objects:

new Dictionary<string, int>(
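
F# only asks for new (and then only via a warning when it is omitted) on types that implement IDisposable; for everything else a plain call is idiomatic. A sketch, which also shows the declarative HashSet construction and HashIdentity.Structural mentioned in points 15 and 16:

open System.Collections.Generic

// No `new` needed for a non-IDisposable type.
let table = Dictionary<string, int>()

// Built declaratively from a seq instead of repeated Add calls.
let keywords = HashSet<string>(["break"; "case"; "catch"], HashIdentity.Structural)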

Yet another oddity: pretending that all type definitions are mutually recursive (chaining them with and) when they are actually completely independent:

and ScopeType
  = GlobalScope
  | FunctionScope
and EvalMode
  = Clean
  | Contains
  | Effected
and LookupMode
  = Static
  | Dynamic
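
Written independently, these are simply:

type ScopeType =
  | GlobalScope
  | FunctionScope

type EvalMode =
  | Clean
  | Contains
  | Effected

type LookupMode =
  | Static
  | Dynamic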

Very odd to say that a Dictionary being faster than a Map is "sad" when the dictionary isn't even being mutated: a table that is built once and then only read is exactly the case a Dictionary handles well:

// Sadly a normal Dictionary is so
// much faster then a F# Map that
// it's worth using it
new Dictionary<string, int>(
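
Indeed, when the table really is write-once, F#'s dict function builds a read-only IDictionary directly, with no mutation in sight. A sketch (the entries here are made up):

// Built once, never mutated.
let symbolIds = dict ["break", 0; "case", 1; "catch", 2]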

In summary, although IronJS is written in F#, it is not idiomatic F# code and there is a lot of room for improvement. I see no reason to believe that the problems with the IronJS code base have anything to do with the F# programming language itself.
