A look at the IronJS source code
The author of IronJS has made some claims (e.g. F# can feel “like a slightly crappier version of C#”) that I find surprising, so I have been reading his code. I find it quite odd for a variety of reasons:
- Surprisingly OOP (dozens of interfaces and classes and hundreds of augmentations, static members and overloads) given how well suited ML is to writing compilers.
- Lots of low-level bit-twiddling.
- Uses List.empty instead of [].
- Uses large tuples.
- Uses arrays of closures, e.g. Stmt : (Lexer.Token -> State -> Ast.Tree) array.
- Doesn't use a symbol table.
- Choice of data structures appears to incur lots of unnecessary boxing.
- Odd choice of data structures for memory representation, e.g. null, left and stmt fields for a token are scattered.
- Doesn't seem to use concurrency or parallelism.
- Parser contains only 42 match expressions in 1,400 lines of code.
- Massive amounts of code duplication.
- Hand-rolled parser is unusual. Hand-rolled lexer without tables is even more unusual.
- Strange choice of syntax, e.g. lots of p |> csymbol instead of just csymbol p.
- Allegedly optimized but no structs.
- Uses HashSet.Add repeatedly instead of the declarative constructor from a seq.
- Doesn't use HashIdentity.Structural.
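To make the last two points concrete, here is a minimal sketch of what I mean. The set contents and the names reserved and seen are made up for illustration and are not taken from IronJS.

```fsharp
open System.Collections.Generic

// Build the set in one go from a sequence instead of calling Add repeatedly.
let reserved = HashSet<string>([ "break"; "case"; "catch" ])

// With HashIdentity.Structural, keys such as arrays are compared by contents
// rather than by reference.
let seen = HashSet<int[]>(HashIdentity.Structural)
seen.Add([| 1; 2 |]) |> ignore
seen.Add([| 1; 2 |]) |> ignore   // same contents, so nothing new is added
```

With the default comparer the two arrays would count as different keys, because .NET arrays use reference equality.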
Here's an example of some code from the hand-rolled lexer that uses many comparisons of Unicode characters where a single table lookup could suffice:
let inline isAlpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let inline isDecimal c = c >= '0' && c <= '9'
let inline isOctal c = c >= '0' && c <= '7'
let inline isNonOctalDigit c = c = '8' && c = '9'
let inline isHex c = isDecimal c || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')
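For comparison, here is a rough sketch of the table-driven alternative. It assumes only ASCII needs classifying, so anything above code point 127 belongs to no class. The names classTable, hasClass and the bit flags are mine, not IronJS's, and whether this actually beats the inlined comparisons would need measuring.

```fsharp
// Hypothetical bit flags, one bit per character class.
[<Literal>]
let Alpha = 1uy
[<Literal>]
let Decimal = 2uy
[<Literal>]
let Octal = 4uy
[<Literal>]
let Hex = 8uy

// One byte per ASCII code point, computed once when the module loads.
let classTable : byte[] =
    Array.init 128 (fun i ->
        let c = char i
        let flag cond (bit: byte) = if cond then bit else 0uy
        let dec = c >= '0' && c <= '9'
        let alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        let hex = dec || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')
        flag alpha Alpha ||| flag dec Decimal ||| flag (c >= '0' && c <= '7') Octal ||| flag hex Hex)

// A bounds check, one array read and one mask per query.
let hasClass (mask: byte) (c: char) =
    let i = int c
    i < classTable.Length && (classTable.[i] &&& mask) <> 0uy

let isAlpha c = hasClass Alpha c
let isDecimal c = hasClass Decimal c
let isOctal c = hasClass Octal c
let isHex c = hasClass Hex c
```

A combined query such as "alpha or decimal" then becomes a single lookup with the mask Alpha ||| Decimal.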
Here's an example of some odd code that uses OOP methods to give names to the elements of a tuple when it should just use a record type instead of a tuple:
member x.TokenName =
    let s, _, _, _ = x.Token in s |> Lexer.Symbol.getName
member x.TokenValue =
    let _, v, _, _ = x.Token in v
member x.TokenLine =
    let _, _, l, _ = x.Token in l
member x.TokenColumn =
    let _, _, _, c = x.Token in c
I believe this is the offending type definition:
type Token = int * string * int * int
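A record gives the same four fields names directly. This is only a sketch: the field names are my guesses based on the accessor names above, and the sample values are made up.

```fsharp
// Same shape as int * string * int * int, but with named fields.
type Token =
    { Symbol : int
      Value  : string
      Line   : int
      Column : int }

// Fields are read by name, with no tuple destructuring or accessor members.
let tok = { Symbol = 0; Value = "break"; Line = 1; Column = 5 }

// Copy-and-update replaces rebuilding the whole tuple by hand.
let moved = { tok with Column = tok.Column + tok.Value.Length }
```

Records also get structural equality and comparison for free, just as tuples do.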
Here's another example of some strange code from the lexer where the programmer has manually assigned consecutive integers to symbols:
let [<Literal>] Break = 0
let [<Literal>] Case = 1
let [<Literal>] Catch = 2
let [<Literal>] Continue = 3
let [<Literal>] Default = 4
...
Note the complete absence of any metaprogramming here, which is surprising because it is written in a MetaLanguage (ML).
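One way to avoid numbering the symbols by hand, sketched here under the assumption that a union type would be acceptable in place of integer literals, is to declare the cases once and let the compiler assign the tags. The type name Symbol and the table symbolIds are hypothetical, and only the five symbols quoted above are listed.

```fsharp
open Microsoft.FSharp.Reflection

// Only the five symbols quoted above; the real list is longer.
type Symbol =
    | Break
    | Case
    | Catch
    | Continue
    | Default

// Name -> tag table derived from the type via reflection, not typed in by hand.
// Tags follow declaration order, so Break is 0, Case is 1, and so on.
let symbolIds =
    FSharpType.GetUnionCases(typeof<Symbol>)
    |> Array.map (fun case -> case.Name, case.Tag)
    |> dict
```

Adding a symbol is then a one-line change, and pattern matches over Symbol are checked for exhaustiveness.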
Here's another oddity, the use of new when constructing non-IDisposable objects:
new Dictionary<string, int>(
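For illustration, here is the distinction I have in mind. The names ids and firstLine and the sample key are made up.

```fsharp
open System.Collections.Generic
open System.IO

// Dictionary does not implement IDisposable, so `new` can be dropped.
let ids = Dictionary<string, int>()
ids.["break"] <- 0

// `new` is worth keeping for disposable objects, where it pairs with `use`.
let firstLine (text: string) =
    use reader = new StringReader(text)
    reader.ReadLine()
```

The compiler warns when `new` is omitted for an IDisposable type, which is why the keyword carries useful information when it is reserved for that case.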
Yet another oddity, pretending that all type definitions are mutually recursive when they are actually completely independent:
and ScopeType
    = GlobalScope
    | FunctionScope
and EvalMode
    = Clean
    | Contains
    | Effected
and LookupMode
    = Static
    | Dynamic
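Since none of these three types refers to another, they could be written as separate definitions, along these lines:

```fsharp
// Each type stands alone; `and` is only needed for mutually recursive types.
type ScopeType =
    | GlobalScope
    | FunctionScope

type EvalMode =
    | Clean
    | Contains
    | Effected

type LookupMode =
    | Static
    | Dynamic
```

Keeping `and` for types that really do refer to each other makes those dependencies visible at a glance.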
Very odd to say that a Dictionary being faster than a Map is "sad" when the dictionary isn't even being mutated:
// Sadly a normal Dictionary is so
// much faster then a F# Map that
// it's worth using it
new Dictionary<string, int>(
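If the table really is never mutated after construction, the built-in dict function is one idiomatic way to get hash-based lookup behind a read-only interface. This is a sketch: the keyword strings and the names keywords and tryFindId are my assumptions, and the integer values mirror the literals quoted earlier.

```fsharp
// dict builds a read-only IDictionary from key/value pairs.
let keywords = dict [ "break", 0; "case", 1; "catch", 2 ]

// TryGetValue's out parameter is returned as the second element of a tuple.
let tryFindId (name: string) =
    match keywords.TryGetValue name with
    | true, id -> Some id
    | _ -> None
```

Attempts to add to or remove from the result throw an exception, so the "never mutated" property is enforced rather than left as a convention.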
In summary, although IronJS is written in F#, it is not idiomatic F# code and there is a lot of room for improvement. I see no reason to believe that the problems with the IronJS code base have anything to do with the F# programming language.