Avoiding and exploiting JavaScript's warts


JavaScript is a Gestalt language.

One's sentiment toward JavaScript flips between elegance and disgust without transiting intermediate states.

The key to seeing JavaScript as elegant is understanding its warts, and knowing how to avoid, work around or even exploit them.

I adopted this avoid/fix/exploit approach after reading Doug Crockford's JavaScript: The Good Parts.

Doug has a slightly different and more elaborate take on the bad parts and awful parts, so I'm sharing my perspective on the five issues that have caused me the most grief in the past:

  • how to fix broken block scope with with;
  • the four (not three!) meanings of this;
  • promoting arguments to an array;
  • avoiding truthiness; and
  • understanding prototype pollution.

Dave Herman, of Mozilla Research, sent detailed and thoughtful corrections of this post. He's a true authority on JavaScript.

Dave has since published Effective JavaScript: 68 Specific Ways to Harness the Power of JavaScript.

It's a precise accounting of JavaScript's finer points, and I strongly recommend it to all JavaScript programmers.

When warts collide: var versus with

Lexical block scope in JavaScript is broken, and though the use of with is generally considered poor form, it's a good solution to this problem.

In most curly-braced languages, blocks delineate lexical scope. For example, in C or Java:

 { 
    int i = 13 ;
    { 
       int i = 42 ; 
       print(i) ;
    }
    print(i) ;
 }

this code prints 42 and then 13.

But, in JavaScript:

 {
    var i = 13 ;
    {
       var i = 42 ;
       console.log(i) ;
    }
    console.log(i) ;
 }

this code prints 42 and 42.

In JavaScript, only functions introduce a new lexical scope and variable declarations are implicitly hoisted to this level.

For instance:

function f ()  {
  var i = 13 ;
  {
     var i = 42 ;
     print(i);
  }
  print(i) ;
}

is equivalent to:

function f () {
  var i ;
  i = 13 ;
  {
     i = 42 ;
     print(i) ;
  }
  print(i) ;
}

Aside: Hoisting under the hood

JavaScript takes hoisting to extremes.

The following program -- a lone reference to x -- provokes a reference error:

 x ; // ReferenceError: x is not defined

but the following program is OK because var x gets hoisted:

 if (false) {
   var x ;
 }
 x ; // No problem! x is declared in this scope.

By extension, it must be the case (and it is) that the following is also legal:

 x ; // No problem! x is declared in this scope.
 if (false) {
   var x ;
 }
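To make the distinction concrete, here is a minimal sketch that can be run as-is. The function names hoisted and undeclaredDemo, and the variable neverDeclaredAnywhere, are made up for illustration; the second function assumes that no such global exists.

```javascript
// The declaration of x is hoisted out of the dead branch, so reading x
// early yields undefined instead of throwing.
function hoisted() {
  var before = x;        // no ReferenceError: x is already declared
  if (false) {
    var x = 42;          // never runs, but the declaration still counts
  }
  return before;         // undefined
}

// A name that is declared nowhere throws when read.
function undeclaredDemo() {
  try {
    neverDeclaredAnywhere;   // hypothetical name, assumed not to exist
    return "no error";
  } catch (e) {
    return e.name;           // "ReferenceError"
  }
}
```

Only the declaration moves; the assignment stays where it was written, which is why hoisted() sees undefined rather than 42.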

Function hoisting

The story on function hoisting is messier.

The following code works:

 console.log(fact(3)) ;

 function fact(n) {
  return (n == 0) ? 1 : n*fact(n-1) ;
 }

because the definition of fact is hoisted to the top of the enclosing scope.

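Thus, in the engines this post was written against, the following works too (engines that follow ES2015 semantics scope a function declared inside a block to that block, so there the early call throws a TypeError):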

 {
   console.log(fact(3)) ;
   { 
     function fact(n) {
       return (n == 0) ? 1 : n*fact(n-1) ;
     }
   }
 }

But, the following fails:

 console.log(fact(3)) ;

 if (false) {
   function fact(n) {
    return (n == 0) ? 1 : n*fact(n-1) ;
   }
 }

in most implementations of JavaScript.

Variable declarations are hoisted out of conditionals.

Function declarations are not.
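A related trap, sketched below under the assumption that the functions live directly in a function body rather than inside a block: a function declaration is hoisted together with its body, but a function expression assigned to a var is hoisted only as an undefined variable. The names declarationDemo, expressionDemo, double and triple are made up for this example.

```javascript
function declarationDemo() {
  var result = double(21);            // works: the whole function is hoisted
  function double(n) { return 2 * n; }
  return result;                      // 42
}

function expressionDemo() {
  var kindBefore = typeof triple;     // "undefined": only `var triple` is hoisted
  var triple = function (n) { return 3 * n; };
  var kindAfter = typeof triple;      // "function": the assignment has now run
  return [kindBefore, kindAfter];
}
```

Calling triple before the assignment would throw a TypeError, since undefined is not callable.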

Fixing block scope with with

To restore block scoping to JavaScript, try using with with explicit objects; for example:

{ 
   var i = 13 ;
   with ({i: 42}) {
     console.log(i) ; // prints 42
   }
   console.log(i) ; // prints 13
}

Because the object is declared explicitly, it will not interfere with static analysis of the code, and it is equally straightforward for human reasoning.

This is the only justifiable use of with, and it is unavailable in strict mode, where with is a syntax error.

Dave Herman of Mozilla called me out on this advice, saying that the right way to handle this is an immediately applied anonymous function:

 {
    var i = 13 ;
    (function () { 
       var i = 42 ;
       console.log(i) ;  // prints 42
    })() ; 
    console.log(i) ; // prints 13
 }
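The place where missing block scope tends to hurt most is a closure created inside a loop. The sketch below is my own illustration, not something from Dave's note, and the function names are hypothetical; it shows the failure and the immediately applied function that repairs it.

```javascript
// Every closure shares the single function-scoped i.
function makeCallbacksBroken() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks;
}

// The immediately applied function gives each iteration its own j.
function makeCallbacksFixed() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    (function (j) {
      callbacks.push(function () { return j; });
    })(i);
  }
  return callbacks;
}

// Helper: call every callback and collect the results.
function runAll(callbacks) {
  var out = [];
  for (var k = 0; k < callbacks.length; k++) {
    out.push(callbacks[k]());
  }
  return out;
}

var broken = runAll(makeCallbacksBroken());  // [3, 3, 3]
var fixed = runAll(makeCallbacksFixed());    // [0, 1, 2]
```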

The functional programmer in me agrees with Dave, but I have a hard time getting over the aesthetics.

What does this mean?

The meaning of this depends on how the current function was called:

  • directly: f(...);
  • indirectly: f.call(this,...) or f.apply(this,array);
  • as a method: o.f(...); or
  • as a constructor: new f(...).
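As a quick sketch of three of these cases, consider the following. The names whoAmI, receiver and o are made up for illustration. The direct call is deliberately left out, because its result depends on whether the code runs in strict mode (where this is undefined) or not (where it is the global object).

```javascript
function whoAmI() { return this; }

var receiver = { name: "receiver" };   // hypothetical object
var o = { name: "o", f: whoAmI };      // hypothetical object with a method

var viaCall = whoAmI.call(receiver);       // indirectly: this is receiver
var viaApply = whoAmI.apply(receiver, []); // indirectly: this is receiver
var viaMethod = o.f();                     // as a method: this is o
var viaNew = new whoAmI();                 // as a constructor: this is a fresh object
```

The same function body sees a different this each time; nothing about its definition determines the answer.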

Called directly

Called directly, this gets bound to the top-level Window object.

Because global variables are actually fields in this object, this modifies the global namespace:

 function f () {
   this.x = 3 ;
 }
 f() ;
 alert(x) ; // alert(3)

But, what about nodejs, where there is no window object?

Run this code as an experiment:

function f() {
  this.x = 10 ;
  console.log(this) ;
  console.log(this.x) ;
}

f() ;

console.log(x) ;

It prints:

{}
10
10

Clearly, the default this in nodejs is no ordinary empty object.

As expected, this object retains its powers even if returned:

function global() {
  return this ;
}

(global()).y = 20 ;

console.log(y) ; // prints 20

Called indirectly

The most bizarre (and often overlooked) behavior with respect to this comes from calling a function indirectly, attempting to forcibly define this with f.call or f.apply.

If an object is supplied as the first argument, then that object becomes this.

But, if an atom like a number, a boolean or a string is passed, this is not bound to that value.

Instead, this is bound to an "objectified atom"--an object that behaves kind of like the atom.

Try this in nodejs or firebug:

 function f () { return this ; }

 var myTrue = f.call(true) ;
 console.log(myTrue) ;             // prints {}
 console.log(myTrue.constructor) ; // prints [Function: Boolean] 
 console.log(typeof myTrue) ;      // prints "object"

 var myBar = f.call('bar') ;
 console.log(myBar) ;             // prints {'0': 'b','1': 'a','2': 'r'}
 console.log(myBar.constructor) ; // prints [Function: String]
 console.log(myBar.toString()) ;  // prints bar
 console.log(typeof myBar) ;      // prints "object"

 var myThree = f.call(3) ;
 console.log(myThree) ;             // prints {}
 console.log(myThree.constructor) ; // prints [Function: Number]
 console.log(myThree.valueOf()) ;   // prints 3
 console.log(typeof myThree) ;      // prints "object"

Spooky, eh?
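The practical danger is that an objectified atom does not behave like the atom in a conditional. The sketch below builds the wrappers explicitly with Object(...), which I am assuming produces the same kind of wrapper object that f.call produces in non-strict code; in strict mode, f.call(true) leaves this as the plain primitive instead.

```javascript
var boxedFalse = Object(false);   // a Boolean wrapper object around false
var boxedThree = Object(3);       // a Number wrapper object around 3

var checks = {
  typeOfBoxed: typeof boxedFalse,                 // "object"
  boxedFalseIsTruthy: boxedFalse ? true : false,  // true: every object is truthy
  looseEqual: boxedThree == 3,                    // true: == unwraps the object
  strictEqual: boxedThree === 3,                  // false: object versus number
  unwrapped: boxedFalse.valueOf()                 // false: the original atom
};
```

So a function that tests if (this) after being handed false through f.call can take the wrong branch.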

Called as a method

When invoked as a method--o.f()--a function receives the object o as this.

There are two situations where methods lead to trouble: curried or nested functions, and first-class methods.

It's easy to forget that when functions nest, the inner function gets its own this, even when that this makes no sense.

 var o = {} ;
 o.a = 3 ;
 o.b = 4 ;
 o.c = 5 ;

 o.generateValidator = function () {
   return function () {
     // Bug: inside the inner function, this is no longer o.
     if (this.a*this.a + this.b*this.b != this.c*this.c)
       throw Error("invalid right triangle") ;
   } ;
 } ;

The way around this scoping issue is to declare that:

 var o = {} ;
 o.a = 3 ;
 o.b = 4 ;
 o.c = 5 ;

 o.generateValidator = function () {
   var that = this ; // capture the method's this for the inner function
   return function () {
     if (that.a*that.a + that.b*that.b != that.c*that.c)
       throw Error("invalid right triangle") ;
   } ;
 } ;

I was once bitten by accidentally using a method in a first-class setting:

 engine.setTickHandler(ship.ontick) ;

ship.ontick is a method, but once invoked, this will not be bound to ship.

In all likelihood, it will be bound to global().

My solution to this problem is inspired by the notion of η-expansion from the lambda calculus:

function eta (that,methodName) {
  var f = that[methodName] ;
  return function () {
    return f.apply(that,arguments) ;
  }
}

Then, instead of writing object.methodName to pass a method as a first-class function, use eta(object,'methodName').
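Here is a small usage sketch. The ship object is a stand-in I made up for illustration, not the one from my original code. It also shows Function.prototype.bind, the ES5 built-in that achieves the same effect where it is available.

```javascript
function eta(that, methodName) {
  var f = that[methodName];
  return function () {
    return f.apply(that, arguments);
  };
}

// Hypothetical ship, for illustration only.
var ship = {
  ticks: 0,
  ontick: function (dt) {
    this.ticks += dt;
    return this.ticks;
  }
};

var viaEta = eta(ship, 'ontick');       // safe to pass around as a plain function
var viaBind = ship.ontick.bind(ship);   // ES5 built-in with the same effect

var first = viaEta(1);    // ship.ticks is now 1
var second = viaBind(2);  // ship.ticks is now 3
```

One difference worth knowing: eta looks the method up once, when it is called, so replacing ship.ontick afterwards does not affect the wrapper.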

Called as a constructor

When a function is called as a constructor, the value of this is the newly created object.

Omitting new by accident trashes the global namespace.

If global variables are mutating without explanation, try guarding constructors with:

 this == global() && error() ;
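A different guard, one that repairs the call instead of raising an error, is to test this with instanceof. This is a sketch of a common idiom rather than the guard shown above, and Point is a hypothetical constructor.

```javascript
function Point(x, y) {
  // If new was omitted, this is not a Point, so redo the call properly.
  if (!(this instanceof Point)) {
    return new Point(x, y);
  }
  this.x = x;
  this.y = y;
}

var a = new Point(1, 2);   // the usual way
var b = Point(3, 4);       // new forgotten, but still yields a Point
```

The check works whether or not the code is in strict mode, since neither undefined nor the global object is an instance of Point.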

Fixing arguments

The ability to accept an arbitrary number of arguments in JavaScript is frequently handy.

In JavaScript, the arguments passed to a function are implicitly bound to the variable arguments.

This object looks and acts mostly like an Array, but it's just an object that happens to have numeric indices plus a field called length.

Most programmers don't discover this until it bites them.

For example, with:

function f() {
  return arguments;
}

a call to f(1,2,3) returns:

{ '0': 1,
  '1': 2,
  '2': 3  }

rather than [ 1, 2, 3 ].

The usual methods -- like indexOf -- are missing.

There are a couple of ways to promote arguments to an actual Array. The method adopted by many JavaScript frameworks is to use the slice method:

 function f() {
   // Copy into a real Array; assigning to arguments itself fails in strict mode.
   var args = Array.prototype.slice.call(arguments) ;
   return args ;
 }

In non-IE implementations of JavaScript, it is possible to directly reassign the prototype object to the prototype for arrays:

 function f() {
   arguments.__proto__ = Array.prototype ;
   return arguments ;
 }
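A short sketch of the slice approach in use follows. The helper names toArray, contains and describe are made up for this example.

```javascript
// Copy any array-like object into a real Array.
function toArray(argsObject) {
  return Array.prototype.slice.call(argsObject);
}

// contains(needle, a, b, c, ...) reports whether needle is among the rest.
function contains() {
  var args = toArray(arguments);
  var needle = args[0];
  return args.slice(1).indexOf(needle) !== -1;
}

// Shows that arguments itself is not an Array, but the copy is.
function describe() {
  return [Array.isArray(arguments), Array.isArray(toArray(arguments))];
}

var found = contains(2, 1, 2, 3);    // true
var missing = contains(9, 1, 2, 3);  // false
var kinds = describe(1, 2);          // [false, true]
```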

Avoiding truthiness

There is little truth to truth in JavaScript.

Many values qualify as false in a conditional:

false, 0, undefined, null, NaN and ''.

At first glance, it appears that == understands this, given that:

 0 == false

yields true.

Yet, null == false, undefined == false and NaN == false are all false, even though '' == false is true.

The operators == and != attempt coercion on operands of different types.

For example, ' \t\t ' == false, yet ' \t\t ' is true in a conditional.

In theory, it is better to use === and !==, which do not attempt coercion.

Yet, there is still a value x such that x != x and x !== x.

That value is NaN.
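A few of these comparisons, collected into one runnable sketch; the object name facts and its keys are only labels I chose.

```javascript
var facts = {
  zeroLooseFalse: 0 == false,               // true: false is coerced to 0
  whitespaceLooseFalse: ' \t\t ' == false,  // true: both sides coerce to 0
  whitespaceTruthy: Boolean(' \t\t '),      // true: a non-empty string is truthy
  nullLooseFalse: null == false,            // false: null only equals null and undefined
  nullLooseUndefined: null == undefined,    // true
  zeroStrictFalse: 0 === false,             // false: different types, no coercion
  nanLooseNan: NaN == NaN,                  // false
  nanStrictNan: NaN === NaN                 // false
};
```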

If true equality matters, use a helper function:

function equal(a,b) {
  if (a === b)
    return true ;
  if (isNaN(a) && isNaN(b))
    return true ;
  return false
}

Zach Allaun wrote to correct this definition. He pointed out that all of the following evaluate to true:

    equal("foo", "bar") // true
    equal({x:1}, {x:2}) // true
    equal({}, {})       // true

because isNaN coerces its argument to a number before testing it, so it returns true for any value that converts to NaN -- not just NaN itself.

The recommended fix is to define isActuallyNaN:

function isActuallyNaN(x) {
   return x !== x
}

which works, since NaN is the only value not truly equal to itself.

The correct code becomes:

function equal(a,b) {
  if (a === b)
    return true ;
  if (isActuallyNaN(a) && isActuallyNaN(b))
    return true ;
  return false
}

Alternatively, one could use isNaN, but guarded with typeof to check that a and b are both "number", since typeof NaN yields "number".
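A minimal sketch of that typeof-guarded alternative, with the hypothetical names isNumberNaN and equalGuarded:

```javascript
// True only for the number NaN; strings and objects are rejected by typeof
// before isNaN gets a chance to coerce them.
function isNumberNaN(x) {
  return typeof x === "number" && isNaN(x);
}

function equalGuarded(a, b) {
  if (a === b)
    return true;
  return isNumberNaN(a) && isNumberNaN(b);
}

var nanEqualsNan = equalGuarded(NaN, NaN);          // true
var stringsDiffer = equalGuarded("foo", "bar");     // false
var objectsDiffer = equalGuarded({x: 1}, {x: 2});   // false
```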

Prototype pollution

JavaScript's prototypal inheritance leads to subtle bugs when the prototypes for Object and Array are modified.

For instance, adding a field to Object.prototype is effectively adding it to all objects:

 Object.prototype.foo = 3 ;
 alert({}.foo) ; // Alerts 3!
 for (var k in {}) {
   alert(k) ; // Alerts 'foo'
 }

Similarly, adding a field to Array.prototype adds it to all arrays:

 Array.prototype.bar = 4 ;
 for (var k in [5,6,7]) {
   alert(k) ; // Alerts for 0, 1, 2 and "bar".
 }

Unfortunately, this breaks the lexical scope workaround with with:

 Object.prototype.foo = 3 ;
 var foo = 4 ;
 with ({}) { console.log(foo) ; } // Prints 3
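For the for...in loops, the usual defense is to skip anything the object does not own. The sketch below assumes a helper I am calling ownKeys; it pollutes Object.prototype on purpose and then cleans up after itself.

```javascript
// Collect only the keys that live on obj itself, not on its prototypes.
function ownKeys(obj) {
  var keys = [];
  for (var k in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, k)) {
      keys.push(k);
    }
  }
  return keys;
}

Object.prototype.foo = 3;               // pollute, as in the examples above

var seenByForIn = [];
for (var k in { a: 1 }) {
  seenByForIn.push(k);                  // collects both 'a' and 'foo'
}
var seenByOwnKeys = ownKeys({ a: 1 });  // collects only 'a'

delete Object.prototype.foo;            // undo the pollution
```

Calling hasOwnProperty through Object.prototype, rather than as obj.hasOwnProperty(k), keeps the check working even if the object defines its own field with that name.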

More