I've been playing around with benchmark.js to do some performance testing in JavaScript. It's an easy-to-use library, but because it's so easy, it hides a lot of detail behind the scenes. Mathias Bynens and John-David Dalton, the authors of benchmark.js, wrote a great overview of how benchmark.js works. But I wanted a better understanding of what's going on, so I took a deeper look at the code.

Here's an example of a performance test using benchmark.js. This particular test measures the performance of adding a span element to the page:

<html>
<body>

<a href="#" onclick="bench.run({async: true}); return false;">run test</a>

<div id="mydiv"></div>

<script src="https://raw.github.com/bestiejs/benchmark.js/v1.0.0/benchmark.js"></script>
<script>

var bench = new Benchmark('insertNode',

// The function to test
function() {
  mydiv.insertAdjacentHTML('beforeend', '<span></span>');
},

// Additional options for the test
{
  'setup': function() {
    var mydiv = document.getElementById('mydiv');
  },
  'teardown': function() {
    mydiv.innerHTML = '';
  }
});
</script>

</body>
</html>

The code above represents a single benchmark, which is the performance test and any associated code. The fundamental unit of work in a benchmark is the test function. This is the thing we are measuring. Traditional performance tests ask the user to specify how many times to run the test (for example, run this test 1000 times and tell me how long it takes). The magic of benchmark.js is that it automatically calculates the number of iterations. The user is responsible for writing the test; benchmark.js handles the rest.
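
For example, here's a minimal sketch of creating and running a standalone benchmark and reading its results. The regular-expression body is just a stand-in workload; hz, stats, on, and run are all part of benchmark.js's documented API:

var example = new Benchmark('RegExp#test', function() {
  /o/.test('Hello World!');
});

example.on('complete', function(event) {
  var b = event.target;
  // hz is executions per second; stats.rme is the relative margin of error
  console.log(b.name + ' x ' + Math.round(b.hz) +
              ' ops/sec ±' + b.stats.rme.toFixed(2) + '%');
});

example.run({ 'async': true });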

Check out the weird scoping in this particular test: mydiv is declared in the setup function, but is referenced in the test function and the teardown function. This works because benchmark.js doesn't call those functions directly. Instead, it strips the JS code out of those functions and "compiles" it to this:

function(t1354904061091) {
  var r1354904061091,
      s1354904061091,
      m1354904061091=this,
      f1354904061091=m1354904061091.fn,
      i1354904061091=m1354904061091.count,
      n1354904061091=t1354904061091.ns;

  // From setup()
  var mydiv = document.getElementById('mydiv');

  s1354904061091=n1354904061091.now();
  while(i1354904061091--) {
    // From test function.
    mydiv.insertAdjacentHTML('beforeend', '<span></span>');
  }
  r1354904061091=(n1354904061091.now()-s1354904061091)/1e3;

  // From teardown()
  mydiv.innerHTML = '';

  return {elapsed: r1354904061091, uid: "uid1354904061091"};
}

This is the heart of a benchmark.js test. Don’t let the obfuscated variable names fool you; this code is very simple. It looks like a performance test one would write by hand: run the setup code, start the timer, run the test function in a loop, stop the timer, and finally run the teardown code. I call this chunk of code a unit. The setup, test, and teardown functions in a unit all share the same scope, and are run in the context of the benchmark itself (i.e. this === the benchmark instance).

Units form the building blocks of a cycle. There is almost a 1-to-1 mapping between a unit and a cycle, but a cycle actually consists of two units. The first unit runs the setup, test, and teardown functions one time to see if the code throws any errors (this is a nice feature; we don't want to run the entire test only to discover an error in the teardown function). If there are no errors, the unit runs again, this time with multiple iterations. (I’m not sure why the error check is done once per cycle; it seems like something that could be done once per benchmark.)
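
Conceptually, a cycle looks something like the sketch below. This is my own illustration, not the library's actual source; runUnit is a hypothetical stand-in for the compiled unit, and it ignores the scope-sharing compilation described above:

// A simplified, hypothetical stand-in for the compiled unit shown above.
// (The real unit shares scope between setup, test, and teardown.)
function runUnit(bench, iterations) {
  bench.setup();
  var start = Date.now();
  while (iterations--) {
    bench.fn();
  }
  var elapsed = (Date.now() - start) / 1e3; // seconds
  bench.teardown();
  return elapsed;
}

// A cycle is two units: one to check for errors, one to measure.
function cycle(bench) {
  // Unit 1: a single iteration, purely to surface errors early.
  runUnit(bench, 1);

  // Unit 2: the measured run, with the full iteration count.
  return runUnit(bench, bench.count);
}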

So how does benchmark.js actually determine how many times to run a test function? Benchmark.js aims to run a test as fast as possible without sacrificing accuracy. It finds this sweet spot by running a few cycles to get a sense of how long the test takes. I call this the analysis phase. It starts by dipping its toes in the water with a few iterations, and then continues to increase the number of iterations until it reaches a percent uncertainty of at most 1% (or until a user-specified minimum or maximum time is reached).
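
The shape of that loop is roughly the following. Again, this is an illustrative sketch (reusing the hypothetical runUnit and cycle from the previous example), not the actual source; the real logic also accounts for the timer's resolution and the user's minTime/maxTime options:

// Illustrative sketch of the analysis phase (not the actual source).
// minTime stands in for the minimum run time benchmark.js derives from
// the timer's resolution and the 1% uncertainty target.
function analyze(bench, minTime) {
  var count = 1;
  for (;;) {
    bench.count = count;
    var elapsed = cycle(bench);
    // A run lasting at least minTime keeps the timer's resolution at a
    // negligible fraction of the total measurement.
    if (elapsed >= minTime) {
      return count;
    }
    // Scale the iteration count up, overshooting minTime slightly.
    count = Math.ceil(count * (minTime / Math.max(elapsed, 1e-9)) * 1.05);
  }
}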

A note about the timer used to run the test. The code above measures time by calling the obscure n1354904061091.now(). The actual timer could be any one of the following:

  • Java's nanosecond-resolution timer, exposed to the page through a small applet
  • chrome.Interval, a microsecond-resolution timer available when Chrome is launched with the --enable-benchmarking flag
  • Node.js's high-resolution timers (the microtime module or process.hrtime)
  • JavaScript's Date object, with a resolution of roughly 15ms

Benchmark.js chooses the timer with the finest resolution available on the platform. It takes the timer's resolution into account when calculating the number of iterations. So while using JavaScript's Date object is not ideal, benchmark.js compensates by running the tests for more iterations.
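
How does it know the timer's resolution? It measures it empirically. The sketch below is my own illustration of the idea: spin until the clock ticks, and record the smallest observable difference:

// Empirically measure a timer's resolution: spin until the clock ticks,
// then record the smallest observable difference.
function measureResolution(now) {
  var min = Infinity;
  for (var i = 0; i < 30; i++) {
    var t0 = now();
    var t1 = now();
    while (t1 === t0) {
      t1 = now(); // busy-wait for the next tick
    }
    min = Math.min(min, t1 - t0);
  }
  return min;
}

// e.g. measureResolution(Date.now) returns roughly 1-15 (milliseconds),
// depending on the platform.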

Once a number of iterations is calculated, benchmark.js enters what I call the sampling phase. This phase runs the tests and stores the results of each cycle in the Benchmark.prototype.stats.sample array. The number of iterations is stored in Benchmark.prototype.count. This is the number of iterations per cycle, not the total number of iterations across all cycles. The count is not necessarily fixed; it may increase if benchmark.js determines it needs more iterations to get an accurate reading.

One point of confusion is the Benchmark.prototype.cycles property. It sounds like it should store the total number of cycles run by the benchmark. But instead it stores the number of cycles run during the analysis phase. If you’d like the number of cycles run during the sampling phase, look at the length of the Benchmark.prototype.stats.sample array.
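
Both properties are easy to watch from the onCycle event. For example, using the bench instance from the HTML example above:

// Watch the sampling phase from onCycle. count, cycles, and stats.sample
// are all documented properties of the benchmark instance.
bench.on('cycle', function(event) {
  var b = event.target;
  console.log('iterations per cycle: ' + b.count +
              ', analysis cycles: ' + b.cycles +
              ', samples collected: ' + b.stats.sample.length);
});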

As the test runs, benchmark.js stores the results on the benchmark instance. The stats are calculated and updated after each cycle, so they will exist even if the benchmark is aborted before finishing. There are a few places where results are stored:

  • Benchmark.prototype.stats - statistical results such as the mean, deviation, variance, margin of error (moe), relative margin of error (rme), standard error of the mean (sem), and the raw sample array.
  • Benchmark.prototype.times - timing data, including the time of the last cycle (cycle), the total elapsed time (elapsed), the time per test execution (period), and a timestamp of when the benchmark started (timeStamp).
  • Benchmark.prototype.hz - the number of executions per second (1 / period).
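
For the curious, here's a sketch of how those derived stats relate to each other. Benchmark.js itself uses a critical value from the t-distribution based on the sample size; the 1.96 below assumes a large sample:

// Sketch of how the derived stats relate to each other (assumes a
// sample of at least two measurements).
function deriveStats(sample) {
  var n = sample.length;
  var mean = sample.reduce(function(sum, x) { return sum + x; }, 0) / n;
  var variance = sample.reduce(function(sum, x) {
    return sum + Math.pow(x - mean, 2);
  }, 0) / (n - 1);
  var deviation = Math.sqrt(variance);
  var sem = deviation / Math.sqrt(n); // standard error of the mean
  var moe = sem * 1.96;               // margin of error
  var rme = (moe / mean) * 100;       // relative margin of error (%)
  return { mean: mean, variance: variance, deviation: deviation,
           sem: sem, moe: moe, rme: rme };
}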

A benchmark also emits various events during the course of a test (there's a wiring example after this list):

  • onStart - Called once, before the entire benchmark starts.
  • onCycle - Called after each cycle completes. Fires during both the analysis and sampling phases.
  • onComplete - Called once, after the entire benchmark completes.
  • onError - Called if the JS code has an error.
  • onAbort - Called if the test is aborted.
  • onReset - Called when the benchmark is reset (its properties are restored and the run is aborted if one is in progress).
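
Handlers for these events can be attached either as options when constructing the benchmark, or afterwards via on(). For example:

// 1. As options at construction time:
var b1 = new Benchmark('test1', function() { /o/.test('Hello'); }, {
  'onCycle': function(event) { console.log('cycle finished'); },
  'onComplete': function(event) { console.log(String(event.target)); }
});

// 2. Via on(), after construction:
var b2 = new Benchmark('test2', function() { /o/.test('Hello'); });
b2.on('error', function(event) {
  // The error itself is stored on the benchmark's error property.
  console.log('error: ' + event.target.error);
});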

Finally, benchmarks can also be organized into a suite. A suite is a collection of benchmarks, and is useful for grouping benchmarks. A suite has methods to operate over its benchmarks (such as forEach), as well as an analogous set of events that operate at the suite level (for example, onCycle is called after each benchmark completes).
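
A typical suite looks like this (modeled on the example in the benchmark.js README):

var suite = new Benchmark.Suite('string search');

suite
  .add('RegExp#test', function() {
    /o/.test('Hello World!');
  })
  .add('String#indexOf', function() {
    'Hello World!'.indexOf('o') > -1;
  })
  // At the suite level, onCycle fires after each benchmark completes.
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').pluck('name'));
  })
  .run({ 'async': true });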

I hope this gives a good introduction to what happens when running a benchmark.js test. To recap, here's an outline of how a test is executed:

  • For each suite:
    • Fire event: Suite.onStart()
    • For each benchmark:
      • Fire event: Benchmark.onStart()
      • For each sampled run
        • Run unit once to check for errors
          • setup()
          • testfn()
          • teardown()
        • Run unit multiple times and measure results
          • setup()
          • for each Benchmark.count
            • testfn()
          • teardown()
        • Fire event: Benchmark.onCycle()
      • Fire event: Benchmark.onComplete()
      • Fire event: Suite.onCycle()
    • Fire event: Suite.onComplete()