Chapter 15: Code Generator

Terms defined: byte code, code coverage (in testing), compiler, Decorator pattern, macro, nested function, two hard problems in computer science

We’ve been writing tests since Chapter 4, but how much of our code do they actually check? One way to find out is to use a code coverage tool like Istanbul that watches a program while it executes and keeps track of which lines have run and which haven’t. Making sure that each line is tested at least once doesn’t guarantee that the code is bug-free, but any code that isn’t run shouldn’t be trusted.

Our code coverage tool will keep track of which functions have and haven’t been called. Rather than rewriting Node to keep track of this for us, we will modify the functions themselves by parsing the code with Acorn, inserting the instructions we need into the AST, and then turning the AST back into code.

Simple usually isn’t

At first glance it would be a lot simpler to use regular expressions to find every line that looks like the start of a function definition and insert a line right after each one to record the information we want. Of course, some people split function headers across several lines if they have lots of parameters, and there might be things that look like function definitions embedded in comments or strings. It doesn’t take long before our simple solution turns into a poorly-implemented parser for a subset of JavaScript that no-one else understands. Using a full-blown parser and working with the AST is almost always less work.

Section 15.2: How can we replace a function with another function?

The first thing we need is a way to wrap up an arbitrary function call. If we declare a function in JavaScript with a parameter like ...args, all of the “extra” arguments in the call that don’t line up with regular parameters are stuffed into the variable args (Figure 15.1). We can also call a function by putting values in a variable and using func(...var) to spread those values out. There’s nothing special about the names args and vars: what matters is the ellipsis ...

Spreading parameters
Figure 15.1: Using …args to capture and spread parameters.

We can use ...args to capture all of the arguments to a function call and forward them to another function. Let’s start by creating functions with a varying number of parameters that run to completion or throw an exception, then run them to make sure they do what we want:

let zero = () => console.log('zero')
let one = (first) => console.log(`one(${first})`)
let two = (first, second) => console.log(`two(${first}, ${second})`)

let error = () => {
  console.log('error')
  throw new Error('from error')
  console.log('should not reach this')
}

const runAll = (title) => {
  console.log(title)
  zero()
  one(1)
  two(1, 2)
  try {
    error()
  } catch (error) {
    console.log(`caught ${error} as expected`)
  }
  console.log()
}

runAll('first time')

We can now write a function that takes a function as an input and creates a new function that handles all of the errors in the original function:

const replace = (func) => {
  return (...args) => {
    console.log('before')
    try {
      const result = func(...args)
      console.log('after')
      return result
    } catch (error) {
      console.log('error')
      throw error
    }
  }
}

zero = replace(zero)
one = replace(one)
two = replace(two)
error = replace(error)

runAll('second time')

Let’s try it out:

first time
zero
one(1)
two(1, 2)
error
caught Error: from error as expected

second time
before
zero
after
before
one(1)
after
before
two(1, 2)
after
before
error
error
caught Error: from error as expected

This is an example of the Decorator design pattern. A decorator is a function whose job is to modify the behavior of other functions in some general ways. Decorators are built in to some languages (like Python), and we can add them in most others as we have done here.

Section 15.3: How can we generate JavaScript?

We could use a decorator to replace every function in our program with one that keeps track of whether or not it was called, but it would be tedious to apply the decorator to every one of our functions by hand. What we really want is a way to do this automatically for everything, and for that we need to parse and generate code.

Other ways to do it

A third way to achieve what we want is to let the system turn code into runnable instructions and then modify those instructions. This approach is often used in compiled languages like Java, where the byte code produced by the compiler is saved in files in order to be run. We can’t do this here because Node compiles and runs code in a single step.

Our tool will parse the JavaScript with Acorn to create an AST, modify the AST, and then use a library called Escodegen to turn the AST back into JavaScript. To start, let’s look at the AST for a simple function definition, which is 75 lines of pretty-printed JSON:

import acorn from 'acorn'

const text = `const func = (param) => {
  return param + 1
}`

const ast = acorn.parse(text, { sourceType: 'module' })
console.log(JSON.stringify(ast, null, 2))
{
  "type": "Program",
  "start": 0,
  "end": 46,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 46,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 46,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 10,
            "name": "func"
          },
          "init": {
            "type": "ArrowFunctionExpression",
            "start": 13,
            "end": 46,
            "id": null,
            "expression": false,
            "generator": false,
            "async": false,
            "params": [
              {
                "type": "Identifier",
                "start": 14,
                "end": 19,
                "name": "param"
              }
            ],
            "body": {
              "type": "BlockStatement",
              "start": 24,
              "end": 46,
              "body": [
                {
                  "type": "ReturnStatement",
                  "start": 28,
                  "end": 44,
                  "argument": {
                    "type": "BinaryExpression",
                    "start": 35,
                    "end": 44,
                    "left": {
                      "type": "Identifier",
                      "start": 35,
                      "end": 40,
                      "name": "param"
                    },
                    "operator": "+",
                    "right": {
                      "type": "Literal",
                      "start": 43,
                      "end": 44,
                      "value": 1,
                      "raw": "1"
                    }
                  }
                }
              ]
            }
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "module"
}

After inspecting a few nodes, we can create some of our own and turn them into code. Here, for example, we have the JSON representation of the expression 40+2:

import escodegen from 'escodegen'
const result = escodegen.generate({
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Literal', value: 40 },
  right: { type: 'Literal', value: 2 }
})
console.log(result)
40 + 2

Section 15.4: How can we count how often functions are executed?

Our tool will find all the function declaration nodes in the program and insert a node to increment an entry in a global variable called __counters. (Prefixing the name with two underscores doesn’t guarantee that we won’t accidentally clobber a variable in the user’s program with the same name, but hopefully it makes that less likely.) Our test case is:

const TEXT = `
const funcOuter = (param) => {
  return param + 1
}
const funcInner = (param) => {
  return param + 1
}
for (const i of [1, 3, 5]) {
  funcOuter(funcInner(i) + funcInner(i))
}
`

and the main function of our program is:

const main = () => {
  const ast = acorn.parse(TEXT, { sourceType: 'module' })

  const allNodes = []
  walk.simple(ast, {
    VariableDeclarator: (node, state) => {
      if (node.init && (node.init.type === 'ArrowFunctionExpression')) {
        state.push(node)
      }
    }
  }, null, allNodes)

  const names = {}
  allNodes.forEach(node => insertCounter(names, node))
  console.log(initializeCounters(names))
  console.log(escodegen.generate(ast))
  console.log(reportCounters())
}

To insert a count we call insertCounter to record the function’s name and modify the node:

const insertCounter = (names, node) => {
  const name = node.id.name
  names[name] = 0

  const body = node.init.body.body
  const increment =
    acorn.parse(`__counters['${name}'] += 1`, { sourceType: 'module' })
  body.unshift(increment)
}

Notice how we don’t try to build the nodes by hand, but instead construct the string we need, use Acorn to parse that, and use the result. Doing this saves us from embedding multiple lines of JSON in our program and also ensures that if a newer version of Acorn decides to generate a different AST, our program will do the right thing automatically.

Finally, we need to add a couple of helper functions:

const initializeCounters = (names) => {
  const body = Object.keys(names).map(n => `'${n}': 0`).join(',\n')
  return 'const __counters = {\n' + body + '\n}'
}

const reportCounters = () => {
  return 'console.log(__counters)'
}

and run it to make sure it all works:

const __counters = {
'funcOuter': 0,
'funcInner': 0
}
const funcOuter = param => {
        __counters['funcOuter'] += 1;
    return param + 1;
};
const funcInner = param => {
        __counters['funcInner'] += 1;
    return param + 1;
};
for (const i of [
        1,
        3,
        5
    ]) {
    funcOuter(funcInner(i) + funcInner(i));
}
console.log(__counters)

Too simple to be safe

Our simple approach to naming counters doesn’t work if functions can have the same names, which they can if we use modules or nested functions. One solution would be to manufacture a label from the function’s name and the line number in the source code; another would be to keep track of which functions are nested within which and concatenate their names to produce a unique key. Problems like this are why people say that naming things is one of the two hard problems in computer science.

Section 15.5: How can we time function execution?

Now that we have a way to insert code into functions we can use it to do many other things. For example, we can find out how long it takes functions to run by wrapping them up in code that records the start and end time of each call. As before, we find the nodes of interest and decorate them, then stitch the result together with a bit of bookkeeping:

const timeFunc = (text) => {
  const ast = acorn.parse(text, { sourceType: 'module' })
  const allNodes = gatherNodes(ast)
  allNodes.forEach(node => wrapFuncDef(node))
  return [
    initializeCounters(allNodes),
    escodegen.generate(ast),
    reportCounters()
  ].join('\n')
}

Gathering nodes is straightforward:

const gatherNodes = (ast) => {
  const allNodes = []
  walk.simple(ast, {
    VariableDeclarator: (node, state) => {
      if (node.init && (node.init.type === 'ArrowFunctionExpression')) {
        state.push(node)
      }
    }
  }, null, allNodes)
  return allNodes
}

as is wrapping the function definition:

const wrapFuncDef = (originalAst) => {
  const name = originalAst.id.name
  const wrapperAst = makeWrapperAst(name)
  wrapperAst.init.body.body[0].declarations[0].init = originalAst.init
  originalAst.init = wrapperAst.init
}

The only big difference is how we make the wrapper function. We create it with a placeholder for the original function so that we have a spot in the AST to insert the actual code:

const timeFunc = (text) => {
  const ast = acorn.parse(text, { sourceType: 'module' })
  const allNodes = gatherNodes(ast)
  allNodes.forEach(node => wrapFuncDef(node))
  return [
    initializeCounters(allNodes),
    escodegen.generate(ast),
    reportCounters()
  ].join('\n')
}

Let’s run one last test:

const __counters = {
'assignment': 0,
'readFile': 0
}
const assignment = (...originalArgs) => {
    const originalFunc = range => {
        let j = 0;
        for (let i = 0; i < range; i += 1) {
            j = i;
        }
    };
    const startTime = Date.now();
    try {
        const result = originalFunc(...originalArgs);
        const endTime = Date.now();
        __counters['assignment'] += endTime - startTime;
        return result;
    } catch (error) {
        const endTime = Date.now();
        __counters['assignment'] += endTime - startTime;
        throw error;
    }
};
const readFile = (...originalArgs) => {
    const originalFunc = (range, filename) => {
        for (let i = 0; i < range; i += 1) {
            fs.readFileSync(filename, 'utf-8');
        }
    };
    const startTime = Date.now();
    try {
        const result = originalFunc(...originalArgs);
        const endTime = Date.now();
        __counters['readFile'] += endTime - startTime;
        return result;
    } catch (error) {
        const endTime = Date.now();
        __counters['readFile'] += endTime - startTime;
        throw error;
    }
};
const numLoops = 100000;
assignment(numLoops);
readFile(numLoops, 'index.md');
console.log(__counters)
OUTPUT
{ assignment: 1, readFile: 3879 }

Source-to-source translation is widely used in JavaScript: tools like Babel use it to transform modern features like async and await (Chapter 3) into code that older browsers can understand. The technique is so powerful that it is built into languages like Scheme, which allow programmers to add new syntax to the language by defining macros. Depending on how carefully they are used, macros can make programs elegant, incomprehensible, or both.

Section 15.6: Exercises

JSON to JavaScript

Write a tool that uses Escodegen to translate simple expressions written in JSON into runnable JavaScript. For example, the tool should translate:

['+', 3, ['*', 5, 'a']]

into:

3 + (5 * a)

JavaScript to HTML

Write a function that takes nested JavaScript function calls for generating HTML like this:

div(h1('title'), p('explanation'))

and turns them into HTML like this:

<div><h1>title</h1><p>explanation</p></div>

Handling modules

Modify the code that counts the number of times a function is called to handle functions with the same name from different modules.

Tracking calls

Write a decorator that takes a function as its argument and returns a new function that behaves exactly the same way except that it keeps track of who called it.

  1. The program contains a stack where decorated functions push and pop their names as they are called and as they exit.

  2. Each time a function is called it adds a record to an array to record its name and the name at the top of the stack (i.e., the most-recently-called decorated function).

Counting classical function definitions

Modify the code generator to handle functions declared with the function keyword as well as functions declared using =>.

Recording input file size

  1. Write a program that replaces all calls to fs.readFileSync with calls to readFileSyncCount.

  2. Write the function readFileSyncCount to read and return a file using fs.readFileSync but to also record the file’s name and size in bytes.

  3. Write a third function reportInputFileSizes that reports what files were read and how large they were.

  4. Write tests for these functions using Mocha and mock-fs.

Checking argument types

Write a tool that modifies functions to check the types of their arguments at run-time.

  1. Each function is replaced by a function that passes all of its arguments to checkArgs along with the function’s name, then continues with the function’s original operation.

  2. The first time checkArgs is called for a particular function it records the actual types of the arguments.

  3. On subsequent calls, it checks that the argument types match those of the first call and throws an exception if they do not.

Two-dimensional arrays

The function make2D takes a row length and one or more values and creates a two-dimensional array from those values:

make2D(2, 'a', 'b', 'c', 'd')
// produces [['a', 'b'], ['c', 'd']]

Write a function that searches code to find calls to make2D and replaces them with inline arrays-of-arrays. This function only has to work for calls with a fixed row length, i.e., it does not have to handle make2D(N, 'a', 'b').

From require to import

Write a function that searches code for simple calls to require and replaces them with calls to import. This function only needs to work for the simplest case; for example, if the input is:

const name = require('module')

then the output is:

import name from 'module'

Removing empty constructors

Write a function that removes empty constructors from class definitions. For example, if the input is:

class Example {
  constructor () {
  }

  someMethod () {
    console.log('some method')
  }
}

then the output should be:

class Example {
  someMethod () {
    console.log('some method')
  }
}