Chapter 15: Code Generator
Terms defined: byte code, code coverage (in testing), compiler, Decorator pattern, macro, nested function, two hard problems in computer science
We’ve been writing tests since Chapter 4, but how much of our code do they actually check? One way to find out is to use a code coverage tool like Istanbul that watches a program while it executes and keeps track of which lines have run and which haven’t. Making sure that each line is tested at least once doesn’t guarantee that the code is bug-free, but any code that isn’t run shouldn’t be trusted.
Our code coverage tool will keep track of which functions have and haven’t been called. Rather than rewriting Node to keep track of this for us, we will modify the functions themselves by parsing the code with Acorn, inserting the instructions we need into the AST, and then turning the AST back into code.
Simple usually isn’t
At first glance it would be a lot simpler to use regular expressions to find every line that looks like the start of a function definition and insert a line right after each one to record the information we want. Of course, some people split function headers across several lines if they have lots of parameters, and there might be things that look like function definitions embedded in comments or strings. It doesn’t take long before our simple solution turns into a poorly-implemented parser for a subset of JavaScript that no-one else understands. Using a full-blown parser and working with the AST is almost always less work.
Section 15.2: How can we replace a function with another function?
The first thing we need is a way to wrap up an arbitrary function call.
If we declare a function in JavaScript with a parameter like ...args
,
all of the “extra” arguments in the call that don’t line up with regular parameters
are stuffed into the variable args
(Figure 15.1).
We can also call a function by putting values in a variable
and using func(...var)
to spread those values out.
There’s nothing special about the names args
and vars
:
what matters is the ellipsis ...
We can use ...args
to capture all of the arguments to a function call
and forward them to another function.
Let’s start by creating functions with a varying number of parameters
that run to completion or throw an exception,
then run them to make sure they do what we want:
let zero = () => console.log('zero')
let one = (first) => console.log(`one(${first})`)
let two = (first, second) => console.log(`two(${first}, ${second})`)
let error = () => {
console.log('error')
throw new Error('from error')
console.log('should not reach this')
}
const runAll = (title) => {
console.log(title)
zero()
one(1)
two(1, 2)
try {
error()
} catch (error) {
console.log(`caught ${error} as expected`)
}
console.log()
}
runAll('first time')
We can now write a function that takes a function as an input and creates a new function that handles all of the errors in the original function:
const replace = (func) => {
return (...args) => {
console.log('before')
try {
const result = func(...args)
console.log('after')
return result
} catch (error) {
console.log('error')
throw error
}
}
}
zero = replace(zero)
one = replace(one)
two = replace(two)
error = replace(error)
runAll('second time')
Let’s try it out:
first time
zero
one(1)
two(1, 2)
error
caught Error: from error as expected
second time
before
zero
after
before
one(1)
after
before
two(1, 2)
after
before
error
error
caught Error: from error as expected
This is an example of the Decorator design pattern. A decorator is a function whose job is to modify the behavior of other functions in some general ways. Decorators are built in to some languages (like Python), and we can add them in most others as we have done here.
Section 15.3: How can we generate JavaScript?
We could use a decorator to replace every function in our program with one that keeps track of whether or not it was called, but it would be tedious to apply the decorator to every one of our functions by hand. What we really want is a way to do this automatically for everything, and for that we need to parse and generate code.
Other ways to do it
A third way to achieve what we want is to let the system turn code into runnable instructions and then modify those instructions. This approach is often used in compiled languages like Java, where the byte code produced by the compiler is saved in files in order to be run. We can’t do this here because Node compiles and runs code in a single step.
Our tool will parse the JavaScript with Acorn to create an AST, modify the AST, and then use a library called Escodegen to turn the AST back into JavaScript. To start, let’s look at the AST for a simple function definition, which is 75 lines of pretty-printed JSON:
import acorn from 'acorn'
const text = `const func = (param) => {
return param + 1
}`
const ast = acorn.parse(text, { sourceType: 'module' })
console.log(JSON.stringify(ast, null, 2))
{
"type": "Program",
"start": 0,
"end": 46,
"body": [
{
"type": "VariableDeclaration",
"start": 0,
"end": 46,
"declarations": [
{
"type": "VariableDeclarator",
"start": 6,
"end": 46,
"id": {
"type": "Identifier",
"start": 6,
"end": 10,
"name": "func"
},
"init": {
"type": "ArrowFunctionExpression",
"start": 13,
"end": 46,
"id": null,
"expression": false,
"generator": false,
"async": false,
"params": [
{
"type": "Identifier",
"start": 14,
"end": 19,
"name": "param"
}
],
"body": {
"type": "BlockStatement",
"start": 24,
"end": 46,
"body": [
{
"type": "ReturnStatement",
"start": 28,
"end": 44,
"argument": {
"type": "BinaryExpression",
"start": 35,
"end": 44,
"left": {
"type": "Identifier",
"start": 35,
"end": 40,
"name": "param"
},
"operator": "+",
"right": {
"type": "Literal",
"start": 43,
"end": 44,
"value": 1,
"raw": "1"
}
}
}
]
}
}
}
],
"kind": "const"
}
],
"sourceType": "module"
}
After inspecting a few nodes,
we can create some of our own and turn them into code.
Here,
for example,
we have the JSON representation of the expression 40+2
:
import escodegen from 'escodegen'
const result = escodegen.generate({
type: 'BinaryExpression',
operator: '+',
left: { type: 'Literal', value: 40 },
right: { type: 'Literal', value: 2 }
})
console.log(result)
40 + 2
Section 15.4: How can we count how often functions are executed?
Our tool will find all the function declaration nodes in the program
and insert a node to increment an entry in a global variable called __counters
.
(Prefixing the name with two underscores doesn’t guarantee that
we won’t accidentally clobber a variable in the user’s program with the same name,
but hopefully it makes that less likely.)
Our test case is:
const TEXT = `
const funcOuter = (param) => {
return param + 1
}
const funcInner = (param) => {
return param + 1
}
for (const i of [1, 3, 5]) {
funcOuter(funcInner(i) + funcInner(i))
}
`
and the main function of our program is:
const main = () => {
const ast = acorn.parse(TEXT, { sourceType: 'module' })
const allNodes = []
walk.simple(ast, {
VariableDeclarator: (node, state) => {
if (node.init && (node.init.type === 'ArrowFunctionExpression')) {
state.push(node)
}
}
}, null, allNodes)
const names = {}
allNodes.forEach(node => insertCounter(names, node))
console.log(initializeCounters(names))
console.log(escodegen.generate(ast))
console.log(reportCounters())
}
To insert a count we call insertCounter
to record the function’s name and modify the node:
const insertCounter = (names, node) => {
const name = node.id.name
names[name] = 0
const body = node.init.body.body
const increment =
acorn.parse(`__counters['${name}'] += 1`, { sourceType: 'module' })
body.unshift(increment)
}
Notice how we don’t try to build the nodes by hand, but instead construct the string we need, use Acorn to parse that, and use the result. Doing this saves us from embedding multiple lines of JSON in our program and also ensures that if a newer version of Acorn decides to generate a different AST, our program will do the right thing automatically.
Finally, we need to add a couple of helper functions:
const initializeCounters = (names) => {
const body = Object.keys(names).map(n => `'${n}': 0`).join(',\n')
return 'const __counters = {\n' + body + '\n}'
}
const reportCounters = () => {
return 'console.log(__counters)'
}
and run it to make sure it all works:
const __counters = {
'funcOuter': 0,
'funcInner': 0
}
const funcOuter = param => {
__counters['funcOuter'] += 1;
return param + 1;
};
const funcInner = param => {
__counters['funcInner'] += 1;
return param + 1;
};
for (const i of [
1,
3,
5
]) {
funcOuter(funcInner(i) + funcInner(i));
}
console.log(__counters)
Too simple to be safe
Our simple approach to naming counters doesn’t work if functions can have the same names, which they can if we use modules or nested functions. One solution would be to manufacture a label from the function’s name and the line number in the source code; another would be to keep track of which functions are nested within which and concatenate their names to produce a unique key. Problems like this are why people say that naming things is one of the two hard problems in computer science.
Section 15.5: How can we time function execution?
Now that we have a way to insert code into functions we can use it to do many other things. For example, we can find out how long it takes functions to run by wrapping them up in code that records the start and end time of each call. As before, we find the nodes of interest and decorate them, then stitch the result together with a bit of bookkeeping:
const timeFunc = (text) => {
const ast = acorn.parse(text, { sourceType: 'module' })
const allNodes = gatherNodes(ast)
allNodes.forEach(node => wrapFuncDef(node))
return [
initializeCounters(allNodes),
escodegen.generate(ast),
reportCounters()
].join('\n')
}
Gathering nodes is straightforward:
const gatherNodes = (ast) => {
const allNodes = []
walk.simple(ast, {
VariableDeclarator: (node, state) => {
if (node.init && (node.init.type === 'ArrowFunctionExpression')) {
state.push(node)
}
}
}, null, allNodes)
return allNodes
}
as is wrapping the function definition:
const wrapFuncDef = (originalAst) => {
const name = originalAst.id.name
const wrapperAst = makeWrapperAst(name)
wrapperAst.init.body.body[0].declarations[0].init = originalAst.init
originalAst.init = wrapperAst.init
}
The only big difference is how we make the wrapper function. We create it with a placeholder for the original function so that we have a spot in the AST to insert the actual code:
const timeFunc = (text) => {
const ast = acorn.parse(text, { sourceType: 'module' })
const allNodes = gatherNodes(ast)
allNodes.forEach(node => wrapFuncDef(node))
return [
initializeCounters(allNodes),
escodegen.generate(ast),
reportCounters()
].join('\n')
}
Let’s run one last test:
const __counters = {
'assignment': 0,
'readFile': 0
}
const assignment = (...originalArgs) => {
const originalFunc = range => {
let j = 0;
for (let i = 0; i < range; i += 1) {
j = i;
}
};
const startTime = Date.now();
try {
const result = originalFunc(...originalArgs);
const endTime = Date.now();
__counters['assignment'] += endTime - startTime;
return result;
} catch (error) {
const endTime = Date.now();
__counters['assignment'] += endTime - startTime;
throw error;
}
};
const readFile = (...originalArgs) => {
const originalFunc = (range, filename) => {
for (let i = 0; i < range; i += 1) {
fs.readFileSync(filename, 'utf-8');
}
};
const startTime = Date.now();
try {
const result = originalFunc(...originalArgs);
const endTime = Date.now();
__counters['readFile'] += endTime - startTime;
return result;
} catch (error) {
const endTime = Date.now();
__counters['readFile'] += endTime - startTime;
throw error;
}
};
const numLoops = 100000;
assignment(numLoops);
readFile(numLoops, 'index.md');
console.log(__counters)
OUTPUT
{ assignment: 1, readFile: 3879 }
Source-to-source translation is widely used in JavaScript:
tools like Babel use it to transform modern features like async
and await
(Chapter 3)
into code that older browsers can understand.
The technique is so powerful that it is built into languages like Scheme,
which allow programmers to add new syntax to the language
by defining macros.
Depending on how carefully they are used,
macros can make programs elegant, incomprehensible, or both.
Section 15.6: Exercises
JSON to JavaScript
Write a tool that uses Escodegen to translate simple expressions written in JSON into runnable JavaScript. For example, the tool should translate:
['+', 3, ['*', 5, 'a']]
into:
3 + (5 * a)
JavaScript to HTML
Write a function that takes nested JavaScript function calls for generating HTML like this:
div(h1('title'), p('explanation'))
and turns them into HTML like this:
<div><h1>title</h1><p>explanation</p></div>
Handling modules
Modify the code that counts the number of times a function is called to handle functions with the same name from different modules.
Tracking calls
Write a decorator that takes a function as its argument and returns a new function that behaves exactly the same way except that it keeps track of who called it.
-
The program contains a stack where decorated functions push and pop their names as they are called and as they exit.
-
Each time a function is called it adds a record to an array to record its name and the name at the top of the stack (i.e., the most-recently-called decorated function).
Counting classical function definitions
Modify the code generator to handle functions declared with the function
keyword
as well as functions declared using =>
.
Recording input file size
-
Write a program that replaces all calls to
fs.readFileSync
with calls toreadFileSyncCount
. -
Write the function
readFileSyncCount
to read and return a file usingfs.readFileSync
but to also record the file’s name and size in bytes. -
Write a third function
reportInputFileSizes
that reports what files were read and how large they were. -
Write tests for these functions using Mocha and
mock-fs
.
Checking argument types
Write a tool that modifies functions to check the types of their arguments at run-time.
-
Each function is replaced by a function that passes all of its arguments to
checkArgs
along with the function’s name, then continues with the function’s original operation. -
The first time
checkArgs
is called for a particular function it records the actual types of the arguments. -
On subsequent calls, it checks that the argument types match those of the first call and throws an exception if they do not.
Two-dimensional arrays
The function make2D
takes a row length and one or more values
and creates a two-dimensional array from those values:
make2D(2, 'a', 'b', 'c', 'd')
// produces [['a', 'b'], ['c', 'd']]
Write a function that searches code to find calls to make2D
and replaces them with inline arrays-of-arrays.
This function only has to work for calls with a fixed row length,
i.e., it does not have to handle make2D(N, 'a', 'b')
.
From require to import
Write a function that searches code for simple calls to require
and replaces them with calls to import
.
This function only needs to work for the simplest case;
for example, if the input is:
const name = require('module')
then the output is:
import name from 'module'
Removing empty constructors
Write a function that removes empty constructors from class definitions. For example, if the input is:
class Example {
constructor () {
}
someMethod () {
console.log('some method')
}
}
then the output should be:
class Example {
someMethod () {
console.log('some method')
}
}