Chapter 14: Style Checker
Terms defined: abstract syntax tree, Adapter pattern, column-major storage, dynamic lookup, generator function, intrinsic complexity, Iterator pattern, linter, Markdown, row-major storage, walk (a tree)
Programmers argue endlessly about the best way to format their programs,
but everyone agrees that the most important thing is to be consistent
[Binkley2012, Johnson2019].
Since checking rules by hand is tedious,
most programmers use tools to compare code against various rules and report any violations.
Programs that do this are often called linters
in honor of an early one for C named lint
(because it looked for fluff in source code).
In this chapter we will build a simple linter of our own inspired by ESLint, which we use to check the code in this book. Our tool will parse source code to create a data structure, then go through that data structure and apply rules for each part of the program. It will also introduce us to one of the key ideas of this book, which is that source code is just another kind of data.
Don’t define your own style
Just as the world doesn’t need more file format (Chapter 8) it also doesn’t need more programming styles, or more arguments among programmers about whether there should be spaces before curly braces or not. Standard JS may not do everything exactly the way you want, but adopting it increases the odds that other programmers will be able to read your code at first glance.
Section 14.2: How can we parse JavaScript to create an AST?
A parser for a simple language like arithmetic or JSON is relatively easy to write. A parser for a language as complex as JavaScript is much more work, so we will use one called Acorn instead. Acorn takes a string containing source code as input and produces an abstract syntax tree (AST) whose nodes store information about what’s in the program (Figure 14.1). An AST is for a program what the DOM is for HTML: an in-memory representation that is easy for software to inspect and manipulate.
ASTs can be quite complex—for example, the JSON representation of the AST for a single constant declaration is 84 lines long:
import acorn from 'acorn'
const ast = acorn.parse('const x = 0', { locations: true })
console.log(JSON.stringify(ast, null, 2))
{
"type": "Program",
"start": 0,
"end": 11,
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
...
"value": 0,
"raw": "0"
}
}
],
"kind": "const"
}
],
"sourceType": "script"
}
Acorn’s output is in Esprima format (so-called because it was originally defined by a tool with that name). The format’s specification is very detailed, but we can usually figure out most of what we need by inspection. For example, here is the output for a 15-line program:
import acorn from 'acorn'
const program = `const value = 2
const double = (x) => {
const y = 2 * x
return y
}
const result = double(value)
console.log(result)
`
const ast = acorn.parse(program, { locations: true })
console.log(JSON.stringify(ast, null, 2))
{
"type": "Program",
"start": 0,
"end": 122,
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
...
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 15
}
},
"declarations": [
...480 more lines...
Yes, it really is almost 500 lines long…
Section 14.3: How can we find things in an AST?
If we want to find functions, variables, or anything else in an AST
we need to walk the tree,
i.e.,
to visit each node in turn.
The acorn-walk
library will do this for us
using the Visitor design pattern we first saw in Chapter 9.
If we provide a function to act on nodes of type Identifier
,
acorn-walk
will call that function each time it finds an identifier.
We can use other options to say that we want to record the locations of nodes (i.e., their line numbers)
and to collect comments in an array called onComment
.
Our function can do whatever we want;
for demonstration purposes we will add nodes to an array called state
and report them all at the end
(Figure 14.2).
import acorn from 'acorn'
import walk from 'acorn-walk'
const program = `// Constant
const value = 2
// Function
const double = (x) => {
const y = 2 * x
return y
}
// Main body
const result = double(value)
console.log(result)
`
const options = {
locations: true,
onComment: []
}
const ast = acorn.parse(program, options)
const state = []
walk.simple(ast, {
Identifier: (node, state) => {
state.push(node)
}
}, null, state)
state.forEach(node => console.log(
`identifier ${node.name} on line ${node.loc.start.line}`
))
const comments = options.onComment.map(
node => node.loc.start.line
).join(', ')
console.log(`comments on lines ${comments}`)
identifier x on line 6
identifier y on line 7
identifier double on line 11
identifier value on line 11
identifier console on line 12
identifier result on line 12
comments on lines 1, 4, 10
There’s more than one way to do it
walk.simple
takes four arguments:
-
The root node of the AST, which is used as the starting point.
-
An object containing callback functions for handling various kinds of nodes.
-
Another object that specifies what algorithm to use—we have set this to
null
to use the default because we don’t particularly care about the order in which the nodes are processed. -
Something we want passed in to each of the node handlers, which in our case is the
state
array. If our node handling functions don’t require any extra data from one call to the next we can leave this out; if we want to accumulate information across calls, this argument acts as the Visitor’s memory.
Any general-purpose implementation of the Visitor pattern is going to need these four things, but as we will see below, we can implement them in different ways.
Section 14.4: How can we apply checks?
We don’t just want to collect nodes:
we want to check their properties against a set of rules.
One way to do this would be to call walk.simple
once for each rule,
passing it a function that checks just that rule.
Another way—the one we’ll use—is to write a generic function
that checks a rule and records any nodes that don’t satisfy it,
and then call that function once for each rule inside our Identifier
handler.
This may seem like extra work,
but it ensures that all of our rule-checkers store their results in the same way,
which in turn means that we can write one reporting function
and be sure it will handle everything.
The function applyCheck
takes the current state (where we are accumulating rule violations),
a label that identifies this rule (so that violations of it can be stored together),
the node,
and a logical value telling it whether the node passed the test or not.
If the node failed the test
we make sure that state
contains a list with the appropriate label
and then append this node to it.
This “create storage space on demand” pattern
is widely used but doesn’t have a well-known name.
const applyCheck = (state, label, node, passes) => {
if (!passes) {
if (!(label in state)) {
state[label] = []
}
state[label].push(node)
}
}
We can now put a call to applyCheck
inside the handler for Identifier
:
const ast = acorn.parse(program, { locations: true })
const state = {}
walk.simple(ast, {
Identifier: (node, state) => {
applyCheck(state, 'name_length', node, node.name.length >= 4)
}
}, null, state)
state.name_length.forEach(
node => console.log(`${node.name} at line ${node.loc.start.line}`))
We can’t just use applyCheck
as the handler for Identifier
because walk.simple
wouldn’t know how to call it.
This is a (very simple) example of the Adapter design pattern:
we write a function or class to connect the code we want to call
to the already-written code that is going to call it.
The output for the same sample program as before is:
x at line 6
y at line 7
The exercises will ask why the parameter x
doesn’t show up
as a violation of our rule
that variables’ names must be at least four characters long.
Section 14.5: How does the AST walker work?
The AST walker uses the Visitor pattern, but how does it actually work? We can build our own by defining a class with methods that walk the tree, take action depending on the kind of node, and then go through the children of that node (if any). The user can then derive a class of their own from this and override the set of action methods they’re interested in.
One key difference between our implementation and acorn-walk
‘s is that
our methods don’t need to take state
as a parameter
because it’s contained in the object that they’re part of.
That simplifies the methods—one less parameter—but it does mean that
anyone who wants to use our visitor has to derive a class,
which is a bit more complicated than writing a function.
This tradeoff is a sign that managing state is part of the problem’s
intrinsic complexity:
we can move it around,
but we can’t get rid of it.
The other difference between our visitor and acorn-walk
is that
our class uses dynamic lookup
(a form of introspection)
to look up a method
with the same name as the node type in the object.
While we normally refer to a particular method of an object using object.method
,
we can also look them up by asking for object[name]
in the same way that we would look up any other property of any other object.
Our completed class looks like this:
class Walker {
// Construct a new AST tree walker.
constructor (ast) {
this.ast = ast
}
// Walk the tree.
walk (accumulator) {
this.stack = []
this._walk(this.ast, accumulator)
return accumulator
}
// Act on node and then on children.
_walk (node, accumulator) {
if (node && (typeof node === 'object') && ('type' in node)) {
this._doNode(node, accumulator)
this._doChildren(node, accumulator)
}
}
// Handle a single node by lookup.
_doNode (node, accumulator) {
if (node.type in this) {
this[node.type](node, accumulator)
}
}
// Recurse for anything interesting within the node.
_doChildren (node, accumulator) {
this.stack.push(node)
for (const key in node) {
if (Array.isArray(node[key])) {
node[key].forEach(child => {
this._walk(child, accumulator)
})
} else if (typeof node[key] === 'object') {
this._walk(node[key], accumulator)
}
}
this.stack.pop(node)
}
// Is the current node a child of some other type of node?
_childOf (nodeTypes) {
return this.stack &&
nodeTypes.includes(this.stack.slice(-1)[0].type)
}
}
The code we need to use it is:
import acorn from 'acorn'
// Walk to accumulate variable and parameter definitions.
class VariableWalker extends Walker {
Identifier (node, accumulator) {
if (this._childOf(['ArrowFunctionExpression',
'VariableDeclarator'])) {
accumulator.push(node.name)
}
}
}
// Test.
const program = `const value = 2
const double = (x) => {
const y = 2 * x
return y
}
const result = double(value)
console.log(result)
`
const ast = acorn.parse(program, { locations: true })
const walker = new VariableWalker(ast)
const accumulator = []
walker.walk(accumulator)
console.log('definitions are', accumulator)
and its output is:
definitions are [ 'value', 'double', 'x', 'y', 'result' ]
We think this approach to implementing the Visitor pattern is easier to understand and extend than one that relies on callbacks, but that could just be a reflection of our background and experience. As with code style, the most important thing is consistency: if we implement Visitor using classes in one place, we should implement it that way everywhere.
Section 14.6: How else could the AST walker work?
A third approach to this problem uses the Iterator design pattern. Instead of taking the computation to the nodes, an iterator returns the elements of a structure for processing (Figure 14.3). One way to think about it is that Visitor encapsulates recursion, while Iterator turns everything into a loop.
We can implement the Iterator pattern in JavaScript using
generator functions.
If we declare a function using function *
(with an asterisk) instead of function
then we can use the yield
keyword to return a value and suspend processing to be resumed later.
The result of yield
is a two-part structure with a value and a flag showing whether or not processing is done:
function * threeWords () {
yield 'first'
yield 'second'
yield 'third'
}
const gen = threeWords()
console.log(gen.next())
console.log(gen.next())
console.log(gen.next())
console.log(gen.next())
{ value: 'first', done: false }
{ value: 'second', done: false }
{ value: 'third', done: false }
{ value: undefined, done: true }
As another example, this generator takes a string and produces its vowels one by one:
function * getVowels (text) {
for (const char of text) {
if ('AEIOUaeiou'.includes(char)) {
yield char
}
}
}
const test = 'this is a test'
const gen = getVowels(test)
let current = gen.next()
while (!current.done) {
console.log(current.value)
current = gen.next()
}
i
i
a
e
A generator function doesn’t actually generate anything; instead, it creates an object that we can then ask for values repeatedly. This gives us a way to have several generators in play at the same time.
Instead of a while
loop it is much more common to use for...of
,
which knows how to work with generators:
for (const vowel of getVowels(test)) {
console.log(vowel)
}
Finally,
just as function *
says “this function is a generator”,
yield *
says “yield the values from a nested generator one by one”.
We can use it to walk irregular structures like nested arrays:
function * getNodes (here) {
if (typeof here === 'string') {
yield here
} else if (Array.isArray(here)) {
for (const child of here) {
yield * getNodes(child)
}
} else {
throw new Error(`unknown type "${typeof here}"`)
}
}
const nested = ['first', ['second', 'third']]
for (const value of getNodes(nested)) {
console.log(value)
}
Let’s use generators to count the number of expressions of various types in a program. The generator function that visits each node is:
function * getNodes (node) {
if (node && (typeof node === 'object') && ('type' in node)) {
yield node
for (const key in node) {
if (Array.isArray(node[key])) {
for (const child of node[key]) {
yield * getNodes(child)
}
} else if (typeof node[key] === 'object') {
yield * getNodes(node[key])
}
}
}
}
and the program that uses it is:
const ast = acorn.parse(program, { locations: true })
const result = {}
for (const node of getNodes(ast)) {
if (node.type === 'BinaryExpression') {
if (node.operator in result) {
result[node.operator] += 1
} else {
result[node.operator] = 1
}
}
}
console.log('counts are', result)
When we run it with our usual test program as input, we get:
counts are { '*': 2, '+': 1 }
Generators are a clean solution to many hard problems, but we find it more difficult to check variable identifiers using generators than using the class-based Visitor approach because we want to accumulate violations to report later. Again, this could be a reflection of what we’re used to rather than anything intrinsic; as with coding style, the most important thing is to be consistent.
Section 14.7: What other kinds of analysis can we do?
As one final example, consider the problem of keeping track of which methods are defined where in a deeply nested class hierarchy. (This problem comes up in some of the later chapters in this book: we wrote so many classes that incrementally extended their predecessors for pedagogical purposes that we lost track of what was defined where.) To create a table of method definitions, we first need to find the ancestors of the last class in the hierarchy:
import assert from 'assert'
import acorn from 'acorn'
import fs from 'fs'
import path from 'path'
import walk from 'acorn-walk'
class FindAncestors {
find (dirname, filename, className) {
return this.traceAncestry(dirname, filename, className, [])
}
traceAncestry (dirname, filename, className, accum) {
const fullPath = path.join(dirname, filename)
const program = fs.readFileSync(fullPath, 'utf-8')
const options = { locations: true, sourceType: 'module' }
const ast = acorn.parse(program, options)
const classDef = this.findClassDef(filename, ast, className)
accum.push({ filename, className, classDef })
const ancestorName = this.getAncestor(classDef)
if (ancestorName === null) {
return accum
}
const ancestorFile = this.findImport(filename, ast, ancestorName)
return this.traceAncestry(dirname, ancestorFile, ancestorName, accum)
}
}
export default FindAncestors
Finding class definitions is a straightforward extension of what we have already done:
findClassDef (filename, ast, className) {
const state = []
walk.simple(ast, {
ClassDeclaration: (node, state) => {
if ((node.id.type === 'Identifier') &&
(node.id.name === className)) {
state.push(node)
}
}
}, null, state)
assert(state.length === 1,
`No definition for ${className} in ${filename}`)
return state[0]
}
To test this code, we start with the last of these three short files:
class Upper {
constructor () {
this.name = 'upper'
}
report () {
console.log(this.modify(this.name))
}
modify (text) {
return text.toUpperCase()
}
}
module.exports = Upper
import Upper from './upper.js'
class Middle extends Upper {
constructor () {
super()
this.range = 'middle'
}
modify (text) {
return `** ${super.modify(text)} **`
}
}
export default Middle
import Middle from './middle.js'
class Lower extends Middle {
report () {
console.log(this.additional())
}
additional () {
return 'lower'
}
}
export default Lower
Lower in lower.js
Middle in ./middle.js
Upper in ./upper.js
Good: we can recover the chain of inheritance. Finding method definitions is also straightforward:
import FindAncestors from './find-ancestors.js'
class FindMethods extends FindAncestors {
find (dirname, filename, className) {
const classes = super.find(dirname, filename, className)
classes.forEach(record => {
record.methods = this.findMethods(record.classDef)
})
return classes
}
findMethods (classDef) {
return classDef.body.body
.filter(item => item.type === 'MethodDefinition')
.map(item => {
if (item.kind === 'constructor') {
return 'constructor'
} else if (item.kind === 'method') {
return item.key.name
} else {
return null
}
})
.filter(item => item !== null)
}
}
export default FindMethods
And finally, we can print a Markdown-formatted table showing which methods are defined in which class:
| method | Upper | Middle | Lower |
| ---- | ---- | ---- | ---- |
| additional | . | . | X |
| constructor | X | X | . |
| modify | X | X | . |
| report | X | . | X |
which renders as:
method | Upper | Middle | Lower |
---|---|---|---|
additional | . | . | X |
constructor | X | X | . |
modify | X | X | . |
report | X | . | X |
This may seem rather pointless for our toy example, but it proves its worth when we are looking at something like the virtual machine we will build in Chapter 19, which has a more complex method definition table:
method | Base | Interactive | Test | Exit |
---|---|---|---|---|
clear | . | X | . | . |
constructor | X | X | X | . |
exit | . | X | . | X |
getCommand | . | X | . | . |
handle | . | X | . | . |
help | . | X | . | . |
input | . | X | X | . |
interact | . | X | . | . |
list | . | X | . | . |
message | X | . | X | . |
next | . | X | . | . |
. | X | . | . | |
run | . | X | . | . |
setTester | . | . | X | . |
setVM | X | . | . | . |
stop | . | X | . | . |
variables | . | X | . | . |
Section 14.8: Exercises
Function length
Derive a class from Walker
that reports the length in lines of each function defined in the code being checked.
Expression depth
Derive a class from Walker
that reports how deep each top-level expression in the source code is.
For example,
the depth of 1 + 2 * 3
is 2,
while the depth of max(1 + 2 + 3)
is 3
(one level for the function call, one for the first addition, and one for the nested addition).
Downward and upward
Modify Walker
so that users can specify
one action to take at a node on the way down the tree
and a separate action to take on the way up.
(Hint: require users to specify Nodename_downward
and/or Nodename_upward
methods in their class,
then use string concatenation to construct method names while traversing the tree.)
Aggregating across files
Create a command-line program called sniff.js
that checks for style violations in any number of source files.
The first command-line argument to sniff.js
must be a JavaScript source file
that exports a class derived from Walker
called Check
that implements the checks the user wants.
The other command-line arguments must be the names of JavaScript source files to be checked:
node sniff.js my-check.js source-1.js source-2.js
Finding assertions
Write a program find-assertions.js
that finds all calls to assert
or assert.something
and prints the assertion message (if any).
Finding a missing parameter
-
Why doesn’t the parameter
x
show up as a rule violation in the example where we check name lengths? -
Modify the example so that it does.
Finding nested indexes
Write a tool that finds places where nested indexing is used,
i.e.,
where the program contains expressions like arr[table[i]]
.
Dynamic lookup
-
Write a function
dynamicExecution
that takes an object, the name of a method, and zero or more parameters as arguments and calls that method on that object:dynamicExecution(obj, 'meth', 1, 'a') // same as obj.meth(1, 'a')
-
What doesn’t this work for?
Generators and arrays
-
Write a generator that takes a two-dimensional table represented as an array of arrays and returns the values in column-major order.
-
Write another generator that takes a similar table and returns the values in row-major order.
Generators and identifiers
Rewrite the tool to check identifier lengths using a generator.