Chapter 20: Debugger
Terms defined: breakpoint, source map, tab completion, watchpoint
We have finally come to another of the topics that sparked this book: how does a debugger work? Debuggers are as much a part of programmers’ lives as version control but are taught far less often (in part, we believe, because it’s harder to create homework questions for them). In this chapter we will build a simple single-stepping debugger and show one way to test interactive applications (Chapter 4).
Section 20.1: What is our starting point?
We would like to debug a higher-level language than the assembly code of Chapter 19,
but we don’t want to have to write a parser
or wrestle with the ASTs of Chapter 14.
As a compromise,
we will represent programs as JSON data structures
whose elements have the form [command ...args]
.
If the JavaScript version of our program is:
const a = [-3, -5, -1, 0, -2, 1, 3, 1]
const b = Array()
let largest = a[0]
let i = 0
while (i < length(a)) {
if (a[i] > largest) {
b.push(a[i])
}
i += 1
}
i = 0
while (i < length(b)) {
console.log(b[i])
i += 1
}
then the JSON representation is:
[
["defA", "a", ["data", -3, -5, -1, 0, -2, 1, 3, 1]],
["defA", "b", ["data"]],
["defV", "largest", ["getA", "a", ["num", 0]]],
["append", "b", ["getV", "largest"]],
["defV", "i", ["num", 0]],
["loop", ["lt", ["getV", "i"], ["len", "a"]],
["test", ["gt", ["getA", "a", ["getV", "i"]], ["getV", "largest"]],
["setV", "largest", ["getA", "a", ["getV", "i"]]],
["append", "b", ["getV", "largest"]]
],
["setV", "i", ["add", ["getV", "i"], ["num", 1]]]
],
["setV", "i", ["num", 0]],
["loop", ["lt", ["getV", "i"], ["len", "b"]],
["print", ["getA", "b", ["getV", "i"]]],
["setV", "i", ["add", ["getV", "i"], ["num", 1]]]
]
]
Our virtual machine is structured like the one in Chapter 19. A real system would parse a program to create JSON, then translate JSON into assembly code, then assemble that to create machine instructions. Again, to keep things simple we will execute a program by removing comments and blank lines and then running commands by looking up the command name’s and calling that method:
import assert from 'assert'
class VirtualMachineBase {
constructor (program) {
this.program = this.compile(program)
this.prefix = '>>'
}
compile (lines) {
const text = lines
.map(line => line.trim())
.filter(line => (line.length > 0) && !line.startsWith('//'))
.join('\n')
return JSON.parse(text)
}
run () {
this.env = {}
this.runAll(this.program)
}
runAll (commands) {
commands.forEach(command => this.exec(command))
}
exec (command) {
const [op, ...args] = command
assert(op in this,
`Unknown op "${op}"`)
return this[op](args)
}
}
export default VirtualMachineBase
Remember, functions and methods are just another kind of data,
so if an object has a method called "meth"
,
the expression this["meth"]
looks it up
and this["meth"](args)
calls it.
If "meth"
is stored in a variable called name
,
then this[name](args)
will do exactly the same thing.
The method in our VM that defines a new variable with an initial value looks like this:
defV (args) {
this.checkOp('defV', 2, args)
const [name, value] = args
this.env[name] = this.exec(value)
}
while the one that adds two values looks like this:
add (args) {
this.checkOp('add', 2, args)
const left = this.exec(args[0])
const right = this.exec(args[1])
return left + right
}
Running a while
loop is:
loop (args) {
this.checkBody('loop', 1, args)
const body = args.slice(1)
while (this.exec(args[0])) {
this.runAll(body)
}
}
and checking that a variable name refers to an array is:
checkArray (op, name) {
this.checkName(op, name)
const array = this.env[name]
assert(Array.isArray(array),
`Variable "${name}" used in "${op}" is not array`)
}
The other operations are similar to these.
Section 20.2: How can we make a tracing debugger?
The next thing we need in our debugger is a source map that keeps track of where in the source file each instruction came from. Since JSON is a subset of JavaScript, we could get line numbers by parsing our programs with Acorn. However, we would then have to scrape the information we want for this example out of the AST. Since this chapter is supposed to be about debugging, not parsing, we will instead cheat and add a line number to each interesting statement by hand so that our program looks like this:
[
[1, "defA", "a", ["data", -3, -5, -1, 0, -2, 1, 3, 1]],
[2, "defA", "b", ["data"]],
[3, "defV", "largest", ["getA", "a", ["num", 0]]],
[4, "append", "b", ["getV", "largest"]],
[5, "defV", "i", ["num", 0]],
[6, "loop", ["lt", ["getV", "i"], ["len", "a"]],
[7, "test", ["gt", ["getA", "a", ["getV", "i"]], ["getV", "largest"]],
[8, "setV", "largest", ["getA", "a", ["getV", "i"]]],
[9, "append", "b", ["getV", "largest"]]
],
[11, "setV", "i", ["add", ["getV", "i"], ["num", 1]]]
],
[13, "setV", "i", ["num", 0]],
[14, "loop", ["lt", ["getV", "i"], ["len", "b"]],
[15, "print", ["getA", "b", ["getV", "i"]]],
[16, "setV", "i", ["add", ["getV", "i"], ["num", 1]]]
]
]
Building the source map from that is simple;
for now,
we just modify exec
to ignore the line number:
import assert from 'assert'
import VirtualMachineBase from './vm-base.js'
class VirtualMachineSourceMap extends VirtualMachineBase {
compile (lines) {
const original = super.compile(lines)
this.sourceMap = {}
const result = original.map(command => this.transform(command))
return result
}
transform (node) {
if (!Array.isArray(node)) {
return node
}
if (Array.length === 0) {
return []
}
const [first, ...rest] = node
if (typeof first !== 'number') {
return [first, null, ...rest.map(arg => this.transform(arg))]
}
const [op, ...args] = rest
this.sourceMap[first] =
[op, first, ...args.map(arg => this.transform(arg))]
return this.sourceMap[first]
}
exec (command) {
const [op, lineNum, ...args] = command
assert(op in this,
`Unknown op "${op}"`)
return this[op](args)
}
}
export default VirtualMachineSourceMap
It’s not really cheating
We said that adding line numbers by hand was cheating, but it isn’t. What we’re actually doing is deferring a problem until we’re sure we need to solve it. If our approach is clumsy or fails outright because of some aspect of design we didn’t foresee, there will have been no point handling line numbers the “right” way. A good rule for software design is to tackle the thing you’re least sure about first, using temporary code in place of what you think you’ll eventually need.
The next step is to modify the VM’s exec
method
so that it executes a callback function for each significant operation
(where “significant” means “we bothered to record its line number”).
Since we’re not sure what our debugger is going to need,
we give this callback the environment holding the current set of variables,
the line number,
and the operation being performed:
import assert from 'assert'
import VirtualMachineSourceMap from './vm-source-map.js'
class VirtualMachineCallback extends VirtualMachineSourceMap {
constructor (program, dbg) {
super(program)
this.dbg = dbg
this.dbg.setVM(this)
}
exec (command) {
const [op, lineNum, ...args] = command
this.dbg.handle(this.env, lineNum, op)
assert(op in this,
`Unknown op "${op}"`)
return this[op](args, lineNum)
}
message (prefix, val) {
this.dbg.message(`${prefix} ${val}`)
}
}
export default VirtualMachineCallback
We also modify the VM’s constructor to record the debugger and give it a reference to the virtual machine (Figure 20.1). We have to connect the two objects explicitly because each one needs a reference to the other, but one of them has to be created first. “A gets B then B tells A about itself” is a common pattern; we will look at other ways to manage it in the exercises.
To run the program, we create a debugger object and pass it to the VM’s constructor:
import assert from 'assert'
import readSource from './read-source.js'
const main = () => {
assert(process.argv.length === 5,
'Usage: run-debugger.js ./vm ./debugger input|-')
const VM = require(process.argv[2])
const Debugger = require(process.argv[3])
const inFile = process.argv[4]
const lines = readSource(inFile)
const dbg = new Debugger()
const vm = new VM(lines, dbg)
vm.run()
}
main()
A simple debugger just traces interesting statements as they run:
import DebuggerBase from './debugger-base.js'
class DebuggerTrace extends DebuggerBase {
handle (env, lineNum, op) {
if (lineNum !== null) {
console.log(`${lineNum} / ${op}: ${JSON.stringify(env)}`)
}
}
}
export default DebuggerTrace
Let’s try it on a program that adds the numbers in an array:
// const a = [-5, 1, 3]
// const total = 0
// let i = 0
// while (i < length(a)) {
// total += a[i]
// i += 1
// }
// console.log(total)
[
[1, "defA", "a", ["data", -5, 1, 3]],
[2, "defV", "total", ["num", 0]],
[3, "defV", "i", ["num", 0]],
[4, "loop", ["lt", ["getV", "i"], ["len", "a"]],
[5, "setV", "total",
["add", ["getV", "total"], ["getA", "a", ["getV", "i"]]]
],
[8, "setV", "i", ["add", ["getV", "i"], ["num", 1]]]
],
[10, "print", ["getV", "total"]]
]
1 / defA: {}
2 / defV: {"a":[-5,1,3]}
3 / defV: {"a":[-5,1,3],"total":0}
4 / loop: {"a":[-5,1,3],"total":0,"i":0}
5 / setV: {"a":[-5,1,3],"total":0,"i":0}
8 / setV: {"a":[-5,1,3],"total":-5,"i":0}
5 / setV: {"a":[-5,1,3],"total":-5,"i":1}
8 / setV: {"a":[-5,1,3],"total":-4,"i":1}
5 / setV: {"a":[-5,1,3],"total":-4,"i":2}
8 / setV: {"a":[-5,1,3],"total":-1,"i":2}
10 / print: {"a":[-5,1,3],"total":-1,"i":3}
>> -1
Section 20.3: How can we make the debugger interactive?
What we have built so far is an always-on print
statement.
To turn it into an interactive debugger,
we will use the prompt-sync
module to manage user input
with the following set of commands:
-
?
orhelp
to list commands. -
clear #
to clear a breakpoint at a numbered line. -
list
to list lines and breakpoints. -
next
to go forward one line. -
print name
to show a variable while at a breakpoint. -
run
to run to the next breakpoint. -
stop #
to break at a numbered line. -
variables
to list all variable names. -
exit
to exit immediately.
When the virtual machine calls the debugger, the debugger first checks if it is on a numbered line. If it isn’t, it hands control back to the VM. Otherwise, if we are single-stepping or this line is a breakpoint, the debugger takes over. Its overall structure is:
import prompt from 'prompt-sync'
import DebuggerBase from './debugger-base.js'
const PROMPT_OPTIONS = { sigint: true }
class DebuggerInteractive extends DebuggerBase {
constructor () {
super()
this.singleStep = true
this.breakpoints = new Set()
this.lookup = {
'?': 'help',
c: 'clear',
l: 'list',
n: 'next',
p: 'print',
r: 'run',
s: 'stop',
v: 'variables',
x: 'exit'
}
}
handle (env, lineNum, op) {
if (lineNum === null) {
return
}
if (this.singleStep) {
this.singleStep = false
this.interact(env, lineNum, op)
} else if (this.breakpoints.has(lineNum)) {
this.interact(env, lineNum, op)
}
}
}
export default DebuggerInteractive
It interacts with users by lookup up a command and invoking the corresponding method, just as the VM does:
interact (env, lineNum, op) {
let interacting = true
while (interacting) {
const command = this.getCommand(env, lineNum, op)
if (command.length === 0) {
continue
}
const [cmd, ...args] = command
if (cmd in this) {
interacting = this[cmd](env, lineNum, op, args)
} else if (cmd in this.lookup) {
interacting = this[this.lookup[cmd]](env, lineNum, op, args)
} else {
this.message(`unknown command ${command} (use '?' for help)`)
}
}
}
getCommand (env, lineNum, op) {
const options = Object.keys(this.lookup).sort().join('')
const display = `[${lineNum} ${options}] `
return this.input(display)
.split(/\s+/)
.map(s => s.trim())
.filter(s => s.length > 0)
}
input (display) {
return prompt(PROMPT_OPTIONS)(display)
}
Learning as we go
We didn’t originally put the input and output in methods that could be overridden, but realized later we needed to do this to make the debugger testable. Rather than coming back and rewriting this, we have done it here.
With this structure in place, the command handlers are pretty straightforward. For example, this method moves us to the next line:
next (env, lineNum, op, args) {
this.singleStep = true
return false
}
while this one prints the value of a variable:
print (env, lineNum, op, args) {
if (args.length !== 1) {
this.message('p[rint] requires one variable name')
} else if (!(args[0] in env)) {
this.message(`unknown variable name "${args[0]}"`)
} else {
this.message(JSON.stringify(env[args[0]]))
}
return true
}
After using this for a few moments,
though
we realized that we needed to change the signature of the loop
method.
We want to stop the loop each time it runs,
and need to know where we are.
We didn’t allow for this in the base class,
and we don’t want to have to change every method,
so we take advantage of the fact that JavaScript ignores any extra arguments passed to a method:
import VirtualMachineCallback from './vm-callback.js'
class VirtualMachineInteractive extends VirtualMachineCallback {
loop (args, lineNum) {
this.checkBody('loop', 1, args)
const body = args.slice(1)
while (this.exec(args[0])) {
this.dbg.handle(this.env, lineNum, 'loop')
this.runAll(body)
}
}
}
export default VirtualMachineInteractive
This is sloppy, but it works; we will tidy it up in the exercises.
Section 20.4: How can we test an interactive application?
How can we test an interactive application like a debugger? The answer is, “By making it non-interactive.” Like many tools over the past 30 years, our approach is based on a program called Expect. Our library replaces the input and output functions of the application being tested with callbacks, then provides input when asked and checks output when it is given (Figure 20.2).
The results look like this:
describe('interactive debugger', () => {
it('runs and prints', (done) => {
setup('print-0.json')
.get('[1 ?clnprsvx] ')
.send('r')
.get('>> 0')
.run()
done()
})
it('breaks and resumes', (done) => {
setup('print-3.json')
.get('[1 ?clnprsvx] ')
.send('s 3')
.get('[1 ?clnprsvx] ')
.send('r')
.get('>> 0')
.get('>> 1')
.get('[3 ?clnprsvx] ')
.send('x')
.run()
done()
})
})
Our Expect
class may be short,
but it is hard to understand because it is so abstract:
import assert from 'assert'
class Expect {
constructor (subject, start) {
this.start = start
this.steps = []
subject.setTester(this)
}
send (text) {
this.steps.push({ op: 'toSystem', arg: text })
return this
}
get (text) {
this.steps.push({ op: 'fromSystem', arg: text })
return this
}
run () {
this.start()
assert.strictEqual(this.steps.length, 0,
'Extra steps at end of test')
}
toSystem () {
return this.next('toSystem')
}
fromSystem (actual) {
const expected = this.next('fromSystem')
assert.strictEqual(expected, actual,
`Expected "${expected}" got "${actual}"`)
}
next (kind) {
assert(this.steps.length > 0,
'Unexpected end of steps')
assert.strictEqual(this.steps[0].op, kind,
`Expected ${kind}, got "${this.steps[0].op}"`)
const text = this.steps[0].arg
this.steps = this.steps.slice(1)
return text
}
}
export default Expect
Piece-by-piece:
subject
is the thing being tested.start
is a callback to start the system running. It gives control to the subject, which then calls back into the test framework for input and output.get
andsend
store things to be given to the subject and to be checked against its output. Both methods returnthis
so that we can chain calls together.run
starts the system and checks that all expected interactions have been used up when testing is done.toSystem
andfromSystem
usenext
to get the next test record, check its type, and return the string.
Let’s modify the debugger to use the tester, keeping in mind that the prompt counts as an output (and yes, we forgot this in the first version):
import DebuggerInteractive from './debugger-interactive.js'
class DebuggerTest extends DebuggerInteractive {
constructor () {
super()
this.tester = null
}
setTester (tester) {
this.tester = tester
}
input (display) {
this.tester.fromSystem(display)
return this.tester.toSystem()
}
message (m) {
this.tester.fromSystem(m)
}
}
export default DebuggerTest
Again,
we can’t pass the tester as a constructor parameter because of initialization order,
so we write a setup
function to make sure everything is connected the right way:
import Expect from '../expect.js'
import VM from '../vm-interactive.js'
import Debugger from '../debugger-test.js'
import readSource from '../read-source.js'
const setup = (filename) => {
const lines = readSource(path.join('debugger/test', filename))
const dbg = new Debugger()
const vm = new VM(lines, dbg)
return new Expect(dbg, () => vm.run())
}
Let’s try running our tests:
npm run test -- -g 'interactive debugger'
> stjs@1.0.0 test /u/stjs
> mocha */test/test-*.js "-g" "interactive debugger"
interactive debugger
✓ runs and prints
That works—or does it?
Why is only one test shown,
and doesn’t the summary appear?
After a bit of digging,
we realize that the debugger’s exit
command calls process.exit
when the simulated program ends,
so the whole program including the VM, debugger, and test framework stops immediately
before the promises that contain the tests have run.
We could fix this by modifying the debugger callback
to return an indication of whether or not execution should continue,
then modify the VM to pay attention to that flag.
However,
this approach becomes very complicated when we have deeply-nested calls to exec
,
which will happen with loops and conditionals.
A better alternative is to use an exception for control flow. We can define our own kind of exception as an empty class: it doesn’t need any data because we are only using it to get a typed object:
class HaltException {
}
export default HaltException
Next, we modify the debugger to throw this exception when asked to exit:
import HaltException from './halt-exception.js'
import DebuggerTest from './debugger-test.js'
class DebuggerExit extends DebuggerTest {
exit (env, lineNum, op, args) {
throw new HaltException()
}
}
export default DebuggerExit
And finally we modify the VM to finish cleanly if this exception is thrown, but re-throw any other kind of exception:
import HaltException from './halt-exception.js'
import VirtualMachineInteractive from './vm-interactive.js'
class VirtualMachineExit extends VirtualMachineInteractive {
run () {
this.env = {}
try {
this.runAll(this.program)
} catch (exc) {
if (exc instanceof HaltException) {
return
}
throw exc
}
}
}
export default VirtualMachineExit
With these changes in place, we are finally able to test our interactive debugger:
npm run test -- -g 'exitable debugger'
> stjs@1.0.0 test /u/stjs
> mocha */test/test-*.js "-g" "exitable debugger"
exitable debugger
✓ runs and prints
✓ breaks and resumes
2 passing (7ms)
Section 20.5: Exercises
Implementing tab completion
Read the documentation for prompt-sync
and then implement tab completion
for the debugger.
Modifying variables while running
Add a set
command that sets the value of a variable to a new value in a running program.
How do you handle setting array elements?
Making output more readable
Modify the tracing debugger so that the statements inside loops and conditionals are indented for easier reading.
Better loops
Our solution for handling loops is sloppy; fix it.
Using a flag to continue execution
Modify the debugger and virtual machine to use a “continue executing” flag rather than throwing an exception when execution should end. Which approach is easier to understand? Which will be easier to extend in the future?
Numbering lines
Write a tool that takes a JSON program representation without statement numbers and produces one that numbers all of the interesting statements for debugging purposes. Use whatever definition of “interesting” you think would be most useful.
Looping around again
Implement a “next loop iteration” command that runs the program until it reaches the current point in the next iteration of the current loop.
Looking up objects
Rather than having some objects call setXYZ
methods in other objects,
it is common practice to use a lookup table for mutual dependencies:
-
Every object initializes calls
table.set(name, this)
in its constructor. -
Whenever object A needs the instance of object B, it calls
table.lookup('B')
. It does not store the result in a member variable.
Modify the virtual machine and debugger to use this pattern.
Watching for variable changes
Modify the debugger and virtual machine to implement watchpoints that halt the program whenever the value of a variable changes.
Translating JSON to assembler
Write a tool that translates the JSON program representation into the assembly code of Chapter 19. To simplify things, increase the number of registers so that there is always storage for intermediate results when doing arithmetic.