Debugger

We have finally come to another of the topics that sparked this book: how does a debugger work? Debuggers are as much a part of programmers' lives as version control but are taught far less often (in part, we believe, because it's harder to create homework questions for them). In this chapter we will build a simple single-stepping debugger and show one way to test interactive applications (Chapter 4).

What is our starting point?

We would like to debug a higher-level language than the assembly code of Chapter 19, but we don't want to have to write a parser or wrestle with the ASTs of Chapter 14. As a compromise, we will represent programs as JSON data structures whose elements have the form [command ...args]. If the JavaScript version of our program is:

const a = [-3, -5, -1, 0, -2, 1, 3, 1]
const b = Array()
let largest = a[0]
let i = 0
while (i < length(a)) {
  if (a[i] > largest) {
    b.push(a[i])
  }
  i += 1
}
i = 0
while (i < length(b)) {
  console.log(b[i])
  i += 1
}

then the JSON representation is:

[
  ["defA", "a", ["data", -3, -5, -1, 0, -2, 1, 3, 1]],
  ["defA", "b", ["data"]],
  ["defV", "largest", ["getA", "a", ["num", 0]]],
  ["append", "b", ["getV", "largest"]],
  ["defV", "i", ["num", 0]],
  ["loop", ["lt", ["getV", "i"], ["len", "a"]],
    ["test", ["gt", ["getA", "a", ["getV", "i"]], ["getV", "largest"]],
      ["setV", "largest", ["getA", "a", ["getV", "i"]]],
      ["append", "b", ["getV", "largest"]]
    ],
    ["setV", "i", ["add", ["getV", "i"], ["num", 1]]]
  ],
  ["setV", "i", ["num", 0]],
  ["loop", ["lt", ["getV", "i"], ["len", "b"]],
    ["print", ["getA", "b", ["getV", "i"]]],
    ["setV", "i", ["add", ["getV", "i"], ["num", 1]]]
  ]
]

Our virtual machine is structured like the one in Chapter 19. A real system would parse a program to create JSON, then translate JSON into assembly code, then assemble that to create machine instructions. Again, to keep things simple we will execute a program by removing comments and blank lines and then running commands by looking up the command name's and calling that method:

import assert from 'assert'

class VirtualMachineBase {
  constructor (program) {
    this.program = this.compile(program)
    this.prefix = '>>'
  }

  compile (lines) {
    const text = lines
      .map(line => line.trim())
      .filter(line => (line.length > 0) && !line.startsWith('//'))
      .join('\n')
    return JSON.parse(text)
  }

  run () {
    this.env = {}
    this.runAll(this.program)
  }

  runAll (commands) {
    commands.forEach(command => this.exec(command))
  }

  exec (command) {
    const [op, ...args] = command
    assert(op in this,
      `Unknown op "${op}"`)
    return this[op](args)
  }

// ...144 lines not shown...
}

export default VirtualMachineBase

Remember, functions and methods are just another kind of data, so if an object has a method called "meth", the expression this["meth"] looks it up and this["meth"](args) calls it. If "meth" is stored in a variable called name, then this[name](args) will do exactly the same thing.

The method in our VM that defines a new variable with an initial value looks like this:

  defV (args) {
    this.checkOp('defV', 2, args)
    const [name, value] = args
    this.env[name] = this.exec(value)
  }

while the one that adds two values looks like this:

  add (args) {
    this.checkOp('add', 2, args)
    const left = this.exec(args[0])
    const right = this.exec(args[1])
    return left + right
  }

Running a while loop is:

  loop (args) {
    this.checkBody('loop', 1, args)
    const body = args.slice(1)
    while (this.exec(args[0])) {
      this.runAll(body)
    }
  }

and checking that a variable name refers to an array is:

  checkArray (op, name) {
    this.checkName(op, name)
    const array = this.env[name]
    assert(Array.isArray(array),
      `Variable "${name}" used in "${op}" is not array`)
  }

The other operations are similar to these.

How can we make a tracing debugger?

The next thing we need in our debugger is a source map that keeps track of where in the source file each instruction came from. Since JSON is a subset of JavaScript, we could get line numbers by parsing our programs with Acorn. However, we would then have to scrape the information we want for this example out of the AST. Since this chapter is supposed to be about debugging, not parsing, we will instead cheat and add a line number to each interesting statement by hand so that our program looks like this:

// const a = [-3, -5, -1, 0, -2, 1, 3, 1]
// const b = Array()
// let largest = a[0]
// let i = 0
// while (i < length(a)) {
//   if (a[i] > largest) {
//     b.push(a[i])
//   }
//   i += 1
// }
// i = 0
// while (i < length(b)) {
//   console.log(b[i])
//   i += 1
// }
// [program]
[
  [1, "defA", "a", ["data", -3, -5, -1, 0, -2, 1, 3, 1]],
  [2, "defA", "b", ["data"]],
  [3, "defV", "largest", ["getA", "a", ["num", 0]]],
  [4, "append", "b", ["getV", "largest"]],
  [5, "defV", "i", ["num", 0]],
  [6, "loop", ["lt", ["getV", "i"], ["len", "a"]],
   [7, "test", ["gt", ["getA", "a", ["getV", "i"]], ["getV", "largest"]],
    [8, "setV", "largest", ["getA", "a", ["getV", "i"]]],
    [9, "append", "b", ["getV", "largest"]]
   ],
   [11, "setV", "i", ["add", ["getV", "i"], ["num", 1]]]
  ],
  [13, "setV", "i", ["num", 0]],
  [14, "loop", ["lt", ["getV", "i"], ["len", "b"]],
   [15, "print", ["getA", "b", ["getV", "i"]]],
   [16, "setV", "i", ["add", ["getV", "i"], ["num", 1]]]
  ]
]
// [/program]

Building the source map from that is simple; for now, we just modify exec to ignore the line number:

import assert from 'assert'

import VirtualMachineBase from './vm-base.js'

class VirtualMachineSourceMap extends VirtualMachineBase {
  compile (lines) {
    const original = super.compile(lines)
    this.sourceMap = {}
    const result = original.map(command => this.transform(command))
    return result
  }

  transform (node) {
    if (!Array.isArray(node)) {
      return node
    }
    if (Array.length === 0) {
      return []
    }
    const [first, ...rest] = node
    if (typeof first !== 'number') {
      return [first, null, ...rest.map(arg => this.transform(arg))]
    }
    const [op, ...args] = rest
    this.sourceMap[first] =
      [op, first, ...args.map(arg => this.transform(arg))]
    return this.sourceMap[first]
  }

  exec (command) {
    const [op, lineNum, ...args] = command
    assert(op in this,
      `Unknown op "${op}"`)
    return this[op](args)
  }
}

export default VirtualMachineSourceMap

It's not really cheating

We said that adding line numbers by hand was cheating, but it isn't. What we're actually doing is deferring a problem until we're sure we need to solve it. If our approach is clumsy or fails outright because of some aspect of design we didn't foresee, there will have been no point handling line numbers the "right" way. A good rule for software design is to tackle the thing you're least sure about first, using temporary code in place of what you think you'll eventually need.

The next step is to modify the VM's exec method so that it executes a callback function for each significant operation (where "significant" means "we bothered to record its line number"). Since we're not sure what our debugger is going to need, we give this callback the environment holding the current set of variables, the line number, and the operation being performed:

import assert from 'assert'

import VirtualMachineSourceMap from './vm-source-map.js'

class VirtualMachineCallback extends VirtualMachineSourceMap {
  constructor (program, dbg) {
    super(program)
    this.dbg = dbg
    this.dbg.setVM(this)
  }

  exec (command) {
    const [op, lineNum, ...args] = command
    this.dbg.handle(this.env, lineNum, op)
    assert(op in this,
      `Unknown op "${op}"`)
    return this[op](args, lineNum)
  }

  message (prefix, val) {
    this.dbg.message(`${prefix} ${val}`)
  }
}

export default VirtualMachineCallback

We also modify the VM's constructor to record the debugger and give it a reference to the virtual machine (Figure 1). We have to connect the two objects explicitly because each one needs a reference to the other, but one of them has to be created first. "A gets B then B tells A about itself" is a common pattern; we will look at other ways to manage it in the exercises.

Initializing mutually-depending objects — Figure 1: Two-step initialization of mutually-dependent objects.

To run the program, we create a debugger object and pass it to the VM's constructor:

import assert from 'assert'

import readSource from './read-source.js'

const main = () => {
  assert(process.argv.length === 5,
    'Usage: run-debugger.js ./vm ./debugger input|-')
  const VM = require(process.argv[2])
  const Debugger = require(process.argv[3])
  const inFile = process.argv[4]
  const lines = readSource(inFile)
  const dbg = new Debugger()
  const vm = new VM(lines, dbg)
  vm.run()
}

main()

A simple debugger just traces interesting statements as they run:

import DebuggerBase from './debugger-base.js'

class DebuggerTrace extends DebuggerBase {
  handle (env, lineNum, op) {
    if (lineNum !== null) {
      console.log(`${lineNum} / ${op}: ${JSON.stringify(env)}`)
    }
  }
}

export default DebuggerTrace

Let's try it on a program that adds the numbers in an array:

// const a = [-5, 1, 3]
// const total = 0
// let i = 0
// while (i < length(a)) {
//   total += a[i]
//   i += 1
// }
// console.log(total)

[
  [1, "defA", "a", ["data", -5, 1, 3]],
  [2, "defV", "total", ["num", 0]],
  [3, "defV", "i", ["num", 0]],
  [4, "loop", ["lt", ["getV", "i"], ["len", "a"]],
    [5, "setV", "total",
      ["add", ["getV", "total"], ["getA", "a", ["getV", "i"]]]
    ],
    [8, "setV", "i", ["add", ["getV", "i"], ["num", 1]]]
  ],
  [10, "print", ["getV", "total"]]
]

1 / defA: {}
2 / defV: {"a":[-5,1,3]}
3 / defV: {"a":[-5,1,3],"total":0}
4 / loop: {"a":[-5,1,3],"total":0,"i":0}
5 / setV: {"a":[-5,1,3],"total":0,"i":0}
8 / setV: {"a":[-5,1,3],"total":-5,"i":0}
5 / setV: {"a":[-5,1,3],"total":-5,"i":1}
8 / setV: {"a":[-5,1,3],"total":-4,"i":1}
5 / setV: {"a":[-5,1,3],"total":-4,"i":2}
8 / setV: {"a":[-5,1,3],"total":-1,"i":2}
10 / print: {"a":[-5,1,3],"total":-1,"i":3}
>> -1

How can we make the debugger interactive?

What we have built so far is an always-on print statement. To turn it into an interactive debugger, we will use the prompt-sync module to manage user input with the following set of commands:

? or help to list commands.
clear # to clear a breakpoint at a numbered line.
list to list lines and breakpoints.
next to go forward one line.
print name to show a variable while at a breakpoint.
run to run to the next breakpoint.
stop # to break at a numbered line.
variables to list all variable names.
exit to exit immediately.

When the virtual machine calls the debugger, the debugger first checks if it is on a numbered line. If it isn't, it hands control back to the VM. Otherwise, if we are single-stepping or this line is a breakpoint, the debugger takes over. Its overall structure is:

import prompt from 'prompt-sync'

import DebuggerBase from './debugger-base.js'

const PROMPT_OPTIONS = { sigint: true }

class DebuggerInteractive extends DebuggerBase {
  constructor () {
    super()
    this.singleStep = true
    this.breakpoints = new Set()
    this.lookup = {
      '?': 'help',
      c: 'clear',
      l: 'list',
      n: 'next',
      p: 'print',
      r: 'run',
      s: 'stop',
      v: 'variables',
      x: 'exit'
    }
  }

  handle (env, lineNum, op) {
    if (lineNum === null) {
      return
    }
    if (this.singleStep) {
      this.singleStep = false
      this.interact(env, lineNum, op)
    } else if (this.breakpoints.has(lineNum)) {
      this.interact(env, lineNum, op)
    }
  }

// ...102 lines not shown...
}

export default DebuggerInteractive

It interacts with users by lookup up a command and invoking the corresponding method, just as the VM does:

  interact (env, lineNum, op) {
    let interacting = true
    while (interacting) {
      const command = this.getCommand(env, lineNum, op)
      if (command.length === 0) {
        continue
      }
      const [cmd, ...args] = command
      if (cmd in this) {
        interacting = this[cmd](env, lineNum, op, args)
      } else if (cmd in this.lookup) {
        interacting = this[this.lookup[cmd]](env, lineNum, op, args)
      } else {
        this.message(`unknown command ${command} (use '?' for help)`)
      }
    }
  }

  getCommand (env, lineNum, op) {
    const options = Object.keys(this.lookup).sort().join('')
    const display = `[${lineNum} ${options}] `
    return this.input(display)
      .split(/\s+/)
      .map(s => s.trim())
      .filter(s => s.length > 0)
  }

  input (display) {
    return prompt(PROMPT_OPTIONS)(display)
  }

Learning as we go

We didn't originally put the input and output in methods that could be overridden, but realized later we needed to do this to make the debugger testable. Rather than coming back and rewriting this, we have done it here.

With this structure in place, the command handlers are pretty straightforward. For example, this method moves us to the next line:

  next (env, lineNum, op, args) {
    this.singleStep = true
    return false
  }

while this one prints the value of a variable:

  print (env, lineNum, op, args) {
    if (args.length !== 1) {
      this.message('p[rint] requires one variable name')
    } else if (!(args[0] in env)) {
      this.message(`unknown variable name "${args[0]}"`)
    } else {
      this.message(JSON.stringify(env[args[0]]))
    }
    return true
  }

After using this for a few moments, though we realized that we needed to change the signature of the loop method. We want to stop the loop each time it runs, and need to know where we are. We didn't allow for this in the base class, and we don't want to have to change every method, so we take advantage of the fact that JavaScript ignores any extra arguments passed to a method:

import VirtualMachineCallback from './vm-callback.js'

class VirtualMachineInteractive extends VirtualMachineCallback {
  loop (args, lineNum) {
    this.checkBody('loop', 1, args)
    const body = args.slice(1)
    while (this.exec(args[0])) {
      this.dbg.handle(this.env, lineNum, 'loop')
      this.runAll(body)
    }
  }
}

export default VirtualMachineInteractive

This is sloppy, but it works; we will tidy it up in the exercises.

How can we test an interactive application?

How can we test an interactive application like a debugger? The answer is, "By making it non-interactive." Like many tools over the past 30 years, our approach is based on a program called Expect. Our library replaces the input and output functions of the application being tested with callbacks, then provides input when asked and checks output when it is given (Figure 2).

Testing interactive application — Figure 2: Replacing input and output to test interactive applications.

The results look like this:

describe('interactive debugger', () => {
  it('runs and prints', (done) => {
    setup('print-0.json')
      .get('[1 ?clnprsvx] ')
      .send('r')
      .get('>> 0')
      .run()
    done()
  })

  it('breaks and resumes', (done) => {
    setup('print-3.json')
      .get('[1 ?clnprsvx] ')
      .send('s 3')
      .get('[1 ?clnprsvx] ')
      .send('r')
      .get('>> 0')
      .get('>> 1')
      .get('[3 ?clnprsvx] ')
      .send('x')
      .run()
    done()
  })
})

Our Expect class may be short, but it is hard to understand because it is so abstract:

import assert from 'assert'

class Expect {
  constructor (subject, start) {
    this.start = start
    this.steps = []
    subject.setTester(this)
  }

  send (text) {
    this.steps.push({ op: 'toSystem', arg: text })
    return this
  }

  get (text) {
    this.steps.push({ op: 'fromSystem', arg: text })
    return this
  }

  run () {
    this.start()
    assert.strictEqual(this.steps.length, 0,
      'Extra steps at end of test')
  }

  toSystem () {
    return this.next('toSystem')
  }

  fromSystem (actual) {
    const expected = this.next('fromSystem')
    assert.strictEqual(expected, actual,
      `Expected "${expected}" got "${actual}"`)
  }

  next (kind) {
    assert(this.steps.length > 0,
      'Unexpected end of steps')
    assert.strictEqual(this.steps[0].op, kind,
      `Expected ${kind}, got "${this.steps[0].op}"`)
    const text = this.steps[0].arg
    this.steps = this.steps.slice(1)
    return text
  }
}

export default Expect

Piece-by-piece:

subject is the thing being tested.
start is a callback to start the system running. It gives control to the subject, which then calls back into the test framework for input and output.
get and send store things to be given to the subject and to be checked against its output. Both methods return this so that we can chain calls together.
run starts the system and checks that all expected interactions have been used up when testing is done.
toSystem and fromSystem use next to get the next test record, check its type, and return the string.

Let's modify the debugger to use the tester, keeping in mind that the prompt counts as an output (and yes, we forgot this in the first version):

import DebuggerInteractive from './debugger-interactive.js'

class DebuggerTest extends DebuggerInteractive {
  constructor () {
    super()
    this.tester = null
  }

  setTester (tester) {
    this.tester = tester
  }

  input (display) {
    this.tester.fromSystem(display)
    return this.tester.toSystem()
  }

  message (m) {
    this.tester.fromSystem(m)
  }
}

export default DebuggerTest

Again, we can't pass the tester as a constructor parameter because of initialization order, so we write a setup function to make sure everything is connected the right way:

import Expect from '../expect.js'
import VM from '../vm-interactive.js'
import Debugger from '../debugger-test.js'
import readSource from '../read-source.js'

const setup = (filename) => {
  const lines = readSource(path.join('debugger/test', filename))
  const dbg = new Debugger()
  const vm = new VM(lines, dbg)
  return new Expect(dbg, () => vm.run())
}

Let's try running our tests:

npm run test -- -g 'interactive debugger'

> stjs@1.0.0 test /u/stjs
> mocha */test/test-*.js "-g" "interactive debugger"



  interactive debugger
    ✓ runs and prints

That works---or does it? Why is only one test shown, and doesn't the summary appear? After a bit of digging, we realize that the debugger's exit command calls process.exit when the simulated program ends, so the whole program including the VM, debugger, and test framework stops immediately before the promises that contain the tests have run.

We could fix this by modifying the debugger callback to return an indication of whether or not execution should continue, then modify the VM to pay attention to that flag. However, this approach becomes very complicated when we have deeply-nested calls to exec, which will happen with loops and conditionals.

A better alternative is to use an exception for control flow. We can define our own kind of exception as an empty class: it doesn't need any data because we are only using it to get a typed object:

class HaltException {
}

export default HaltException

Next, we modify the debugger to throw this exception when asked to exit:

import HaltException from './halt-exception.js'
import DebuggerTest from './debugger-test.js'

class DebuggerExit extends DebuggerTest {
  exit (env, lineNum, op, args) {
    throw new HaltException()
  }
}

export default DebuggerExit

And finally we modify the VM to finish cleanly if this exception is thrown, but re-throw any other kind of exception:

import HaltException from './halt-exception.js'
import VirtualMachineInteractive from './vm-interactive.js'

class VirtualMachineExit extends VirtualMachineInteractive {
  run () {
    this.env = {}
    try {
      this.runAll(this.program)
    } catch (exc) {
      if (exc instanceof HaltException) {
        return
      }
      throw exc
    }
  }
}

export default VirtualMachineExit

With these changes in place, we are finally able to test our interactive debugger:

npm run test -- -g 'exitable debugger'

> stjs@1.0.0 test /u/stjs
> mocha */test/test-*.js "-g" "exitable debugger"



  exitable debugger
    ✓ runs and prints
    ✓ breaks and resumes


  2 passing (7ms)

Exercises

Implementing tab completion

Read the documentation for prompt-sync and then implement tab completion for the debugger.

Modifying variables while running

Add a set command that sets the value of a variable to a new value in a running program. How do you handle setting array elements?

Making output more readable

Modify the tracing debugger so that the statements inside loops and conditionals are indented for easier reading.

Better loops

Our solution for handling loops is sloppy; fix it.

Using a flag to continue execution

Modify the debugger and virtual machine to use a "continue executing" flag rather than throwing an exception when execution should end. Which approach is easier to understand? Which will be easier to extend in the future?

Numbering lines

Write a tool that takes a JSON program representation without statement numbers and produces one that numbers all of the interesting statements for debugging purposes. Use whatever definition of "interesting" you think would be most useful.

Looping around again

Implement a "next loop iteration" command that runs the program until it reaches the current point in the next iteration of the current loop.

Looking up objects

Rather than having some objects call setXYZ methods in other objects, it is common practice to use a lookup table for mutual dependencies:

Every object initializes calls table.set(name, this) in its constructor.
Whenever object A needs the instance of object B, it calls table.lookup('B'). It does not store the result in a member variable.

Modify the virtual machine and debugger to use this pattern.

Watching for variable changes

Modify the debugger and virtual machine to implement watchpoints that halt the program whenever the value of a variable changes.

Translating JSON to assembler

Write a tool that translates the JSON program representation into the assembly code of Chapter 19. To simplify things, increase the number of registers so that there is always storage for intermediate results when doing arithmetic.