Chapter 3: Asynchronous Programming

Terms defined: call stack, character encoding, class, constructor, event loop, exception, fluent interface, method, method chaining, non-blocking execution, promise, promisification, protocol, UTF-8

Callbacks work, but they are hard to read and debug, which means they only “work” in a limited sense. JavaScript’s developers added promises to the language in 2015 to make callbacks easier to write and understand, and more recently they added the keywords async and await as well to make asynchronous programming easier still. To show how these work, we will create a class of our own called Pledge that provides the same core features as promises. Our explanation was inspired by Trey Huffine’s tutorial, and we encourage you to read that as well.

Section 3.1: How can we manage asynchronous execution?

JavaScript is built around an event loop. Every task is represented by an entry in a queue; the event loop repeatedly takes a task from the front of the queue, runs it, and adds any new tasks that it creates to the back of the queue to run later. Only one task runs at a time; each has its own call stack, but objects can be shared between tasks (Figure 3.1).

Figure 3.1: Using an event loop to manage concurrent tasks.
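
The queue discipline described above can be imitated in a few lines. This is only a toy model, not how Node is implemented: the names queue, addTask, and runLoop are invented for illustration.

```javascript
// Toy model of a run queue; `queue`, `addTask`, and `runLoop` are
// hypothetical names, and a real event loop is far more involved.
const queue = []
const log = []

const addTask = (task) => queue.push(task)

const runLoop = () => {
  while (queue.length > 0) {
    const task = queue.shift() // take the task at the front
    task() // run it to completion before looking at the next one
  }
}

addTask(() => {
  log.push('first')
  addTask(() => log.push('third')) // new work goes to the back
})
addTask(() => log.push('second'))

runLoop()
console.log(log.join(', '))
```

The task created by the first task runs last because it joins the queue behind the task that was already waiting.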

Most tasks execute all the code available in the order it is written. For example, this one-line program uses Array.forEach to print each element of an array in turn:

[1000, 1500, 500].forEach(t => console.log(t))
1000
1500
500

However, a handful of special built-in functions make Node switch tasks or add new tasks to the run queue. For example, setTimeout tells Node to run a callback function after a certain number of milliseconds have passed. Its first argument is a callback function that takes no arguments, and its second is the delay. When setTimeout is called, Node sets the callback aside for the requested length of time, then adds it to the run queue. (This means the task runs at least the specified number of milliseconds later.)

Why zero arguments?

setTimeout’s requirement that callback functions take no arguments is another example of a protocol. One way to think about it is that protocols allow old code to use new code: whoever wrote setTimeout couldn’t know what specific tasks we want to delay, so they specified a way to wrap up any task at all.
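
To make the protocol concrete, the sketch below wraps a task that needs an argument inside a zero-argument function. The function greet and the array results are invented for this example.

```javascript
// `greet` is a made-up task that needs an argument.
const greet = (name) => `hello, ${name}`

// When given just a callback and a delay, setTimeout calls the
// callback with no arguments, so we wrap the real call in a
// zero-argument function that remembers the argument for us.
const results = []
const wrapped = () => results.push(greet('world'))

setTimeout(wrapped, 0)
setTimeout(() => console.log(results), 10)
```

Any task can be packaged this way, which is why a single zero-argument convention is enough.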

As the listing below shows, the original task can generate many new tasks before it completes, and those tasks can run in a different order than the order in which they were created (Figure 3.2).

[1000, 1500, 500].forEach(t => {
  console.log(`about to setTimeout for ${t}`)
  setTimeout(() => console.log(`inside timer handler for ${t}`), t)
})
about to setTimeout for 1000
about to setTimeout for 1500
about to setTimeout for 500
inside timer handler for 500
inside timer handler for 1000
inside timer handler for 1500

Figure 3.2: Using setTimeout to delay operations.

If we give setTimeout a delay of zero milliseconds, the new task can be run right away, but any other tasks that are waiting have a chance to run as well:

[1000, 1500, 500].forEach(t => {
  console.log(`about to setTimeout for ${t}`)
  setTimeout(() => console.log(`inside timer handler for ${t}`), 0)
})
about to setTimeout for 1000
about to setTimeout for 1500
about to setTimeout for 500
inside timer handler for 1000
inside timer handler for 1500
inside timer handler for 500

We can use this trick to build a generic non-blocking function that takes a callback defining a task and switches tasks if any others are available:

const nonBlocking = (callback) => {
  setTimeout(callback, 0)
}

[1000, 1500, 500].forEach(t => {
  console.log(`about to do nonBlocking for ${t}`)
  nonBlocking(() => console.log(`inside timer handler for ${t}`))
})
about to do nonBlocking for 1000
about to do nonBlocking for 1500
about to do nonBlocking for 500
inside timer handler for 1000
inside timer handler for 1500
inside timer handler for 500

Node’s built-in function setImmediate does much the same thing as our nonBlocking function. Node also has process.nextTick, which doesn’t do quite the same thing; we’ll explore the differences in the exercises.

[1000, 1500, 500].forEach(t => {
  console.log(`about to do setImmediate for ${t}`)
  setImmediate(() => console.log(`inside immediate handler for ${t}`))
})
about to do setImmediate for 1000
about to do setImmediate for 1500
about to do setImmediate for 500
inside immediate handler for 1000
inside immediate handler for 1500
inside immediate handler for 500

Section 3.2: How do promises work?

Before we start building our own promises, let’s look at how we want them to work:

import Pledge from './pledge.js'

new Pledge((resolve, reject) => {
  console.log('top of a single then clause')
  setTimeout(() => {
    console.log('about to call resolve callback')
    resolve('this is the result')
  }, 0)
}).then((value) => {
  console.log(`in 'then' with "${value}"`)
  return 'first then value'
})
top of a single then clause
about to call resolve callback
in 'then' with "this is the result"

This short program creates a new Pledge with a callback that takes two other callbacks as arguments: resolve (which will run when everything worked) and reject (which will run when something went wrong). The top-level callback does the first part of what we want to do, i.e., whatever we want to run before we expect a delay; for demonstration purposes, we will use setTimeout with zero delay to switch tasks. Once this task resumes, we call the resolve callback to trigger whatever is supposed to happen after the delay.

Now look at the line with then. This is a method of the Pledge object we just created, and its job is to do whatever we want to do after the delay. The argument to then is yet another callback function; it will get the value passed to resolve, which is how the first part of the action communicates with the second (Figure 3.3).

Figure 3.3: Order of operations when a promise resolves.

In order to make this work, Pledge’s constructor must take a single function called action. This function must take two callbacks as arguments: what to do if the action completes successfully and what to do if it doesn’t (i.e., how to handle errors). Pledge will provide these callbacks to the action at the right times.

Pledge also needs two methods: then to enable more actions and catch to handle errors. To simplify things just a little bit, we will allow users to chain as many thens as they want, but only allow one catch.

Section 3.3: How can we chain operations together?

A fluent interface is a style of object-oriented programming in which the methods of an object return this so that method calls can be chained together. For example, if our class is:

class Fluent {
  constructor () {...}

  first (top) {
    ...do something with top...
    return this
  }

  second (left, right) {
    ...do something with left and right...
    return this
  }
}

then we can write:

  const f = new Fluent()
  f.first('hello').second('and', 'goodbye')

or even

  (new Fluent()).first('hello').second('and', 'goodbye')

Array’s fluent interface lets us write expressions like Array.filter(...).map(...) that are usually more readable than assigning intermediate results to temporary variables.
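
As a small illustration, here is the same calculation written both ways. The data and the threshold are arbitrary. Note that filter and map return new arrays rather than this, but the chaining works the same way.

```javascript
const lengths = [1000, 1500, 500]

// With temporary variables.
const longOnes = lengths.filter(t => t >= 1000)
const inSeconds = longOnes.map(t => t / 1000)

// With chained calls: each method returns an array, so the next
// method can be called on the result directly.
const chained = lengths.filter(t => t >= 1000).map(t => t / 1000)

console.log(inSeconds, chained)
```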

If the original action given to our Pledge completes successfully, the Pledge gives us a value by calling the resolve callback. We pass this value to the first then, pass the result of that then to the second one, and so on. If any of them fail and throw an exception, we pass that exception to the error handler. Putting it all together, the whole class looks like this:

class Pledge {
  constructor (action) {
    this.actionCallbacks = []
    this.errorCallback = () => {}
    action(this.onResolve.bind(this), this.onReject.bind(this))
  }

  then (thenHandler) {
    this.actionCallbacks.push(thenHandler)
    return this
  }

  catch (errorHandler) {
    this.errorCallback = errorHandler
    return this
  }

  onResolve (value) {
    let storedValue = value
    try {
      this.actionCallbacks.forEach((action) => {
        storedValue = action(storedValue)
      })
    } catch (err) {
      this.actionCallbacks = []
      this.onReject(err)
    }
  }

  onReject (err) {
    this.errorCallback(err)
  }
}

export default Pledge

Binding this

Pledge’s constructor makes two calls to a special function called bind. When we create an object obj and call a method meth, JavaScript sets the special variable this to obj inside meth. If we use a method as a callback, though, this isn’t automatically set to the correct object. To convert the method to a plain old function with the right this, we have to use bind. The documentation has more details and examples.
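
The sketch below shows the problem and the fix in isolation. The Counter class is invented for this example and is not part of Pledge.

```javascript
// `Counter` is a hypothetical class used only to show binding.
class Counter {
  constructor () {
    this.count = 0
  }

  increment () {
    this.count += 1
    return this.count
  }
}

const counter = new Counter()

// Detached from its object, the method has no usable `this`.
const unbound = counter.increment
let failure = null
try {
  unbound()
} catch (err) {
  failure = err
}

// bind produces a plain function with `this` fixed to `counter`.
const bound = counter.increment.bind(counter)
const value = bound()

console.log(failure instanceof TypeError, value)
```

This is why the constructor passes this.onResolve.bind(this) rather than this.onResolve: the action will call the function later without any object in front of it.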

Let’s create a Pledge and return a value:

import Pledge from './pledge.js'

new Pledge((resolve, reject) => {
  console.log('top of a single then clause')
}).then((value) => {
  console.log(`then with "${value}"`)
  return 'first then value'
})
top of a single then clause

Why didn’t this work?

  1. We can’t use return with pledges because the call stack of the task that created the pledge is gone by the time the pledge executes. Instead, we must call resolve or reject.

  2. We haven’t done anything that defers execution, i.e., there is no call to setTimeout, setImmediate, or anything else that would switch tasks. Our original motivating example got this right.

This example shows how we can chain actions together:

import Pledge from './pledge.js'

new Pledge((resolve, reject) => {
  console.log('top of action callback with double then and a catch')
  setTimeout(() => {
    console.log('about to call resolve callback')
    resolve('initial result')
    console.log('after resolve callback')
  }, 0)
  console.log('end of action callback')
}).then((value) => {
  console.log(`first then with "${value}"`)
  return 'first value'
}).then((value) => {
  console.log(`second then with "${value}"`)
  return 'second value'
})
top of action callback with double then and a catch
end of action callback
about to call resolve callback
first then with "initial result"
second then with "first value"
after resolve callback

Notice that inside each then we do use return because these clauses all run in a single task. As we will see in the next section, the full implementation of Promise allows us to run both normal code and delayed tasks inside then handlers.

Finally, in this example we explicitly signal a problem by calling reject to make sure our error handling does what it’s supposed to:

import Pledge from './pledge.js'

new Pledge((resolve, reject) => {
  console.log('top of action callback with deliberate error')
  setTimeout(() => {
    console.log('about to reject on purpose')
    reject('error on purpose')
  }, 0)
}).then((value) => {
  console.log(`should not be here with "${value}"`)
}).catch((err) => {
  console.log(`in error handler with "${err}"`)
})
top of action callback with deliberate error
about to reject on purpose
in error handler with "error on purpose"

Section 3.4: How are real promises different?

Let’s rewrite our chained pledge with built-in promises:

new Promise((resolve, reject) => {
  console.log('top of action callback with double then and a catch')
  setTimeout(() => {
    console.log('about to call resolve callback')
    resolve('initial result')
    console.log('after resolve callback')
  }, 0)
  console.log('end of action callback')
}).then((value) => {
  console.log(`first then with "${value}"`)
  return 'first value'
}).then((value) => {
  console.log(`second then with "${value}"`)
  return 'second value'
})
top of action callback with double then and a catch
end of action callback
about to call resolve callback
after resolve callback
first then with "initial result"
second then with "first value"

It looks almost the same, but if we read the output carefully we can see that the then handlers run after the callback that called resolve has finished: “after resolve callback” is now printed before “first then”. This is a signal that Node is delaying the execution of the code in the then handlers.
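
The delay is visible even when the promise is already resolved. In this minimal sketch (the array order exists only to record what happens), the then handler still runs after the code that follows it:

```javascript
const order = []

// The promise is already resolved, but the handler is still deferred.
Promise.resolve('value').then(v => order.push(`then got ${v}`))
order.push('end of main program')

// By the time a timer fires, the deferred handler has run.
setTimeout(() => console.log(order), 0)
```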

A very common pattern is to return another promise from inside then so that the next then is called on the returned promise, not on the original promise (Figure 3.4). This is another way to implement a fluent interface: if a method of one object returns a second object, we can call a method of the second object immediately.

const delay = (message) => {
  return new Promise((resolve, reject) => {
    console.log(`constructing promise: ${message}`)
    setTimeout(() => {
      resolve(`resolving: ${message}`)
    }, 1)
  })
}

console.log('before')
delay('outer delay')
  .then((value) => {
    console.log(`first then: ${value}`)
    return delay('inner delay')
  }).then((value) => {
    console.log(`second then: ${value}`)
  })
console.log('after')
before
constructing promise: outer delay
after
first then: resolving: outer delay
constructing promise: inner delay
second then: resolving: inner delay

Figure 3.4: Chaining promises to make asynchronous operations depend on each other.

We therefore have three rules for chaining promises:

  1. If our code can run synchronously, just put it in then.

  2. If we want to use our own asynchronous function, it must create and return a promise.

  3. Finally, if we want to use a library function that relies on callbacks, we have to convert it to use promises. Doing this is called promisification (because programmers will rarely pass up an opportunity to add a bit of jargon to the world), and most functions in Node have already been promisified.
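
The third rule can be sketched by hand. The function slowDouble below is a made-up callback-style function that follows Node’s convention of passing an error first and a result second; slowDoubleAsync is one possible promisified wrapper for it, not a library API.

```javascript
// `slowDouble` is a hypothetical callback-style function that passes
// (error, result) to its callback, as Node's own functions do.
const slowDouble = (n, callback) => {
  setTimeout(() => {
    if (typeof n !== 'number') {
      callback(new Error('not a number'))
    } else {
      callback(null, 2 * n)
    }
  }, 0)
}

// Wrap the callback-style function so that it returns a promise.
const slowDoubleAsync = (n) => {
  return new Promise((resolve, reject) => {
    slowDouble(n, (err, result) => {
      if (err) {
        reject(err)
      } else {
        resolve(result)
      }
    })
  })
}

const good = slowDoubleAsync(21)
const bad = slowDoubleAsync('oops').catch(err => err.message)

good.then(value => console.log(value))
bad.then(message => console.log(message))
```

Node’s util.promisify performs this kind of wrapping automatically for functions that follow the error-first convention.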

Section 3.5: How can we build tools with promises?

Promises may seem more complex than callbacks right now, but that’s because we’re looking at how they work rather than at how to use them. To explore the latter subject, let’s use promises to build a program to count the number of lines in a set of files. A few moments of searching on NPM turns up a promisified version of fs-extra called fs-extra-promise, so we will rely on it for file operations.

Our first step is to count the lines in a single file:

import fs from 'fs-extra-promise'

const filename = process.argv[2]

fs.readFileAsync(filename, { encoding: 'utf-8' })
  .then(data => {
    const length = data.split('\n').length - 1
    console.log(`${filename}: ${length}`)
  })
  .catch(err => {
    console.error(err.message)
  })
node count-lines-single-file.js count-lines-single-file.js
count-lines-single-file.js: 12

Character encoding

A character encoding specifies how characters are stored as bytes. The most widely used is UTF-8, which stores the characters in the ASCII set (unaccented Latin letters, digits, and common punctuation) in a single byte and uses multi-byte sequences for other symbols. If we don’t specify a character encoding, fs.readFileAsync gives us an array of bytes (a Buffer) rather than a string of characters. We can tell we’ve made this mistake when we try to call a method of String and Node tells us we can’t.
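
The difference between bytes and characters can be checked directly. This sketch uses the built-in TextEncoder and Buffer; the sample characters are arbitrary and are written as escapes so that the source file’s own encoding doesn’t matter.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder()

// 'a', e-acute, and the euro sign need one, two, and three bytes.
const samples = ['a', '\u00e9', '\u20ac']
const sizes = samples.map(ch => encoder.encode(ch).length)
console.log(sizes)

// The same text is five bytes long but only four characters long.
const bytes = Buffer.from('caf\u00e9', 'utf-8')
const text = bytes.toString('utf-8')
console.log(bytes.length, text.length)
```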

The next step is to count the lines in multiple files. We can use glob-promise to delay handling the output of glob, but we need some way to create a separate task to count the lines in each file and to wait until those line counts are available before exiting our program.

The tool we want is Promise.all, which waits until all of the promises in an array have completed. To make our program a little more readable, we will put the creation of the promise for each file in a separate function:

import glob from 'glob-promise'
import fs from 'fs-extra-promise'

const main = (srcDir) => {
  glob(`${srcDir}/**/*.*`)
    .then(files => Promise.all(files.map(f => lineCount(f))))
    .then(counts => counts.forEach(c => console.log(c)))
    .catch(err => console.log(err.message))
}

const lineCount = (filename) => {
  return new Promise((resolve, reject) => {
    fs.readFileAsync(filename, { encoding: 'utf-8' })
      .then(data => resolve(data.split('\n').length - 1))
      .catch(err => reject(err))
  })
}

const srcDir = process.argv[2]
main(srcDir)
node count-lines-globbed-files.js .
10
1
12
4
1
...
3
2
5
2
14
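
Promise.all reports its results in the order of the input array, not in the order in which the promises finish, which is why the counts line up with the order of the files. A small sketch with made-up delays (the helper after is invented for this example):

```javascript
// `after` is a hypothetical helper that resolves with `value`
// once `ms` milliseconds have passed.
const after = (ms, value) => {
  return new Promise(resolve => setTimeout(() => resolve(value), ms))
}

// The second promise finishes first, but keeps its place in the array.
const all = Promise.all([
  after(30, 'slow'),
  after(10, 'fast'),
  after(20, 'medium')
])

all.then(values => console.log(values))
```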

However, we want to display the names of the files whose lines we’re counting along with the counts. To do this our then must return two values. We could put them in an array, but it’s better practice to construct a temporary object with named fields (Figure 3.5). This approach allows us to add or rearrange fields without breaking code and also serves as a bit of documentation. With this change our line-counting program becomes:

import glob from 'glob-promise'
import fs from 'fs-extra-promise'

const main = (srcDir) => {
  glob(`${srcDir}/**/*.*`)
    .then(files => Promise.all(files.map(f => lineCount(f))))
    .then(counts => counts.forEach(
      c => console.log(`${c.lines}: ${c.name}`)))
    .catch(err => console.log(err.message))
}

const lineCount = (filename) => {
  return new Promise((resolve, reject) => {
    fs.readFileAsync(filename, { encoding: 'utf-8' })
      .then(data => resolve({
        name: filename,
        lines: data.split('\n').length - 1
      }))
      .catch(err => reject(err))
  })
}

const srcDir = process.argv[2]
main(srcDir)

Figure 3.5: Creating temporary objects with named fields to carry values forward.

As in Chapter 2, this works until we run into a directory whose name matches *.*, which we do when counting the lines in the contents of node_modules. The solution once again is to use stat to check if something is a file or not before trying to read it. And since stat returns an object that doesn’t include the file’s name, we create another temporary object to pass information down the chain of thens.

import glob from 'glob-promise'
import fs from 'fs-extra-promise'

const main = (srcDir) => {
  glob(`${srcDir}/**/*.*`)
    .then(files => Promise.all(files.map(f => statPair(f))))
    .then(files => files.filter(pair => pair.stats.isFile()))
    .then(files => files.map(pair => pair.filename))
    .then(files => Promise.all(files.map(f => lineCount(f))))
    .then(counts => counts.forEach(
      c => console.log(`${c.lines}: ${c.name}`)))
    .catch(err => console.log(err.message))
}

const statPair = (filename) => {
  return new Promise((resolve, reject) => {
    fs.statAsync(filename)
      .then(stats => resolve({ filename, stats }))
      .catch(err => reject(err))
  })
}

const lineCount = (filename) => {
  return new Promise((resolve, reject) => {
    fs.readFileAsync(filename, { encoding: 'utf-8' })
      .then(data => resolve({
        name: filename,
        lines: data.split('\n').length - 1
      }))
      .catch(err => reject(err))
  })
}

const srcDir = process.argv[2]
main(srcDir)
node count-lines-with-stat.js .
10: ./assign-immediately.js
1: ./assign-immediately.out
12: ./await-fs.js
4: ./await-fs.out
1: ./await-fs.sh
...
3: ./x-multiple-catch/example.js
2: ./x-multiple-catch/example.txt
5: ./x-trace-load.md
2: ./x-trace-load/config.yml
14: ./x-trace-load/example.js

This code is complex, but much simpler than it would be if we were using callbacks.

Lining things up

This code uses the expression {filename, stats} to create an object whose keys are filename and stats, and whose values are the values of the corresponding variables. Doing this makes the code easier to read, both because it’s shorter and because it signals that the value associated with the key filename is exactly the value of the variable with the same name.
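
The shorthand and the longhand build the same object, as this short check shows. The stats value here is a stand-in, not a real result from stat.

```javascript
const filename = 'example.txt'
const stats = { size: 24 } // stand-in for a real stat result

const shorthand = { filename, stats }
const longhand = { filename: filename, stats: stats }

console.log(shorthand.filename === longhand.filename)
console.log(shorthand.stats === longhand.stats)
```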

Section 3.6: How can we make this more readable?

Promises eliminate the deep nesting associated with callbacks of callbacks, but they are still hard to follow. The latest versions of JavaScript provide two new keywords async and await to flatten code further. async means “this function implicitly returns a promise”, while await means “wait for a promise to resolve”. This short program uses both keywords to print the first ten characters of a file:

import fs from 'fs-extra-promise'

const firstTenCharacters = async (filename) => {
  const text = await fs.readFileAsync(filename, 'utf-8')
  console.log(`inside, raw text is ${text.length} characters long`)
  return text.slice(0, 10)
}

console.log('about to call')
const result = firstTenCharacters(process.argv[2])
console.log(`function result has type ${result.constructor.name}`)
result.then(value => console.log(`outside, final result is "${value}"`))
about to call
function result has type Promise
inside, raw text is 24 characters long
outside, final result is "Begin at t"

Translating code

When Node sees await and async it silently converts the code to use promises with then, resolve, and reject; we will see how this works in Chapter 15. In order to provide a context for this transformation we must usually put await inside a function that is declared to be async: in CommonJS modules and older versions of Node we can’t simply write await fs.statAsync(...) at the top level of our program outside a function (recent versions of Node do allow top-level await in ES modules). This requirement is occasionally annoying, but since we should be putting our code in functions anyway it’s hard to complain.
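
To give a feel for the correspondence, here is one function written both ways. This is only an approximation of the idea, not the transformation Node actually performs, and fetchNumber is a made-up stand-in for any call that returns a promise.

```javascript
// `fetchNumber` is a hypothetical promise-returning function.
const fetchNumber = () => Promise.resolve(20)

// Written with async and await.
const withAwait = async () => {
  const n = await fetchNumber()
  return n + 1
}

// Roughly the same logic written with an explicit then.
const withThen = () => {
  return fetchNumber().then(n => n + 1)
}

// Both functions return promises that resolve to the same value.
Promise.all([withAwait(), withThen()])
  .then(([a, b]) => console.log(a, b))
```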

To see how much cleaner our code is with await and async, let’s rewrite our line counting program to use them. First, we modify the two helper functions to look like they’re waiting for results and returning them. They actually wrap their results in promises and return those, but Node now takes care of that for us:

const statPair = async (filename) => {
  const stats = await fs.statAsync(filename)
  return { filename, stats }
}

const lineCount = async (filename) => {
  const data = await fs.readFileAsync(filename, 'utf-8')
  return {
    filename,
    lines: data.split('\n').length - 1
  }
}

Next, we modify main to wait for things to complete. We must still use Promise.all to handle the promises that are counting lines for individual files, but the result is less cluttered than our previous version.

const main = async (srcDir) => {
  const files = await glob(`${srcDir}/**/*.*`)
  const pairs = await Promise.all(
    files.map(async filename => await statPair(filename))
  )
  const filtered = pairs
    .filter(pair => pair.stats.isFile())
    .map(pair => pair.filename)
  const counts = await Promise.all(
    filtered.map(async name => await lineCount(name))
  )
  counts.forEach(
    ({ filename, lines }) => console.log(`${lines}: ${filename}`)
  )
}

const srcDir = process.argv[2]
main(srcDir)
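
One possible simplification, offered as a sketch rather than a correction: when a function already returns a promise, a callback of the form async x => await f(x) gives equivalent results to calling f directly. The helper describeFile below is a made-up stand-in for statPair or lineCount.

```javascript
// `describeFile` is a hypothetical async helper used for illustration.
const describeFile = async (name) => ({ name, length: name.length })

const names = ['a.js', 'bb.js']

// The style used in main above.
const wrapped = Promise.all(
  names.map(async n => await describeFile(n))
)

// Calling the helper directly also yields one promise per name.
const direct = Promise.all(names.map(n => describeFile(n)))

Promise.all([wrapped, direct]).then(([w, d]) => {
  console.log(JSON.stringify(w) === JSON.stringify(d))
})
```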

Section 3.7: How can we handle errors with asynchronous code?

We created several intermediate variables in the line-counting program to make the steps clearer. Doing this also helps with error handling; to see how, we will build up an example in stages.

First, if we return a promise that fails without using await, then our main function will finish running before the error occurs, and our try/catch doesn’t help us (Figure 3.6):

async function returnImmediately () {
  try {
    return Promise.reject(new Error('deliberate'))
  } catch (err) {
    console.log('caught exception')
  }
}

returnImmediately()
/u/stjs/async-programming/return-immediately.js:3

Figure 3.6: Wrong and right ways to handle errors in asynchronous code.

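One solution to this problem is to be consistent and always return something. Because the function is declared async, whatever it returns is automatically wrapped in a promise, so the caller can use .then and .catch to handle the result as before. Note that the function’s own catch block still does not run; the output shows that the caller receives the original error: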

async function returnImmediately () {
  try {
    return Promise.reject(new Error('deliberate'))
  } catch (err) {
    return new Error('caught exception')
  }
}

const result = returnImmediately()
result.catch(err => console.log(`caller caught ${err}`))
caller caught Error: deliberate

If instead we return await, the function waits until the promise settles before returning. Because the promise was rejected, await turns the rejection into an exception, and since we’re inside the scope of our try/catch block, everything works as we want:

async function returnAwait () {
  try {
    return await Promise.reject(new Error('deliberate'))
  } catch (err) {
    console.log('caught exception')
  }
}

returnAwait()
caught exception

We prefer the second approach, but whichever you choose, please be consistent.

Section 3.8: Exercises

Immediate versus next tick

What is the difference between setImmediate and process.nextTick? When would you use each one?

Tracing promise execution

  1. What does this code print and why?

    Promise.resolve('hello')
    
  2. What does this code print and why?

    Promise.resolve('hello').then(result => console.log(result))
    
  3. What does this code print and why?

    const p = new Promise((resolve, reject) => resolve('hello'))
      .then(result => console.log(result))
    

Hint: try each snippet of code interactively in the Node interpreter and as a command-line script.

Multiple catches

Suppose we create a promise that deliberately fails and then add two error handlers:

const oops = new Promise((resolve, reject) => reject(new Error('failure')))
oops.catch(err => console.log(err.message))
oops.catch(err => console.log(err.message))

When the code is run it produces:

failure
failure

  1. Trace the order of operations: what is created and when is it executed?
  2. What happens if we run these same lines interactively? Why do we see something different than what we see when we run this file from the command line?

Then after catch

Suppose we create a promise that deliberately fails and attach both then and catch to it:

new Promise((resolve, reject) => reject(new Error('failure')))
  .catch(err => console.log(err))
  .then(err => console.log(err))

When the code is run it produces:

Error: failure
    at /u/stjs/promises/catch-then/example.js:1:41
    at new Promise (<anonymous>)
    at Object.<anonymous> (/u/stjs/promises/catch-then/example.js:1:1)
    at Module._compile (internal/modules/cjs/loader.js:1151:30)
    at Object.Module._extensions..js \
 (internal/modules/cjs/loader.js:1171:10)
    at Module.load (internal/modules/cjs/loader.js:1000:32)
    at Function.Module._load (internal/modules/cjs/loader.js:899:14)
    at Function.executeUserEntryPoint [as runMain] \
 (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47
undefined

  1. Trace the order of execution.
  2. Why is undefined printed at the end?

Head and tail

The Unix head command shows the first few lines of one or more files, while the tail command shows the last few. Write programs head.js and tail.js that do the same things using promises and async/await, so that:

node head.js 5 first.txt second.txt third.txt

prints the first five lines of each of the three files and:

node tail.js 5 first.txt second.txt third.txt

prints the last five lines of each file.

Histogram of line counts

Extend count-lines-with-stat-async.js to create a program lh.js that prints two columns of output: the number of lines in one or more files and the number of files that are that long. For example, if we run:

node lh.js promises/*.*

the output might be:

Length Number of Files
1 7
3 3
4 3
6 7
8 2
12 2
13 1
15 1
17 2
20 1
24 1
35 2
37 3
38 1
171 1

Select matching lines

Using async and await, write a program called match.js that finds and prints lines containing a given string. For example:

node match.js Toronto first.txt second.txt third.txt

would print all of the lines from the three files that contain the word “Toronto”.

Find lines in all files

Using async and await, write a program called in-all.js that finds and prints lines found in all of its input files. For example:

node in-all.js first.txt second.txt third.txt

will print those lines that occur in all three files.

Find differences between two files

Using async and await, write a program called file-diff.js that compares the lines in two files and shows which ones are only in the first file, which are only in the second, and which are in both. For example, if left.txt contains:

some
people

and right.txt contains:

write
some
code

then:

node file-diff.js left.txt right.txt

would print:

2 code
1 people
* some
2 write

where 1, 2, and * show whether lines are in only the first or second file or are in both. Note that the order of the lines in the file doesn’t matter.

Hint: you may want to use the Set class to store lines.

Trace file loading

Suppose we are loading a YAML configuration file using the promisified version of the fs library. In what order do the print statements in this test program appear and why?

import fs from 'fs-extra-promise'
import yaml from 'js-yaml'

const test = async () => {
  const raw = await fs.readFileAsync('config.yml', 'utf-8')
  console.log('inside test, raw text', raw)
  const cooked = yaml.safeLoad(raw)
  console.log('inside test, cooked configuration', cooked)
  return cooked
}

const result = test()
console.log('outside test, result is', result.constructor.name)
result.then(something => console.log('outside test we have', something))

Any and all

  1. Add a method Pledge.any that takes an array of pledges and returns a single pledge that resolves as soon as any pledge in the array resolves, with the value from that pledge.

  2. Add another method Pledge.all that takes an array of pledges and returns a single promise that resolves to an array containing the final values of all of those pledges.

This article may be helpful.