Chapter 17: Module Bundler

Terms defined: entry point, module bundler, transitive closure

JavaScript was designed in a hurry 25 years ago to make web pages interactive. Nobody realized it would become so popular, so it didn’t include support for things that large programs need. One of those things was a way to turn a set of source files into a single file so that browsers could load what they needed with one request.

A module bundler finds all the files that an application depends on and combines them into a single loadable file (Figure 17.1). This file is much more efficient to load: it’s the same number of bytes but just one network request. (See Table 2.1 for a reminder of why this is important.) Bundling files also tests that dependencies actually resolve so that the application has at least a chance of being able to run.

Figure 17.1: Combining multiple modules into one.

Bundling requires an entry point, i.e., a place to start searching for dependencies. Given that, it finds all dependencies, combines them into one file, and ensures they can find each other correctly once loaded. The sections below go through these steps one by one.

Section 17.1: What will we use as test cases?

Our first test case is a single file that doesn’t require anything:

const main = () => {
  console.log('in main')
}

module.exports = main

in main

For our second test, main.js requires other.js:

const other = require('./other')

const main = () => {
  console.log(other('main'))
}

module.exports = main

and other.js doesn’t require anything:

const other = (caller) => {
  return `other called from ${caller}`
}

module.exports = other

The output we expect is:

other called from main

Why `require`?

Our test cases use the old-style require function and assign things that are to be visible outside the module to module.exports rather than using import and export. We tried writing the chapter using the latter, but kept stumbling over whether we were talking about import in Node’s module loader or the import we were building. This kind of confusion is common when building programming tools; we hope that splitting terminology as we have will help.

Our third test case has multiple inclusions in multiple directories and is shown in Figure 17.2:

./main requires all four of the files below.
./top-left doesn’t require anything.
./top-right requires top-left and bottom-right.
./subdir/bottom-left also requires top-left and bottom-right.
./subdir/bottom-right doesn’t require anything.

Figure 17.2: Dependencies in large module bundler test case.

The main program is:

// main.js

const topLeft = require('./top-left')
const topRight = require('./top-right')
const bottomLeft = require('./subdir/bottom-left')
const bottomRight = require('./subdir/bottom-right')

const main = () => {
  const functions = [topLeft, topRight, bottomLeft, bottomRight]
  functions.forEach(func => {
    console.log(`${func('main')}`)
  })
}

module.exports = main

and the other four files use require and module.exports to get what they need. The output we expect is:

topLeft from main
topRight from main with topLeft from topRight and bottomRight from \
 topRight
bottomLeft from main with topLeft from bottomLeft and bottomRight from \
 bottomLeft
bottomRight from main

We do not handle circular dependencies because require itself doesn’t (Chapter 13).

Section 17.2: How can we find dependencies?

To get all the dependencies for one source file, we parse it and extract all of the calls to require. The code to do this is relatively straightforward given what we know about Acorn:

import acorn from 'acorn'
import fs from 'fs'
import walk from 'acorn-walk'

const getRequires = (filename) => {
  const entryPointFile = filename
  const text = fs.readFileSync(entryPointFile, 'utf-8')
  const ast = acorn.parse(text)
  const requires = []
  walk.simple(ast, {
    CallExpression: (node, state) => {
      if ((node.callee.type === 'Identifier') &&
          (node.callee.name === 'require')) {
        state.push(node.arguments[0].value)
      }
    }
  }, null, requires)
  return requires
}

export default getRequires

import getRequires from './get-requires.js'

const result = getRequires(process.argv[2])
console.log(result)

node test-get-requires.js simple/main.js

[ './other' ]

An unsolvable problem

The dependency finder shown above gives the right answer for reasonable JavaScript programs, but not all JavaScript is reasonable. Suppose someone creates an alias for require and uses that to load other files:

const req = require
const weWillMissThis = req('./other-file')

We could try to trace variable assignments to catch cases like these, but someone could still fool us by writing this:

const clever = eval(`require`)
const weWillMissThisToo = clever('./other-file')

There is no general solution to this problem other than running the code to see what it does. If you would like to understand why not, and learn about a pivotal moment in the history of computing, we highly recommend [Petzold2008].

To get all of the dependencies a bundle needs we need to find the transitive closure of the entry point’s dependencies, i.e., the requirements of the requirements and so on recursively. Our algorithm for doing this uses two sets: pending, which contains the things we haven’t looked at yet, and seen, which contains the things we have (Figure 17.3). pending initially contains the entry point file and seen is initially empty. We keep taking items from pending until it is empty. If the current item is already in seen we do nothing; otherwise we add it to seen, get its dependencies, and add any that are not already in seen to pending.

Figure 17.3: Implementing transitive closure using two sets.
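The two-set algorithm can be tried without touching the filesystem. The sketch below is an illustration rather than the chapter's implementation: it assumes the dependencies are already available in a hypothetical in-memory table called `requirements`, and the helper name `closureOf` is likewise made up for this example.

```javascript
// Hypothetical dependency table standing in for parsed source files:
// each key is a module name and each value lists what it requires.
const requirements = {
  main: ['topLeft', 'topRight', 'bottomLeft', 'bottomRight'],
  topLeft: [],
  topRight: ['topLeft', 'bottomRight'],
  bottomLeft: ['topLeft', 'bottomRight'],
  bottomRight: []
}

// Find everything reachable from `entryPoint` using `pending` and `seen`.
const closureOf = (table, entryPoint) => {
  const pending = [entryPoint] // an array used as a set we can pop from
  const seen = new Set()
  while (pending.length > 0) {
    const current = pending.pop()
    if (seen.has(current)) {
      continue // already handled, so do nothing
    }
    seen.add(current)
    table[current]
      .filter(dep => !seen.has(dep))
      .forEach(dep => pending.push(dep))
  }
  return [...seen]
}

console.log(closureOf(requirements, 'main'))
```

Because `seen` is checked both when an item is popped and before its dependencies are pushed, a module that several others require is still processed only once.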

Finding dependencies is complicated by the fact that we can load something under different names, such as ./subdir/bottom-right from main but ./bottom-right from ./subdir/bottom-left. As with the module loader in Chapter 13, we use absolute paths as unique identifiers. Our code is also complicated by the fact that JavaScript’s Set class doesn’t have an equivalent of Array.pop, so we will actually maintain the “set” of pending items as an array. The resulting code is:

import path from 'path'

import getRequires from './get-requires.js'

const transitiveClosure = (entryPointPath) => {
  const pending = [path.resolve(entryPointPath)]
  const filenames = new Set()
  while (pending.length > 0) {
    const candidate = path.resolve(pending.pop())
    if (filenames.has(candidate)) {
      continue
    }
    filenames.add(candidate)
    const candidateDir = path.dirname(candidate)
    getRequires(candidate)
      .map(raw => path.resolve(path.join(candidateDir, `${raw}.js`)))
      .filter(cooked => !filenames.has(cooked))
      .forEach(cooked => pending.push(cooked))
  }
  return [...filenames]
}

export default transitiveClosure

import transitiveClosure from './transitive-closure-only.js'

const result = transitiveClosure(process.argv[2])
console.log(JSON.stringify(result, null, 2))

node test-transitive-closure-only.js full/main.js

[
  "/u/stjs/module-bundler/full/main.js",
  "/u/stjs/module-bundler/full/subdir/bottom-right.js",
  "/u/stjs/module-bundler/full/subdir/bottom-left.js",
  "/u/stjs/module-bundler/full/top-left.js",
  "/u/stjs/module-bundler/full/top-right.js"
]

This works, but it isn’t keeping track of the mapping from required names within files to absolute paths, so when one of the files in our bundle tries to access something, we might not know what it’s after. The fix is to modify transitive closure to construct and return a two-level structure. The primary keys are the absolute paths to the files being required, while sub-keys are the paths they refer to when loading things (Figure 17.4).

Figure 17.4: Data structure used to map names to absolute paths.

Adding this takes our transitive closure code from 23 lines to 28 lines:

import path from 'path'
import getRequires from './get-requires.js'

const transitiveClosure = (entryPointPath) => {
  const mapping = {}
  const pending = [path.resolve(entryPointPath)]
  const filenames = new Set()
  while (pending.length > 0) {
    const candidate = path.resolve(pending.pop())
    if (filenames.has(candidate)) {
      continue
    }
    filenames.add(candidate)
    mapping[candidate] = {}
    const candidateDir = path.dirname(candidate)
    getRequires(candidate)
      .map(raw => {
        mapping[candidate][raw] =
          path.resolve(path.join(candidateDir, `${raw}.js`))
        return mapping[candidate][raw]
      })
      .filter(cooked => !filenames.has(cooked))
      .forEach(cooked => pending.push(cooked))
  }
  return mapping
}

export default transitiveClosure

import transitiveClosure from './transitive-closure.js'

const result = transitiveClosure(process.argv[2])
console.log(JSON.stringify(result, null, 2))

node test-transitive-closure.js full/main.js

{
  "/u/stjs/module-bundler/full/main.js": {
    "./top-left": "/u/stjs/module-bundler/full/top-left.js",
    "./top-right": "/u/stjs/module-bundler/full/top-right.js",
    "./subdir/bottom-left": \
    "/u/stjs/module-bundler/full/subdir/bottom-left.js",
    "./subdir/bottom-right": \
    "/u/stjs/module-bundler/full/subdir/bottom-right.js"
  },
  "/u/stjs/module-bundler/full/subdir/bottom-right.js": {},
  "/u/stjs/module-bundler/full/subdir/bottom-left.js": {
    "../top-left": "/u/stjs/module-bundler/full/top-left.js",
    "./bottom-right": \
    "/u/stjs/module-bundler/full/subdir/bottom-right.js"
  },
  "/u/stjs/module-bundler/full/top-left.js": {},
  "/u/stjs/module-bundler/full/top-right.js": {
    "./top-left": "/u/stjs/module-bundler/full/top-left.js",
    "./subdir/bottom-right": \
    "/u/stjs/module-bundler/full/subdir/bottom-right.js"
  }
}

The real cost, though, is the extra complexity of the data structure: it took a couple of tries to get it right, and it will be harder for the next person to understand than the original. Comprehension and maintenance would be a little easier if we could draw diagrams directly in our source code, but as long as we insist that our programs be stored in a punchcard-compatible format (i.e., as lines of text), that will remain a dream.
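Using the two-level structure comes down to two lookups: first by the absolute path of the file doing the requiring, then by the name as written in that file. Here is a minimal sketch, using an abbreviated copy of the output above and a hypothetical helper called `lookup`:

```javascript
// Abbreviated copy of the mapping printed above.
const mapping = {
  '/u/stjs/module-bundler/full/main.js': {
    './subdir/bottom-right':
      '/u/stjs/module-bundler/full/subdir/bottom-right.js'
  },
  '/u/stjs/module-bundler/full/subdir/bottom-left.js': {
    './bottom-right':
      '/u/stjs/module-bundler/full/subdir/bottom-right.js'
  },
  '/u/stjs/module-bundler/full/subdir/bottom-right.js': {}
}

// Hypothetical helper: translate a name as written in `importer`
// into the absolute path that identifies the required file.
const lookup = (table, importer, localName) => {
  const names = table[importer]
  if (names === undefined || !(localName in names)) {
    throw new Error(`${importer} does not require ${localName}`)
  }
  return names[localName]
}

const fromMain = lookup(mapping,
  '/u/stjs/module-bundler/full/main.js', './subdir/bottom-right')
const fromBottomLeft = lookup(mapping,
  '/u/stjs/module-bundler/full/subdir/bottom-left.js', './bottom-right')
console.log(fromMain === fromBottomLeft) // true: two names, one file
```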

Section 17.3: How can we safely combine several files into one?

We now need to combine the files we have found into one while keeping each in its own namespace. We do this using the same method we used in Chapter 13: wrap the source code in an IIFE, giving that IIFE a module object to fill in and an implementation of require to resolve dependencies within the bundle. For example, suppose we have this file:

const main = () => {
  console.log('in main')
}

module.exports = main

The wrapped version will look like this:

const wrapper = (module, require) => {
  const main = () => {
    console.log('in main')
  }

  module.exports = main
}

And we can test it like this:

const wrapper = (module, require) => {
  const main = () => {
    console.log('in main')
  }

  module.exports = main
}

const _require = (name) => null
const temp = {}
wrapper(temp, _require)
temp.exports()

in main

We need to do this for multiple files, so we will put these IIFEs in a lookup table that uses the files’ absolute paths as its keys. We will also wrap loading in a function so that we don’t accidentally step on anyone else’s toys:

import fs from 'fs'
import path from 'path'

const HEAD = `const initialize = (creators) => {
`

const TAIL = `
}
`

const combineFiles = (allFilenames) => {
  const body = allFilenames
    .map(filename => {
      const key = path.resolve(filename)
      const source = fs.readFileSync(filename, 'utf-8')
      const func = `(module, require) => {${source}}`
      const entry = `creators.set('${key}',\n${func})`
      return `// ${key}\n${entry}\n`
    })
    .join('\n')
  const func = `${HEAD}\n${body}\n${TAIL}`
  return func
}

export default combineFiles

Breaking this down, the code in HEAD opens the definition of a function called initialize that takes the lookup table creators as its argument, while the code in TAIL closes that definition. In between, combineFiles adds an entry to the lookup table for each file (Figure 17.5).

Figure 17.5: Assembling fragments and modules to create a bundle.

We can test that this works in our two-file case:

import combineFiles from './combine-files.js'

console.log(combineFiles(process.argv.slice(2)))

const initialize = (creators) => {

// /u/stjs/stjs/module-bundler/simple/main.js
creators.set('/u/stjs/stjs/module-bundler/simple/main.js',
(module, require) => {const other = require('./other')

const main = () => {
  console.log(other('main'))
}

module.exports = main
})

// /u/stjs/stjs/module-bundler/simple/other.js
creators.set('/u/stjs/stjs/module-bundler/simple/other.js',
(module, require) => {const other = (caller) => {
  return `other called from ${caller}`
}

module.exports = other
})


}

and then load the result and call initialize:

Map(2) {
  '/u/stjs/module-bundler/simple/main.js' => [Function (anonymous)],
  '/u/stjs/module-bundler/simple/other.js' => [Function (anonymous)]
}
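The program that loads the generated text is not shown above. One possible way to do it, sketched here under the assumption that the bundle text is already in a string, is to evaluate that text inside a function that hands back `initialize`. The stand-in bundle text and its `/abs/...` keys are made up for this example; the real text would come from `combineFiles`.

```javascript
// Stand-in for the text produced by combineFiles (keys are invented).
const bundleText = `const initialize = (creators) => {
creators.set('/abs/main.js',
(module, require) => {module.exports = () => 'in main'})
creators.set('/abs/other.js',
(module, require) => {module.exports = () => 'in other'})
}`

// Evaluate the text in its own function scope and return initialize.
const loadInitialize = (text) => {
  return new Function(`${text}\nreturn initialize`)()
}

const initialize = loadInitialize(bundleText)
const creators = new Map()
initialize(creators)
console.log(creators)
```

Using `new Function` (or `eval`) is reasonable for an experiment like this, but it runs whatever the text contains, so it should only be applied to bundles we generated ourselves.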

Section 17.4: How can files access each other?

The code we have built so far has not created our exports; instead, it has built a lookup table of functions that can create what we asked for. More specifically we have:

a lookup table from absolute filenames to functions that create module exports;
a lookup table from the importer’s absolute filename to pairs storing the name of the required file as it was written and the required file’s absolute filename; and
an entry point.

To turn this into what we want, we must look up the function associated with the entry point and run it, giving it an empty module object and a require function that we will describe below, then get the exports it has added to that module object. Our replacement for require is only allowed to take one argument (because that’s all that JavaScript’s require takes). However, it actually needs four things: the argument to the user’s require call, the absolute path of the file making the call, and the two lookup tables described above. Those two tables can’t be global variables because of possible name collisions: no matter what we call them, the user might have given a variable the same name.

As in Chapter 13 we solve this problem using closures. The result is probably the most difficult code in this book to understand because of its many levels of abstraction. First, we write a function that takes the two tables as arguments and returns a function that takes an absolute path identifying this module. When that function is called, it creates and returns a function that takes a local path inside a module and returns the exports. Each of these wrapping layers remembers more information for us (Figure 17.6), but we won’t pretend that it’s easy to trace.

Figure 17.6: A function that returns functions that return functions.

We also need a third structure: a cache for the modules we’ve already loaded. Putting it all together we have:

import fs from 'fs'
import path from 'path'

import transitiveClosure from './transitive-closure.js'

const HEAD = `const creators = new Map()
const cache = new Map()

const makeRequire = (absPath) => {
  return (localPath) => {
    const actualKey = translate[absPath][localPath]
    if (!cache.has(actualKey)) {
      const m = {}
      creators.get(actualKey)(m)
      cache.set(actualKey, m.exports)
    }
    return cache.get(actualKey)
  }
}

const initialize = (creators) => {
`

const TAIL = `
}

initialize(creators)
`

const makeProof = (entryPoint) => `
const start = creators.get('${entryPoint}')
const m = {}
start(m)
m.exports()
`

const createBundle = (entryPoint) => {
  entryPoint = path.resolve(entryPoint)
  const table = transitiveClosure(entryPoint)
  const translate = `const translate = ${JSON.stringify(table, null, 2)}`
  const creators = Object.keys(table).map(filename => makeCreator(filename))
  const proof = makeProof(entryPoint)
  return [
    translate,
    HEAD,
    ...creators,
    TAIL,
    proof
  ].join('\n')
}

const makeCreator = (filename) => {
  const key = path.resolve(filename)
  const source = fs.readFileSync(filename, 'utf-8')
  const func = `(module, require = makeRequire('${key}')) =>\n{${source}}`
  const entry = `creators.set('${key}',\n${func})`
  return `// ${key}\n${entry}\n`
}

export default createBundle

This code is hard to read because we have to distinguish what is being printed in the output versus what is being executed right now and because of the levels of nesting needed to capture variables safely. Getting this right took much more time per line of finished code than anything we have seen so far except the promises in Chapter 3. However, it is all intrinsic complexity: anything that does what require does is going to be equally convoluted.

To prove that our code works, we will look up the function main in the first file and call it. (If we were loading in the browser, we’d capture the exports in a variable for later use.) First, we create the bundled file:

node test-create-bundle.js single/main.js > bundle-single.js

const translate = {
  "/u/stjs/stjs/module-bundler/single/main.js": {}
}
const creators = new Map()
const cache = new Map()

const makeRequire = (absPath) => {
  return (localPath) => {
    const actualKey = translate[absPath][localPath]
    if (!cache.has(actualKey)) {
      const m = {}
      creators.get(actualKey)(m)
      cache.set(actualKey, m.exports)
    }
    return cache.get(actualKey)
  }
}

const initialize = (creators) => {

// /u/stjs/stjs/module-bundler/single/main.js
creators.set('/u/stjs/stjs/module-bundler/single/main.js',
(module, require =
makeRequire('/u/stjs/stjs/module-bundler/single/main.js')) =>
{const main = () => {
  console.log('in main')
}

module.exports = main
})


}

initialize(creators)


const start = creators.get('/u/stjs/stjs/module-bundler/single/main.js')
const m = {}
start(m)
m.exports()

and then we run it:

in main

That was a lot of work to print one line, but what we have should work for other files. The two-file case with main and other works:

const translate = {
  "/u/stjs/stjs/module-bundler/simple/main.js": {
    "./other": "/u/stjs/stjs/module-bundler/simple/other.js"
  },
  "/u/stjs/stjs/module-bundler/simple/other.js": {}
}
const creators = new Map()
const cache = new Map()

const makeRequire = (absPath) => {
  return (localPath) => {
    const actualKey = translate[absPath][localPath]
    if (!cache.has(actualKey)) {
      const m = {}
      creators.get(actualKey)(m)
      cache.set(actualKey, m.exports)
    }
    return cache.get(actualKey)
  }
}

const initialize = (creators) => {

// /u/stjs/stjs/module-bundler/simple/main.js
creators.set('/u/stjs/stjs/module-bundler/simple/main.js',
(module, require =
makeRequire('/u/stjs/stjs/module-bundler/simple/main.js')) =>
{const other = require('./other')

const main = () => {
  console.log(other('main'))
}

module.exports = main
})

// /u/stjs/stjs/module-bundler/simple/other.js
creators.set('/u/stjs/stjs/module-bundler/simple/other.js',
(module, require =
makeRequire('/u/stjs/stjs/module-bundler/simple/other.js')) =>
{const other = (caller) => {
  return `other called from ${caller}`
}

module.exports = other
})


}

initialize(creators)


const start = creators.get('/u/stjs/stjs/module-bundler/simple/main.js')
const m = {}
start(m)
m.exports()

other called from main

and so does our most complicated test with main and four other files:

topLeft from main
topRight from main with topLeft from topRight and bottomRight from \
topRight
bottomLeft from main with topLeft from bottomLeft and bottomRight from \
bottomLeft
bottomRight from main

Section 17.5: Exercises

Using test-driven development

Suppose we wanted to compress the files being stored by the file backup system in Chapter 5 instead of copying them as-is. What tests would you write before adding this feature in order to ensure that it worked correctly once it was implemented?

Finding `import` dependencies

Modify the dependency finder to work with import statements instead of require calls.

Track files using hashes

Modify the dependency finder to track files by hashing them instead of relying on paths, so that if exactly the same file is being required from two locations, only one copy is loaded.

Using asynchronous file operations

Modify the dependency finder to use async and await instead of synchronous file operations.

Unit testing transitive closure

Write unit tests for the tool that finds the transitive closure of files’ requirements using Mocha and mock-fs. (Rather than parsing JavaScript files in the mock filesystem, have each file contain only a list of the names of the files it depends on.)

Exporting multiple functions

Create test cases for the module bundler in which files export more than one function and fix any bugs in the module bundler that they uncover.

Checking integrity

Write a function that checks the integrity of the data structure returned by the transitive closure routine, i.e., that makes sure every cross-reference resolves correctly.

Logging module loading

Write a function called logLoad that takes a module name as an argument and prints a message using console.error saying that the module has been loaded.
Modify the bundle generator to insert calls to this function to report when modules are actually loaded.

Tracing execution

Trace the execution of every function called when the main function in the full bundle is called.

Making bundles more readable

Modify the bundle creator to make its output more readable, e.g., by adding comments and indentation. (This does not matter to the computer, but can help debugging.)