Chapter 13: Module Loader

Terms defined: absolute path, alias, circular dependency, closure, directed graph, encapsulate, immediately-invoked function expression, inner function, Least Recently Used cache, namespace, plugin architecture

Chapter 12 showed how to use eval to load code dynamically. We can use this to build our own version of JavaScript’s require function. Our function will take the name of a source file as an argument and return whatever that file exports. The key requirement for such a function is to avoid accidentally overwriting things: if we just eval some code and it happens to assign to a variable called x, anything called x already in our program might be overwritten. We therefore need a way to encapsulate the contents of what we’re loading. Our approach is based on [Casciaro2020], which contains a lot of other useful information as well.
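To make the risk concrete, here is a minimal sketch; the variable name and the strings being evaluated are made up for illustration. The first eval assigns to a name that already exists in the program and silently changes it. The second wraps its code in a function, which is the technique the rest of this chapter develops, so its x is a separate variable.

```javascript
// A variable that already exists in our program.
let x = 'original'

// Evaluating code that assigns to x changes our variable.
eval("x = 'overwritten'")
const afterPlainEval = x
console.log(`after plain eval: ${afterPlainEval}`)

// Reset, then evaluate code whose x is declared inside a function.
x = 'original'
eval("(() => { const x = 'inner'; return x })()")
const afterWrappedEval = x
console.log(`after wrapped eval: ${afterWrappedEval}`)
```

The first message reports overwritten and the second reports original: the function body gave the evaluated code a scope of its own.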

Section 13.1: How can we implement namespaces?

A namespace is a collection of names in a program that are isolated from other namespaces. Most modern languages provide namespaces as a built-in feature so that programmers don’t accidentally step on each other’s toes. JavaScript doesn’t, so we have to implement them ourselves.

We can do this using closures. Every function is a namespace: variables defined inside the function are distinct from variables defined outside it (Figure 13.1). If we create the variables we want to manage inside a function, then define another function inside the first and return that inner function, that inner function will be the only thing with references to those variables.

Figure 13.1: Using closures to create private variables.

For example, let’s create a function that always appends the same string to its argument:

const createAppender = (suffix) => {
  const appender = (text) => {
    return text + suffix
  }
  return appender
}

const exampleFunction = createAppender(' and that')
console.log(exampleFunction('this'))
console.log('suffix is', suffix)

When we run it, the value that was assigned to the parameter suffix still exists but can only be reached by the inner function:

this and that
/u/stjs/module-loader/manual-namespacing.js:10
console.log('suffix is', suffix)
                         ^

ReferenceError: suffix is not defined
    at /u/stjs/module-loader/manual-namespacing.js:10:26
    at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
    at async Loader.import (internal/modules/esm/loader.js:166:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

We could require every module to define a setup function like this for users to call, but thanks to eval we can wrap the file’s contents in a function and call it automatically. To do this we will create something called an immediately-invoked function expression (IIFE). The syntax () => {...} defines a function. If we put the definition in parentheses and then put another pair of parentheses right after it:

(() => {...})()

we have code that defines a function of no arguments and immediately calls it. We can use this trick to achieve the same effect as the previous example in one step:

const contents = (() => {
  const privateValue = 'private value'
  const publicValue = 'public value'
  return { publicValue }
})()

console.log(`contents.publicValue is ${contents.publicValue}`)
console.log(`contents.privateValue is ${contents.privateValue}`)

contents.publicValue is public value
contents.privateValue is undefined

Unconfusing the parser

The extra parentheses around the original definition force the parser to evaluate things in the right order; if we write:

() => {...}()

then the parser treats the function definition as complete when it reaches the closing curly brace, cannot make sense of the parentheses that follow, and reports a syntax error rather than calling the function just defined.
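One way to check how a particular JavaScript engine handles the two forms is to ask it to compile each one without running it. The helper checkSyntax below is hypothetical and exists only for this sketch. It relies on the fact that new Function compiles its argument as a function body and throws a SyntaxError if the text cannot be parsed.

```javascript
// Report whether a piece of text parses as JavaScript.
// new Function only compiles its argument here: the body is never run.
const checkSyntax = (text) => {
  try {
    new Function(text)
    return 'ok'
  } catch (err) {
    return err.name
  }
}

const wrapped = checkSyntax('(() => { return 1 })()')
const unwrapped = checkSyntax('() => { return 1 }()')
console.log(`with parentheses: ${wrapped}`)
console.log(`without parentheses: ${unwrapped}`)
```

In Node the first form is reported as ok and the second as SyntaxError, which is why the wrapper text built for eval always puts the function definition in parentheses.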

Section 13.2: How can we load a module?

We want the module we are loading to export names by assigning to module.exports just as require does, so we need to provide an object called module and create an IIFE. (We will handle the problem of the module loading other modules later.) Our loadModule function takes a filename and returns a newly created module object; the parameter to the function we build and eval must be called module so that we can assign to module.exports. For clarity, we call the object we pass in result in loadModule.

import fs from 'fs'

const loadModule = (filename) => {
  const source = fs.readFileSync(filename, 'utf-8')
  const result = {}
  const fullText = `((module) => {${source}})(result)`
  console.log(`full text for eval:\n${fullText}\n`)
  eval(fullText)
  return result.exports
}

export default loadModule

Figure 13.2: Using IIFEs to encapsulate modules and get their exports (part 1).
Figure 13.3: Using IIFEs to encapsulate modules and get their exports (part 2).

Figure 13.2 and Figure 13.3 show the structure of our loader so far. We can use this code as a test:

const publicValue = 'public value'

const privateValue = 'private value'

const publicFunction = (caller) => {
  return `publicFunction called from ${caller}`
}

module.exports = { publicValue, publicFunction }

and this short program to load the test and check its exports:

import loadModule from './load-module-only.js'

const result = loadModule(process.argv[2])
console.log(`result.publicValue is ${result.publicValue}`)
console.log(`result.privateValue is ${result.privateValue}`)
console.log(result.publicFunction('main'))

$ node test-load-module-only.js small-module.js
full text for eval:
((module) => {const publicValue = 'public value'

const privateValue = 'private value'

const publicFunction = (caller) => {
  return `publicFunction called from ${caller}`
}

module.exports = { publicValue, publicFunction }
})(result)

result.publicValue is public value
result.privateValue is undefined
publicFunction called from main

Section 13.3: Do we need to handle circular dependencies?

What if the code we are loading loads other code? We can visualize the network of who requires whom as a directed graph: if X requires Y, we draw an arrow from X to Y. Unlike the directed acyclic graphs we met in Chapter 10, though, these graphs can contain cycles: we say a circular dependency exists if X depends on Y and Y depends on X either directly or indirectly. This may seem nonsensical, but can easily arise with plugin architectures: the file containing the main program loads an extension, and that extension calls utility functions defined in the file containing the main program.

Most compiled languages can handle circular dependencies easily: they compile each module into low-level instructions, then link those to resolve dependencies before running anything (Figure 13.4). But interpreted languages usually run code as they’re loading it, so if X is in the process of loading Y and Y tries to call X, X may not (fully) exist yet.

Figure 13.4: Testing circular imports.

Circular dependencies work in Python, but only sort of. Let’s create two files called major.py and minor.py:

# major.py

import minor

def top():
    print("top")
    minor.middle()

def bottom():
    print("bottom")

top()

# minor.py

import major

def middle():
    print("middle")
    major.bottom()

Loading fails when we run major.py from the command line:

top
Traceback (most recent call last):
  File "major.py", line 3, in <module>
    import minor
  File "/u/stjs/module-loader/checking/minor.py", line 3, in <module>
    import major
  File "/u/stjs/module-loader/checking/major.py", line 12, in <module>
    top()
  File "/u/stjs/module-loader/checking/major.py", line 7, in top
    minor.middle()
AttributeError: module 'minor' has no attribute 'middle'

but works in the interactive interpreter:

$ python
>>> import major
top
middle
bottom

The equivalent test in JavaScript also has two files:

// major.js
const { middle } = require('./minor')

const top = () => {
  console.log('top')
  middle()
}

const bottom = () => {
  console.log('bottom')
}

top()

module.exports = { top, bottom }

// minor.js
const { bottom } = require('./major')

const middle = () => {
  console.log('middle')
  bottom()
}

module.exports = { middle }

It fails on the command line:

top
middle
/u/stjs/module-loader/checking/minor.js:6
  bottom()
  ^

TypeError: bottom is not a function
    at middle (/u/stjs/module-loader/checking/minor.js:6:3)
    at top (/u/stjs/module-loader/checking/major.js:6:3)
    at Object.<anonymous> (/u/stjs/module-loader/checking/major.js:13:1)
    at Module._compile (internal/modules/cjs/loader.js:1063:30)
    at Object.Module._extensions..js \
 (internal/modules/cjs/loader.js:1092:10)
    at Module.load (internal/modules/cjs/loader.js:928:32)
    at Function.Module._load (internal/modules/cjs/loader.js:769:14)
    at Function.executeUserEntryPoint [as runMain] \
 (internal/modules/run_main.js:72:12)
    at internal/main/run_main_module.js:17:47

and also fails in the interactive interpreter (which is more consistent):

$ node
> require('./major')
top
middle
/u/stjs/module-loader/checking/minor.js:6
  bottom()
  ^

TypeError: bottom is not a function
    at middle (/u/stjs/module-loader/checking/minor.js:6:3)
    at top (/u/stjs/module-loader/checking/major.js:6:3)
    at Object.<anonymous> (/u/stjs/module-loader/checking/major.js:13:1)
    at Module._compile (internal/modules/cjs/loader.js:1063:30)
    at Object.Module._extensions..js \
 (internal/modules/cjs/loader.js:1092:10)
    at Module.load (internal/modules/cjs/loader.js:928:32)
    at Function.Module._load (internal/modules/cjs/loader.js:769:14)
    at Module.require (internal/modules/cjs/loader.js:952:19)
    at require (internal/modules/cjs/helpers.js:88:18)
    at [stdin]:1:1

We therefore won’t try to handle circular dependencies. However, we will detect them and generate a sensible error message.

import vs. require

Circular dependencies work with JavaScript’s import syntax because we can analyze files to determine what needs what, get everything into memory, and then resolve dependencies. We can’t do this with require-based code because someone might create an alias and call require through that or eval a string that contains a require call. (Of course, they can also do these things with the function version of import.)
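The difficulty with analyzing require-based code can be illustrated with a deliberately naive scanner. The function findRequires and the three source strings below are hypothetical: the scanner looks for literal require('...') calls in a file’s text, which is enough for the direct case but misses both an alias and a call assembled inside a string. Real tools parse the code rather than pattern-matching it, but they run into the same limit once the call is built at run time.

```javascript
// Naive scanner: report the argument of every literal require('...') call.
const findRequires = (source) => {
  const pattern = /\brequire\('([^']+)'\)/g
  return [...source.matchAll(pattern)].map((match) => match[1])
}

// Three ways of loading ./a (plus ./b in the first case).
const direct = "const a = require('./a')\nconst b = require('./b')"
const aliased = "const load = require\nconst a = load('./a')"
const hidden = `const a = eval("req" + "uire('./a')")`

const foundDirect = findRequires(direct)
const foundAliased = findRequires(aliased)
const foundHidden = findRequires(hidden)

console.log(foundDirect)
console.log(foundAliased.length, foundHidden.length)
```

Only the direct calls are found; the aliased and hidden dependencies are invisible until the code actually runs.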

Section 13.4: How can a module load another module?

While we’re not going to handle circular dependencies, modules do need to be able to load other modules. To enable this, we need to provide the module with a function called require that it can call as it loads. As in Chapter 12, this function checks a cache to see if the file being asked for has already been loaded. If not, it loads it and saves it; either way, it returns the result.

Our cache needs to be careful about how it identifies files so that it can detect duplicate loading attempts that use different names. For example, suppose that major.js loads subdir/first.js and subdir/second.js. When subdir/second.js loads ./first.js, our system needs to realize that it already has that file even though the path looks different. We will use absolute paths as cache keys so that every file has a unique, predictable key.
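The following sketch shows why an absolute path makes a good key. It is an illustration only: the function cacheKey and the constant ROOT are hypothetical, and the sketch uses JavaScript’s global URL class instead of the path module that the chapter’s own code relies on, purely so that it runs without importing anything.

```javascript
// Illustrative project root, written as a file URL so that the global URL
// class can do the path arithmetic.
const ROOT = 'file:///u/stjs/module-loader/'

// Turn a name into a cache key by resolving it relative to the file that
// asked for it. Both parameters are relative names within the project.
const cacheKey = (name, fromFile) => {
  const base = new URL(fromFile, ROOT)
  return new URL(name, base).pathname
}

const fromMajor = cacheKey('subdir/first.js', 'major.js')
const fromSecond = cacheKey('./first.js', 'subdir/second.js')
const roundabout = cacheKey('subdir/../subdir/first.js', 'major.js')

console.log(fromMajor)
console.log(fromMajor === fromSecond && fromSecond === roundabout)
```

All three spellings produce the same key, so a cache indexed this way would load the file only once. Note that this sketch resolves each name relative to the file doing the loading, whereas calling path.resolve with a single relative name resolves it against the current working directory.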

To reduce confusion, we will call our function need instead of require. In order to make the cache available to modules while they’re loading, we will make it a property of need. (Remember, a function is just another kind of object in JavaScript; every function gets several properties automatically, and we can always add more.) Since we’re using the built-in Map class as a cache, the entire implementation of need is just 15 lines long:

import path from 'path'

import loadModule from './load-module.js'

const need = (name) => {
  const absPath = path.resolve(name)
  if (!need.cache.has(absPath)) {
    const contents = loadModule(absPath, need)
    need.cache.set(absPath, contents)
  }
  return need.cache.get(absPath)
}
need.cache = new Map()

export default need
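Attaching the cache to need works because functions are objects. The short sketch below uses a hypothetical function greet to show two properties that every function gets automatically (name and length) and one that we add ourselves in the same way that need.cache is added above.

```javascript
const greet = (name) => `hello ${name}`

// Properties that every function gets automatically:
// its name and the number of parameters it declares.
const builtIn = [greet.name, greet.length]

// A property we add ourselves, just as need.cache is added to need.
greet.calls = new Map()
greet.calls.set('first', greet('world'))

console.log(builtIn)
console.log(greet.calls.get('first'))
```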

We now need to modify loadModule to take our function need as a parameter. (Again, we’ll have our modules call need('something.js') instead of require('something') for clarity.) Let’s test it with the same small module that doesn’t need anything else to make sure we haven’t broken anything:

import need from './need.js'

const small = need('small-module.js')
console.log(`small.publicValue is ${small.publicValue}`)
console.log(`small.privateValue is ${small.privateValue}`)
console.log(small.publicFunction('main'))

full text for eval:
((module, need) => {
const publicValue = 'public value'

const privateValue = 'private value'

const publicFunction = (caller) => {
  return `publicFunction called from ${caller}`
}

module.exports = { publicValue, publicFunction }

})(result, need)

small.publicValue is public value
small.privateValue is undefined
publicFunction called from main
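The modified loadModule is not shown above, but the wrapper text printed in the output suggests its shape: the IIFE now takes two parameters, module and need, and is called with result and need. The following is a sketch under that assumption rather than the exact code. To keep it runnable without any files on disk, it takes a third parameter readFile that stands in for fs.readFileSync(filename, 'utf-8'); that parameter, fakeFiles, and fakeNeed are inventions of this sketch.

```javascript
// readFile stands in for fs.readFileSync(filename, 'utf-8'); it is passed in
// (an assumption of this sketch) so that the example runs without real files.
const loadModule = (filename, need, readFile) => {
  const source = readFile(filename)
  const result = {}
  // The wrapper now has two parameters so loaded code can call need().
  const fullText = `((module, need) => {\n${source}\n})(result, need)`
  eval(fullText)
  return result.exports
}

// A pretend file system holding one tiny module.
const fakeFiles = {
  'small-module.js': "module.exports = { publicValue: 'public value' }"
}

// A stripped-down stand-in for need with no cache.
const fakeNeed = (name) => loadModule(name, fakeNeed, (f) => fakeFiles[f])

const small = fakeNeed('small-module.js')
console.log(`small.publicValue is ${small.publicValue}`)
```

Because eval is called directly inside loadModule, the evaluated text can see the local variables result and need, which is what lets the wrapper be called with them.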

What if we test it with a module that does load something else?

import need from './need'

const small = need('small-module.js')

const large = (caller) => {
  console.log(`large from ${caller}`)
  small.publicFunction(`${caller} to large`)
}

export default large

import need from './need.js'

const large = need('large-module.js')
console.log(large.large('main'))

full text for eval:
((module, need) => {
import need from './need'

const small = need('small-module.js')

const large = (caller) => {
  console.log(`large from ${caller}`)
  small.publicFunction(`${caller} to large`)
}

export default large

})(result, need)

undefined:2
import need from './need'
^^^^^^

SyntaxError: Cannot use import statement outside a module
    at loadModule (/u/stjs/module-loader/load-module.js:8:8)
    at need (/u/stjs/module-loader/need.js:8:22)
    at /u/stjs/module-loader/test-need-large-module.js:3:15
    at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
    at async Loader.import (internal/modules/esm/loader.js:166:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

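This doesn’t work because import statements are only allowed at the top level of a module, not inside a function, and our loader wraps everything it loads in a function. Modules loaded by our system must therefore use need rather than import to get at other modules: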

const small = need('small-module.js')

const large = (caller) => {
  return small.publicFunction(`large called from ${caller}`)
}

module.exports = large

import need from './need.js'

const large = need('large-needless.js')
console.log(large('main'))

full text for eval:
((module, need) => {
const small = need('small-module.js')

const large = (caller) => {
  return small.publicFunction(`large called from ${caller}`)
}

module.exports = large

})(result, need)

full text for eval:
((module, need) => {
const publicValue = 'public value'

const privateValue = 'private value'

const publicFunction = (caller) => {
  return `publicFunction called from ${caller}`
}

module.exports = { publicValue, publicFunction }

})(result, need)

publicFunction called from large called from main
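Section 13.3 said that we would detect circular dependencies and report them sensibly, which the version of need shown above does not yet do. One possible approach, sketched here under stated assumptions, is to keep a stack of the modules that have started loading but not finished: if need is asked for a module that is already on that stack, there is a cycle. The in-memory FILES map, the loading stack, and the wording of the error message are all inventions of this sketch, and for brevity it uses plain names rather than absolute paths as cache keys.

```javascript
// Hypothetical in-memory "files" so the sketch runs without touching the disk.
const FILES = new Map([
  ['major.js', "const minor = need('minor.js'); module.exports = 'major'"],
  ['minor.js', "const major = need('major.js'); module.exports = 'minor'"],
  ['leaf.js', "module.exports = 'leaf'"]
])

const cache = new Map()
const loading = [] // modules that have started loading but not yet finished

const need = (name) => {
  if (cache.has(name)) {
    return cache.get(name)
  }
  if (loading.includes(name)) {
    // We were asked for a module that is still part-way through loading.
    const chain = [...loading, name].join(' -> ')
    throw new Error(`circular dependency: ${chain}`)
  }
  loading.push(name)
  try {
    const result = {}
    eval(`((module, need) => {${FILES.get(name)}})(result, need)`)
    cache.set(name, result.exports)
    return result.exports
  } finally {
    loading.pop() // runs whether loading succeeded or failed
  }
}

const leaf = need('leaf.js')
let message = 'no error'
try {
  need('major.js')
} catch (err) {
  message = err.message
}
console.log(leaf)
console.log(message)
```

Loading leaf.js succeeds, while loading major.js fails with a message that lists the whole chain (major.js -> minor.js -> major.js), which tells the user where the cycle is rather than failing later with a missing function.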

“It’s so deep it’s meaningless”

The programs we have written in this chapter are harder to understand than most of the programs in earlier chapters because they are so abstract. Reading through them, it’s easy to get the feeling that everything is happening somewhere else. Programmers’ tools are often like this: there’s always a risk of confusing the thing in the program with the thing the program is working on. Drawing pictures of data structures can help, and so can practicing with closures (which are one of the most powerful ideas in programming), but a lot of the difficulty is irreducible, so don’t feel bad if it takes you a while to wrap your head around it.

Section 13.5: Exercises

Counting with closures

Write a function makeCounter that returns a function that produces the next integer in sequence starting from zero each time it is called. Each function returned by makeCounter must count independently, so:

const left = makeCounter()
const right = makeCounter()
console.log(`left ${left()}`)
console.log(`right ${right()}`)
console.log(`left ${left()}`)
console.log(`right ${right()}`)

must produce:

left 0
right 0
left 1
right 1

Objects and namespaces

A JavaScript object stores key-value pairs, and the keys in one object are separate from the keys in another. Why doesn’t this provide the same level of safety as a closure?

Testing module loading

Write tests for need.js using Mocha and mock-fs.

Using module as a name

What happens if we define the variable module in loadModule so that it is in scope when eval is called, rather than creating a variable called result and passing it in, as shown below?

const loadModule = (filename) => {
  const source = fs.readFileSync(filename, 'utf-8')
  const module = {}
  const fullText = `(() => {${source}})()`
  eval(fullText)
  return module.exports
}

Implementing a search path

Add a search path to need.js so that if a module isn’t found locally, it will be looked for in each directory in the search path in order.

Using a setup function

Rewrite the module loader so that every module has a function called setup that must be called after loading it to create its exports rather than using module.exports.

Handling errors while loading

  1. Modify need.js so that it does something graceful if an exception is thrown while a module is being loaded.

  2. Write unit tests for this using Mocha.

Refactoring circularity

Suppose that main.js contains this:

const PLUGINS = []

const plugin = require('./plugin')

const main = () => {
  PLUGINS.forEach(p => p())
}

const loadPlugin = (plugin) => {
  PLUGINS.push(plugin)
}

module.exports = {
  main,
  loadPlugin
}

and plugin.js contains this:

const { loadPlugin } = require('./main')

const printMessage = () => {
  console.log('running plugin')
}

loadPlugin(printMessage)

Refactor this code so that it works correctly while still using require rather than import.

An LRU cache

A Least Recently Used (LRU) cache reduces access time while limiting the amount of memory used by keeping track of the N items that have been used most recently. For example, if the cache size is 3 and objects are accessed in the order shown in the first column, the cache’s contents after each access will be as shown in the last column:

Item  Action            Cache After Access
A     read A            [A]
A     get A from cache  [A]
B     read B            [B, A]
A     get A from cache  [A, B]
C     read C            [C, A, B]
D     read D            [D, C, A]
B     read B            [B, D, C]

  1. Implement a function cachedRead that takes the number of entries in the cache as an argument and returns a function that uses an LRU cache to either read files or return cached copies.

  2. Modify cachedRead so that the number of items in the cache is determined by their combined size rather than by the number of files.

Make functions safe for renaming

Our implementation of need stores the cache as a property of the function itself.

  1. How can this go wrong? (Hint: think about aliases.)

  2. Modify the implementation to solve this problem using a closure.