Chapter 13: Module Loader
Terms defined: absolute path, alias, circular dependency, closure, directed graph, encapsulate, immediately-invoked function expression, inner function, Least Recently Used cache, namespace, plugin architecture
Chapter 12 showed how to use eval to load code dynamically.
We can use this to build our own version of JavaScript’s require function.
Our function will take the name of a source file as an argument
and return whatever that file exports.
The key requirement for such a function is to avoid accidentally overwriting things:
if we just eval some code and it happens to assign to a variable called x,
anything called x already in our program might be overwritten.
We therefore need a way to encapsulate the contents of what we’re loading.
Our approach is based on [Casciaro2020],
which contains a lot of other useful information as well.
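To make the overwriting problem concrete, here is a minimal sketch. The variable names and the evaluated strings are invented for illustration, and the sketch assumes direct calls to eval, which run the evaluated text in the caller's scope. The second half wraps the text in a function, which is the idea the next section develops.

```javascript
let x = 'original'

// A direct eval runs in the caller's scope,
// so a bare assignment reaches our existing variable.
eval("x = 'overwritten'")
const afterBareEval = x

// Wrapping the evaluated text in a function gives it its own x.
x = 'original'
eval("(() => { const x = 'private'; return x })()")
const afterWrappedEval = x

console.log(`after bare eval: ${afterBareEval}`)
console.log(`after wrapped eval: ${afterWrappedEval}`)
```

The first message reports the overwritten value, while the second shows that our x survives once the evaluated text has a function of its own to live in.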
Section 13.1: How can we implement namespaces?
A namespace is a collection of names in a program that are isolated from other namespaces. Most modern languages provide namespaces as a built-in feature so that programmers don’t accidentally step on each other’s toes. JavaScript doesn’t, so we have to implement them ourselves.
We can do this using closures. Every function is a namespace: variables defined inside the function are distinct from variables defined outside it (Figure 13.1). If we create the variables we want to manage inside a function, then define another function inside the first and return that inner function, that inner function will be the only thing with references to those variables.
For example, let’s create a function that always appends the same string to its argument:
const createAppender = (suffix) => {
const appender = (text) => {
return text + suffix
}
return appender
}
const exampleFunction = createAppender(' and that')
console.log(exampleFunction('this'))
console.log('suffix is', suffix)
When we run it,
the value that was assigned to the parameter suffix still exists
but can only be reached by the inner function:
this and that
/u/stjs/module-loader/manual-namespacing.js:10
console.log('suffix is', suffix)
^
ReferenceError: suffix is not defined
at /u/stjs/module-loader/manual-namespacing.js:10:26
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)
We could require every module to define a setup function like this for users to call,
but thanks to eval we can wrap the file’s contents in a function and call it automatically.
To do this we will create something called an immediately-invoked function expression (IIFE).
The syntax () => {...} defines a function.
If we put the definition in parentheses and then put another pair of parentheses right after it:
(() => {...})()
we have code that defines a function of no arguments and immediately calls it. We can use this trick to achieve the same effect as the previous example in one step:
const contents = (() => {
const privateValue = 'private value'
const publicValue = 'public value'
return { publicValue }
})()
console.log(`contents.publicValue is ${contents.publicValue}`)
console.log(`contents.privateValue is ${contents.privateValue}`)
contents.publicValue is public value
contents.privateValue is undefined
Unconfusing the parser
The extra parentheses around the original definition force the parser to evaluate things in the right order; if we write:
() => {...}()
then the parser treats the function definition as complete and rejects the trailing () with a syntax error instead of reading it as an immediate call to the function just defined.
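One way to see the difference is to hand both forms to eval and report what happens. This is only a sketch: the helper attempt and the function bodies are invented for the demonstration.

```javascript
// Hypothetical helper: evaluate a string and describe the outcome.
const attempt = (text) => {
  try {
    return `value ${eval(text)}`
  } catch (err) {
    return `error ${err.name}`
  }
}

// Definition wrapped in parentheses, then called.
const wrapped = attempt('(() => { return 42 })()')

// Same text without the wrapping parentheses.
const unwrapped = attempt('() => { return 42 }()')

console.log(`with parentheses: ${wrapped}`)
console.log(`without parentheses: ${unwrapped}`)
```

The parenthesized form produces the function's return value, while the other form never runs at all because the parser rejects it.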
Section 13.2: How can we load a module?
We want the module we are loading to export names by assigning to module.exports just as require does,
so we need to provide an object called module and create an IIFE.
(We will handle the problem of the module loading other modules later.)
Our loadModule function takes a filename and returns the exports of a newly created module object;
the parameter to the function we build and eval must be called module
so that we can assign to module.exports.
For clarity, we call the object we pass in result in loadModule.
import fs from 'fs'
const loadModule = (filename) => {
const source = fs.readFileSync(filename, 'utf-8')
const result = {}
const fullText = `((module) => {${source}})(result)`
console.log(`full text for eval:\n${fullText}\n`)
eval(fullText)
return result.exports
}
export default loadModule
Figure 13.2 and Figure 13.3 show the structure of our loader so far. We can use this code as a test:
const publicValue = 'public value'
const privateValue = 'private value'
const publicFunction = (caller) => {
return `publicFunction called from ${caller}`
}
module.exports = { publicValue, publicFunction }
and this short program to load the test and check its exports:
import loadModule from './load-module-only.js'
const result = loadModule(process.argv[2])
console.log(`result.publicValue is ${result.publicValue}`)
console.log(`result.privateValue is ${result.privateValue}`)
console.log(result.publicFunction('main'))
node test-load-module-only.js small-module.js
full text for eval:
((module) => {const publicValue = 'public value'
const privateValue = 'private value'
const publicFunction = (caller) => {
return `publicFunction called from ${caller}`
}
module.exports = { publicValue, publicFunction }
})(result)
result.publicValue is public value
result.privateValue is undefined
publicFunction called from main
Section 13.3: Do we need to handle circular dependencies?
What if the code we are loading loads other code? We can visualize the network of who requires whom as a directed graph: if X requires Y, we draw an arrow from X to Y. Unlike the directed acyclic graphs we met in Chapter 10, though, these graphs can contain cycles: we say a circular dependency exists if X depends on Y and Y depends on X either directly or indirectly. This may seem nonsensical, but can easily arise with plugin architectures: the file containing the main program loads an extension, and that extension calls utility functions defined in the file containing the main program.
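As an illustration of the graph view, the sketch below stores who-requires-whom in a Map and uses a depth-first search to find one cycle. The file names and the function findCycle are hypothetical, and this is a separate analysis of a hand-written graph rather than part of the loader developed in this chapter.

```javascript
// Hypothetical dependency graph: each key requires the files in its list.
const requires = new Map([
  ['main.js', ['plugin.js', 'util.js']],
  ['plugin.js', ['main.js']],
  ['util.js', []]
])

// Return one cycle as a list of names, or null if the graph has none.
const findCycle = (graph) => {
  const finished = new Set() // nodes whose descendants are fully explored
  const path = []            // nodes on the current chain of requires

  const visit = (node) => {
    if (path.includes(node)) {
      // We have come back to a node on the current chain: that is a cycle.
      return [...path.slice(path.indexOf(node)), node]
    }
    if (finished.has(node)) {
      return null
    }
    path.push(node)
    for (const next of graph.get(node) ?? []) {
      const cycle = visit(next)
      if (cycle) {
        return cycle
      }
    }
    path.pop()
    finished.add(node)
    return null
  }

  for (const node of graph.keys()) {
    const cycle = visit(node)
    if (cycle) {
      return cycle
    }
  }
  return null
}

console.log(findCycle(requires))
```

Here the search reports the chain main.js, plugin.js, main.js, which is the plugin situation described above.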
Most compiled languages can handle circular dependencies easily: they compile each module into low-level instructions, then link those to resolve dependencies before running anything (Figure 13.4). But interpreted languages usually run code as they’re loading it, so if X is in the process of loading Y and Y tries to call X, X may not (fully) exist yet.
Circular dependencies work in Python,
but only sort of.
Let’s create two files called major.py and minor.py:
# major.py
import minor
def top():
print("top")
minor.middle()
def bottom():
print("bottom")
top()
# minor.py
import major
def middle():
print("middle")
major.bottom()
Loading fails when we run major.py from the command line:
top
Traceback (most recent call last):
File "major.py", line 3, in <module>
import minor
File "/u/stjs/module-loader/checking/minor.py", line 3, in <module>
import major
File "/u/stjs/module-loader/checking/major.py", line 12, in <module>
top()
File "/u/stjs/module-loader/checking/major.py", line 7, in top
minor.middle()
AttributeError: module 'minor' has no attribute 'middle'
but works in the interactive interpreter:
$ python
>>> import major
top
middle
bottom
The equivalent test in JavaScript also has two files:
// major.js
const { middle } = require('./minor')
const top = () => {
console.log('top')
middle()
}
const bottom = () => {
console.log('bottom')
}
top()
module.exports = { top, bottom }
// minor.js
const { bottom } = require('./major')
const middle = () => {
console.log('middle')
bottom()
}
module.exports = { middle }
It fails on the command line:
top
middle
/u/stjs/module-loader/checking/minor.js:6
bottom()
^
TypeError: bottom is not a function
at middle (/u/stjs/module-loader/checking/minor.js:6:3)
at top (/u/stjs/module-loader/checking/major.js:6:3)
at Object.<anonymous> (/u/stjs/module-loader/checking/major.js:13:1)
at Module._compile (internal/modules/cjs/loader.js:1063:30)
at Object.Module._extensions..js \
(internal/modules/cjs/loader.js:1092:10)
at Module.load (internal/modules/cjs/loader.js:928:32)
at Function.Module._load (internal/modules/cjs/loader.js:769:14)
at Function.executeUserEntryPoint [as runMain] \
(internal/modules/run_main.js:72:12)
at internal/main/run_main_module.js:17:47
and also fails in the interactive interpreter (which is more consistent):
$ node
> require('./major')
top
middle
/u/stjs/module-loader/checking/minor.js:6
bottom()
^
TypeError: bottom is not a function
at middle (/u/stjs/module-loader/checking/minor.js:6:3)
at top (/u/stjs/module-loader/checking/major.js:6:3)
at Object.<anonymous> (/u/stjs/module-loader/checking/major.js:13:1)
at Module._compile (internal/modules/cjs/loader.js:1063:30)
at Object.Module._extensions..js \
(internal/modules/cjs/loader.js:1092:10)
at Module.load (internal/modules/cjs/loader.js:928:32)
at Function.Module._load (internal/modules/cjs/loader.js:769:14)
at Module.require (internal/modules/cjs/loader.js:952:19)
at require (internal/modules/cjs/helpers.js:88:18)
at [stdin]:1:1
We therefore won’t try to handle circular dependencies. However, we will detect them and generate a sensible error message.
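One possible way to do the detection is to remember which modules are part-way through loading and to complain if one of them is asked for again. The sketch below is an assumption about how this could look, not the chapter's implementation: it keeps its "files" in a hypothetical in-memory table called SOURCES instead of reading from disk, and the names makeNeed and loading are invented. The name need anticipates the loading function built in Section 13.4.

```javascript
// Hypothetical in-memory "files" so that the sketch needs no disk access.
const SOURCES = new Map([
  ['major.js', "need('minor.js'); module.exports = 'major'"],
  ['minor.js', "need('major.js'); module.exports = 'minor'"],
  ['solo.js', "module.exports = 'solo'"]
])

const makeNeed = (sources) => {
  const cache = new Map()
  const loading = [] // names of modules whose code is currently running

  const need = (name) => {
    if (cache.has(name)) {
      return cache.get(name)
    }
    if (loading.includes(name)) {
      const chain = [...loading, name].join(' -> ')
      throw new Error(`circular dependency: ${chain}`)
    }
    loading.push(name)
    try {
      const result = {}
      // Build the wrapper function from text, then call it.
      const wrapper = eval(`((module, need) => {${sources.get(name)}\n})`)
      wrapper(result, need)
      cache.set(name, result.exports)
    } finally {
      // Always unwind, even if the module's code threw an exception.
      loading.pop()
    }
    return cache.get(name)
  }
  return need
}

const need = makeNeed(SOURCES)
console.log(need('solo.js'))
try {
  need('major.js')
} catch (err) {
  console.log(err.message)
}
```

Loading solo.js succeeds, while loading major.js stops with a message that names the whole chain (major.js -> minor.js -> major.js) instead of failing later with "bottom is not a function".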
import vs. require
Circular dependencies work with JavaScript’s import syntax
because we can analyze files to determine what needs what,
get everything into memory,
and then resolve dependencies.
We can’t do this with require-based code
because someone might create an alias and call require through that
or eval a string that contains a require call.
(Of course, they can also do these things with the function version of import.)
Section 13.4: How can a module load another module?
While we’re not going to handle circular dependencies,
modules do need to be able to load other modules.
To enable this,
we need to provide the module with a function called require
that it can call as it loads.
As in Chapter 12,
this function checks a cache
to see if the file being asked for has already been loaded.
If not, it loads it and saves it;
either way, it returns the result.
Our cache needs to be careful about how it identifies files
so that it can detect duplicate loading attempts that use different names.
For example,
suppose that major.js loads subdir/first.js and subdir/second.js.
When subdir/second.js loads ./first.js,
our system needs to realize that it already has that file
even though the path looks different.
We will use absolute paths as cache keys
so that every file has a unique, predictable key.
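To see why absolute paths make good keys, the sketch below turns both spellings of first.js into the same string. It rests on several assumptions: the /project directory and the helper resolveFrom are invented, and the built-in URL class is used as a stand-in to resolve each name relative to the file that asks for it.

```javascript
// Hypothetical helper: resolve a module name relative to the file that
// asks for it. The built-in URL class does the path arithmetic here.
const resolveFrom = (importer, name) => {
  return new URL(name, `file://${importer}`).pathname
}

// major.js asks for subdir/first.js...
const viaMajor = resolveFrom('/project/major.js', 'subdir/first.js')

// ...and subdir/second.js asks for ./first.js.
const viaSecond = resolveFrom('/project/subdir/second.js', './first.js')

console.log(viaMajor)
console.log(viaSecond)
console.log(viaMajor === viaSecond)
```

Because both requests produce the same absolute path, a cache keyed on that path would recognize the second request as a duplicate.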
To reduce confusion,
we will call our function need instead of require.
In order to make the cache available to modules while they’re loading,
we will make it a property of need.
(Remember,
a function is just another kind of object in JavaScript;
every function gets several properties automatically,
and we can always add more.)
Since we’re using the built-in Map class as a cache,
the entire implementation of need is just 15 lines long:
import path from 'path'
import loadModule from './load-module.js'
const need = (name) => {
const absPath = path.resolve(name)
if (!need.cache.has(absPath)) {
const contents = loadModule(absPath, need)
need.cache.set(absPath, contents)
}
return need.cache.get(absPath)
}
need.cache = new Map()
export default need
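The remark that a function is just another kind of object can be checked directly. In this small sketch the names greet, countedGreet, and callCount are invented for illustration.

```javascript
const greet = (name) => `hello ${name}`

// Properties that every function gets automatically.
console.log(greet.name)   // greet
console.log(greet.length) // 1

// A property we add ourselves, in the same way that need gets its cache.
greet.callCount = 0
const countedGreet = (name) => {
  greet.callCount += 1
  return greet(name)
}

countedGreet('A')
countedGreet('B')
console.log(greet.callCount) // 2
```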
We now need to modify loadModule to take our function need as a parameter.
(Again, we’ll have our modules call need('something.js') instead of require('something') for clarity.)
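The modified loader is not listed here, so the following is only a guess at its core, based on the wrapped text that the loader prints. To keep the sketch free of file access it assumes the source has already been read into a string: runSource and fakeNeed are hypothetical names, not part of the chapter's code.

```javascript
// Hypothetical helper standing in for the part of loadModule that runs
// after the file's text has been read: it takes the source as a string.
const runSource = (source, need) => {
  const result = {}
  // Both module and need are parameters of the wrapper, so the loaded
  // code sees them as ordinary local names.
  const fullText = `((module, need) => {\n${source}\n})(result, need)`
  eval(fullText)
  return result.exports
}

// A stand-in for need that returns a canned module instead of loading one.
const fakeNeed = (name) => ({ loadedFrom: name })

const exported = runSource(
  "const helper = need('helper.js')\nmodule.exports = { helper }",
  fakeNeed
)
console.log(exported.helper.loadedFrom) // helper.js
```

Passing need in as an argument, rather than relying on a global, means the loaded code calls whichever loader the caller supplies.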
Let’s test it with the same small module that doesn’t need anything else to make sure we haven’t broken anything:
import need from './need.js'
const small = need('small-module.js')
console.log(`small.publicValue is ${small.publicValue}`)
console.log(`small.privateValue is ${small.privateValue}`)
console.log(small.publicFunction('main'))
full text for eval:
((module, need) => {
const publicValue = 'public value'
const privateValue = 'private value'
const publicFunction = (caller) => {
return `publicFunction called from ${caller}`
}
module.exports = { publicValue, publicFunction }
})(result, need)
small.publicValue is public value
small.privateValue is undefined
publicFunction called from main
What if we test it with a module that does load something else?
import need from './need'
const small = need('small-module.js')
const large = (caller) => {
console.log(`large from ${caller}`)
small.publicFunction(`${caller} to large`)
}
export default large
import need from './need.js'
const large = need('large-module.js')
console.log(large.large('main'))
full text for eval:
((module, need) => {
import need from './need'
const small = need('small-module.js')
const large = (caller) => {
console.log(`large from ${caller}`)
small.publicFunction(`${caller} to large`)
}
export default large
})(result, need)
undefined:2
import need from './need'
^^^^^^
SyntaxError: Cannot use import statement outside a module
at loadModule (/u/stjs/module-loader/load-module.js:8:8)
at need (/u/stjs/module-loader/need.js:8:22)
at /u/stjs/module-loader/test-need-large-module.js:3:15
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)
This doesn’t work because import only works at the top level of a program,
not inside a function.
Our system can therefore only run loaded modules by needing them:
const small = need('small-module.js')
const large = (caller) => {
return small.publicFunction(`large called from ${caller}`)
}
module.exports = large
import need from './need.js'
const large = need('large-needless.js')
console.log(large('main'))
full text for eval:
((module, need) => {
const small = need('small-module.js')
const large = (caller) => {
return small.publicFunction(`large called from ${caller}`)
}
module.exports = large
})(result, need)
full text for eval:
((module, need) => {
const publicValue = 'public value'
const privateValue = 'private value'
const publicFunction = (caller) => {
return `publicFunction called from ${caller}`
}
module.exports = { publicValue, publicFunction }
})(result, need)
publicFunction called from large called from main
“It’s so deep it’s meaningless”
The programs we have written in this chapter are harder to understand than most of the programs in earlier chapters because they are so abstract. Reading through them, it’s easy to get the feeling that everything is happening somewhere else. Programmers’ tools are often like this: there’s always a risk of confusing the thing in the program with the thing the program is working on. Drawing pictures of data structures can help, and so can practicing with closures (which are one of the most powerful ideas in programming), but a lot of the difficulty is irreducible, so don’t feel bad if it takes you a while to wrap your head around it.
Section 13.5: Exercises
Counting with closures
Write a function makeCounter that returns a function
that produces the next integer in sequence starting from zero each time it is called.
Each function returned by makeCounter must count independently, so:
const left = makeCounter()
const right = makeCounter()
console.log(`left ${left()}`)
console.log(`right ${right()}`)
console.log(`left ${left()}`)
console.log(`right ${right()}`)
must produce:
left 0
right 0
left 1
right 1
Objects and namespaces
A JavaScript object stores key-value pairs, and the keys in one object are separate from the keys in another. Why doesn’t this provide the same level of safety as a closure?
Testing module loading
Write tests for need.js using Mocha and mock-fs.
Using module as a name
What happens if we define the variable module in loadModule
so that it is in scope when eval is called
rather than creating a variable called result and passing that in:
const loadModule = (filename) => {
const source = fs.readFileSync(filename, 'utf-8')
const module = {}
const fullText = `(() => {${source}})()`
eval(fullText)
return module.exports
}
Implementing a search path
Add a search path to need.js
so that if a module isn’t found locally,
it will be looked for in each directory in the search path in order.
Using a setup function
Rewrite the module loader so that every module has a function called setup
that must be called after loading it to create its exports
rather than using module.exports.
Handling errors while loading
- Modify need.js so that it does something graceful if an exception is thrown while a module is being loaded.
- Write unit tests for this using Mocha.
Refactoring circularity
Suppose that main.js contains this:
const PLUGINS = []
const plugin = require('./plugin')
const main = () => {
PLUGINS.forEach(p => p())
}
const loadPlugin = (plugin) => {
PLUGINS.push(plugin)
}
module.exports = {
main,
loadPlugin
}
and plugin.js contains this:
const { loadPlugin } = require('./main')
const printMessage = () => {
console.log('running plugin')
}
loadPlugin(printMessage)
Refactor this code so that it works correctly while still using require rather than import.
An LRU cache
A Least Recently Used (LRU) cache reduces access time while limiting the amount of memory used by keeping track of the N items that have been used most recently. For example, if the cache size is 3 and objects are accessed in the order shown in the first column, the cache’s contents will be as shown in the last column:
| Item | Action | Cache After Access |
|---|---|---|
| A | read A | [A] |
| A | get A from cache | [A] |
| B | read B | [B, A] |
| A | get A from cache | [A, B] |
| C | read C | [C, A, B] |
| D | read D | [D, C, A] |
| B | read B | [B, D, C] |
- Implement a function cachedRead that takes the number of entries in the cache as an argument and returns a function that uses an LRU cache to either read files or return cached copies.
- Modify cachedRead so that the number of items in the cache is determined by their combined size rather than by the number of files.
Make functions safe for renaming
Our implementation of need stored the cache as a property of the function itself.
- How can this go wrong? (Hint: think about aliases.)
- Modify the implementation to solve this problem using a closure.