Chapter 12: File Interpolator
Terms defined: header file, literate programming, loader, sandbox, search path, shell variable
Many of the examples in these lessons are too long to show comfortably in one block of code on a printed page, so we needed a way to break them up. As an experiment, we wrote a custom module loader that reads a source file containing specially-formatted comments and then reads and inserts the files specified in those comments before running the code (Figure 12.1). Modern programming languages don’t work this way, but C and C++ do this with header files, and static site generators (Chapter 9) do this to share fragments of HTML.
The special comments in our source files contain the text to put in the displayed version and the file to include when loading:
class Something {
  /*+ constructor + constructor.js +*/
  /*+ a long method + long_method.js +*/
  /*+ another method + another_method.js +*/
}
We got this to work, but decided to use a different approach in this book. The stumbling block was that the style-checking tool ESLint didn’t know what to make of our inclusions, so we would either have to modify it or build a style checker of our own. (We will actually do that in Chapter 14, but we won’t go nearly as far as ESLint.)
Despite being a dead end, the inclusion tool is a good way to show how JavaScript turns source code into something it can execute. We need to be able to do this in the next couple of chapters, so we might as well tackle it now.
Section 12.1: How can we evaluate JavaScript dynamically?
We want to display files as they are on the web and in print, but interpolate the files referenced in special comments when we load things with import. To do this, we need to understand the lifecycle of a JavaScript program. When we ask for a file, Node reads the text, translates it into runnable instructions, and runs those instructions. We can do the second and third steps whenever we want using a function called eval, which takes a string as input and executes it as if it were part of the program (Figure 12.2).
Figure 12.2: eval vs. normal translation and execution.
This is not a good idea
eval is a security risk: arbitrary code can do arbitrary things, so if we take a string typed in by a user and execute it without any checks, it could email our bookmark list to villains all over the world, erase our hard drive, or do anything else that code can do (which is pretty much anything). Browsers do their best to run code in a sandbox for safety, but Node doesn’t, so it’s up to us to be (very) careful.
To see eval in action, let’s evaluate an expression:
console.log(eval('2 + 2'))
4
Notice that the input to eval is not 2 + 2, but rather a string containing the digit 2, a space, a plus sign, another space, and another 2. When we call eval, it translates this string using exactly the same parser that Node uses for our program and immediately runs the result.
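Two more details follow from the paragraph above. Because eval runs its input through the parser, text that does not parse throws a SyntaxError at the point of the call; and if the argument is not a string at all, eval hands it back unchanged without running anything. A small sketch:

```javascript
// A string is parsed and run as code...
const fromString = eval('2 + 2')

// ...but any other value is returned unchanged, without being run.
const fromNumber = eval(4)
const fromArray = eval(['2 + 2'])

// Text that does not parse fails just as it would in a source file.
let parseError = null
try {
  eval('2 +')
} catch (err) {
  parseError = err
}

console.log(fromString, fromNumber, fromArray) // 4 4 [ '2 + 2' ]
console.log(parseError instanceof SyntaxError) // true
```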
We can make the example a little more interesting by constructing the string dynamically:
const x = 1
const y = 3
const z = 5
for (const name of ['x', 'y', 'z', 'oops']) {
  const expr = `${name} + 1`
  console.log(name, '+ 1 =', eval(expr))
}
x + 1 = 2
y + 1 = 4
z + 1 = 6
undefined:1
oops + 1
^
ReferenceError: oops is not defined
at eval (eval at <anonymous> \
(/u/stjs/file-interpolator/eval-loop.js:7:30), <anonymous>:1:1)
at /u/stjs/file-interpolator/eval-loop.js:7:30
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)
The first time the loop runs, the string is 'x + 1'; since there’s a variable called x in scope, eval does the addition and we print the result. The same thing happens for the variables y and z, but we get an error when we try to evaluate the string 'oops + 1' because there is no variable in scope called oops.
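If a loop like this should keep going when a name is missing, one option is to catch the ReferenceError around each call. This sketch reuses the same variables, collects its output in an array called results (a name chosen for illustration), and only swallows ReferenceError, rethrowing anything else:

```javascript
const x = 1
const y = 3
const z = 5
const results = []
for (const name of ['x', 'y', 'z', 'oops']) {
  try {
    // Direct eval: the string can see x, y, and z in this scope.
    results.push(`${name} + 1 = ${eval(`${name} + 1`)}`)
  } catch (err) {
    // Only handle the "no such variable" case; anything else is a real bug.
    if (!(err instanceof ReferenceError)) {
      throw err
    }
    results.push(`${name} + 1 failed: ${err.message}`)
  }
}
console.log(results.join('\n'))
```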
eval can use whatever variables are in scope when it’s called, but what happens to any variables it defines? This example creates a variable called x and runs console.log to display it, but as the output shows, x is local to the eval call, just as variables created inside a function only exist during a call to that function:
const code = `
  const x = 'hello'
  console.log('x in eval is', x)
`
eval(code)
console.log('typeof x after eval', typeof x)
x in eval is hello
typeof x after eval undefined
However, eval can modify variables defined outside the text being evaluated in the same way that a function can modify global variables:
let x = 'original'
eval('x = "modified"')
console.log('x after eval is', x)
x after eval is modified
This means that if the text we give to eval modifies a structure that is defined outside the text, that change outlives the call to eval:
const seen = {}
for (const name of ['x', 'y', 'z']) {
  const expr = `seen["${name}"] = "${name.toUpperCase()}"`
  eval(expr)
}
console.log(seen)
{ x: 'X', y: 'Y', z: 'Z' }
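One way to make that sharing explicit is to compile the text with the Function constructor instead of eval. Code compiled this way sees global variables plus the parameters we give it, but not the local scope of the code that created it. The helper runWith below is hypothetical, a sketch of the idea rather than anything Node provides:

```javascript
// Hypothetical helper: compile `code` as the body of a new function whose
// parameters are the keys of `names`, then call it with the matching values.
const runWith = (code, names) => {
  const fn = new Function(...Object.keys(names), code)
  return fn(...Object.values(names))
}

// The structure is shared only because we pass it in by name.
const seen = {}
for (const name of ['x', 'y', 'z']) {
  runWith(`seen["${name}"] = "${name.toUpperCase()}"`, { seen })
}
console.log(seen) // { x: 'X', y: 'Y', z: 'Z' }

// A local variable that we do not pass in is invisible to the compiled code.
const check = () => {
  const secret = 'local only'
  return runWith('return typeof secret', {})
}
console.log(check()) // prints undefined
```

This is not a sandbox: the compiled code can still reach globals such as process, so it narrows accidental sharing without making untrusted text safe.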
The examples so far have all evaluated strings embedded in the program itself, but eval doesn’t care where its input comes from. Let’s move the code that does the modifying into to-be-loaded.js:
// Modify a global structure defined by whoever loads us.
Seen.from_loaded_file = 'from loaded file'
This doesn’t work on its own because Seen isn’t defined:
/u/stjs/file-interpolator/to-be-loaded.js:3
Seen.from_loaded_file = 'from loaded file'
^
ReferenceError: Seen is not defined
at /u/stjs/file-interpolator/to-be-loaded.js:3:1
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)
But if we read the file and eval the text after defining Seen, it does what we want:
import fs from 'fs'
const Seen = {}
const filename = process.argv[2]
const content = fs.readFileSync(filename, 'utf-8')
console.log('before eval, Seen is', Seen)
eval(content)
console.log('after eval, Seen is', Seen)
node does-the-loading.js to-be-loaded.js
before eval, Seen is {}
after eval, Seen is { from_loaded_file: 'from loaded file' }
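Node’s built-in vm module offers a middle ground between eval and an ordinary import: vm.runInNewContext runs a string with a context object whose properties become the code’s globals, so the loader chooses exactly which names the loaded text can see. A minimal sketch, with the text of to-be-loaded.js inlined as a string (an assumption, so the example runs on its own; a real loader would read it with fs.readFileSync as above):

```javascript
import vm from 'vm'

// Stand-in for the contents of to-be-loaded.js.
const content = `
// Modify a global structure defined by whoever loads us.
Seen.from_loaded_file = 'from loaded file'
`

const Seen = {}
// Only the names in the context object are visible as globals to the code.
// The object is shared by reference, so the change is visible here afterward.
vm.runInNewContext(content, { Seen })
console.log('after running, Seen is', Seen)
```

Node’s documentation states that vm is not a security mechanism, so this controls what the loaded code can see by accident but is no defense against hostile code.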
Section 12.2: How can we manage files?
The source files in this book are small enough that we don’t have to worry about reading them repeatedly, but we would like to avoid re-reading things unnecessarily in large systems or when there might be network delays. The usual approach is to create a cache using the Singleton pattern that we first met in Chapter 4. Whenever we want to read a file, we check to see if it’s already in the cache (Figure 12.3). If it is, we use that copy; if not, we read it and add it to the cache using the file path as a lookup key.
We can write a simple cache in just a few lines of code:
import fs from 'fs'
class Cache {
  constructor () {
    this.loaded = new Map()
  }
  need (name) {
    if (this.loaded.has(name)) {
      console.log(`returning cached value for ${name}`)
      return this.loaded.get(name)
    }
    console.log(`loading ${name}`)
    const content = fs.readFileSync(name, 'utf-8')
    const result = eval(content)
    this.loaded.set(name, result)
    return result
  }
}
const cache = new Cache()
export default (name) => {
  return cache.need(name)
}
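Because this cache reads files directly, it is awkward to check that a second call really skips the disk. A variation that receives its reading function as a constructor argument (an assumption made for testing, not how need-simple.js is written) lets us count reads against an in-memory map standing in for the file system:

```javascript
class Cache {
  // `read` is a hypothetical parameter: any function from name to source text.
  constructor (read) {
    this.read = read
    this.loaded = new Map()
  }

  need (name) {
    if (this.loaded.has(name)) {
      return this.loaded.get(name)
    }
    const result = eval(this.read(name))
    this.loaded.set(name, result)
    return result
  }
}

// A fake file system: file name -> source text.
const files = new Map([['double.js', 'const double = (x) => 2 * x\ndouble']])
let reads = 0
const cache = new Cache((name) => {
  reads += 1
  return files.get(name)
})

const first = cache.need('double.js')
const second = cache.need('double.js')
console.log(first(21), first === second, reads) // 42 true 1
```

Passing the reader in also means the same class could fetch text over a network, which is where avoiding repeated reads matters most.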
Since we are using eval, though, we can’t rely on export to make things available to the rest of the program. Instead, we rely on the fact that the result of an eval call is the value of the last expression evaluated. Since a variable name on its own evaluates to the variable’s value, we can create a function and then use its name to “export” it from the evaluated file:
// Define.
const report = (message) => {
  console.log(`report in import-01.js with message "${message}"`)
}
// Export.
report
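The “last expression” rule can be checked without any files. This sketch keeps the text of a file written in the style above in a string; eval returns the value of its final line, while a declaration on its own produces undefined:

```javascript
// The text of a file in the define-then-export style, held in a string.
const source = [
  '// Define.',
  'const report = (message) => `report with message "${message}"`',
  '// Export.',
  'report'
].join('\n')

// eval returns the value of the last expression: the function itself.
const exported = eval(source)
console.log(exported('hello')) // report with message "hello"

// A declaration has no value, so there is nothing to "export".
console.log(eval('const unused = 1')) // undefined
```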
To test our program, we load the implementation of the cache using import, then use it to load and evaluate another file. This example expects that “other file” to define a function, which we call in order to show that everything is working:
import need from './need-simple.js'
const imported = need('./import-simple.js')
imported('called from test-simple.js')
node test-simple.js
Section 12.3: How can we find files?
Each of the files included in our examples is in the same directory as the file including it, but in C/C++ or a page templating system we might include a particular file in several different places. We don’t want to have to put all of our files in a single directory, so we need a way to specify where to look for files that are being included.
One option is to use relative paths, but another is to give our program a list of directories to look in. This is called a search path, and many programs use them, including Node itself. By convention, a search path is written as a colon-separated list of directories on Unix and as a semicolon-separated list on Windows. If the path to an included file starts with ./, we look for it locally; if not, we go through the directories in the search path in order until we find a file with a matching name (Figure 12.4).
That’s just how it is
The rules about search paths in the paragraph above are a convention: somebody did it this way years ago and (almost) everyone has imitated it since. We could implement search paths some other way, but as with configuration file formats, variable naming conventions, and many other things, the last thing the world needs is more innovation.
Since the cache is responsible for finding files, it should also handle the search path. The outline of the class stays the same:
import fs from 'fs'
import path from 'path'
class Cache {
  constructor () {
    this.loaded = new Map()
    this.constructSearchPath()
  }
  need (fileSpec) {
    if (this.loaded.has(fileSpec)) {
      console.log(`returning cached value for ${fileSpec}`)
      return this.loaded.get(fileSpec)
    }
    console.log(`loading value for ${fileSpec}`)
    const filePath = this.find(fileSpec)
    const content = fs.readFileSync(filePath, 'utf-8')
    const result = eval(content)
    this.loaded.set(fileSpec, result)
    return result
  }
}
const cache = new Cache()
export default (fileSpec) => {
  return cache.need(fileSpec)
}
To get the search path, we look for the shell variable NEED_PATH. (Writing shell variables’ names in upper case is another convention.) If NEED_PATH exists, we split it on colons to create a list of directories:
constructSearchPath () {
  this.searchPath = []
  if ('NEED_PATH' in process.env) {
    this.searchPath = process.env.NEED_PATH
      .split(':')
      .filter(x => x.length > 0)
  }
}
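Splitting on ':' follows the Unix convention only; on Windows the separator is ';', and a colon also appears in drive letters such as C:. Node exposes the current platform’s separator as path.delimiter, so a variant of the splitting step could look like this (a sketch; splitSearchPath is a hypothetical helper name, not part of the cache above):

```javascript
import path from 'path'

// Split a search-path string into directories. path.delimiter is ':' on
// Unix and ';' on Windows, so this follows the platform's convention.
// Empty entries (e.g. from a trailing separator) are dropped.
const splitSearchPath = (text) => {
  return text.split(path.delimiter).filter(dir => dir.length > 0)
}

// NEED_PATH may be unset, in which case the search path is empty.
const searchPath = splitSearchPath(process.env.NEED_PATH ?? '')
console.log(searchPath)
```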
When we need to find a file we first check to see if the path is local. If it’s not, we try the directories in the search path in order:
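find (fileSpec) {
  // Paths starting with ./ are local, so use them as-is.
  if (fileSpec.startsWith('./')) {
    return fileSpec
  }
  // Otherwise try each directory on the search path in order.
  for (const dir of this.searchPath) {
    const filePath = path.join(dir, fileSpec)
    console.log(`trying ${filePath} for ${fileSpec}`)
    if (fs.existsSync(filePath)) {
      return filePath
    }
  }
  // Nothing matched: fail loudly rather than read an undefined path.
  throw new Error(`Cannot find ${fileSpec} on search path`)
}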
To test this, we put the file to import in a subdirectory called modules:
// Define.
const report = (message) => {
  console.log(`in LEFT with message "${message}"`)
}
// Export.
report
and then put the file doing the importing in the current directory:
import need from './need-path.js'
const imported = need('imported-left.js')
imported('called from test-import-left.js')
We now need to set the variable NEED_PATH. There are many ways to do this in the shell; if we only need the variable to exist for a single command, the simplest is to put the assignment right before the command, on the same line:
NAME=value command
Here’s the shell command that runs our test case, using $PWD to get the current working directory:
NEED_PATH=$PWD/modules/ node test-import-left.js
loading value for imported-left.js
trying /u/stjs/file-interpolator/modules/imported-left.js for \
imported-left.js
in LEFT with message "called from test-import-left.js"
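The NAME=value prefix gives the variable to that one command’s process only. From inside Node, the same effect can be had with child_process.spawnSync and its env option; this sketch starts a second Node process that prints NEED_PATH, using /tmp/modules/ as an arbitrary example directory:

```javascript
import { spawnSync } from 'child_process'

// Run a child Node process whose environment is a copy of ours plus
// NEED_PATH, which is what `NEED_PATH=... node ...` does in the shell.
// Our own process.env is not modified.
const result = spawnSync(
  process.execPath,
  ['-e', 'console.log(process.env.NEED_PATH)'],
  {
    env: { ...process.env, NEED_PATH: '/tmp/modules/' },
    encoding: 'utf-8'
  }
)
console.log('child saw:', result.stdout.trim())
```

Copying process.env into the new object keeps variables such as PATH available to the child; passing only { NEED_PATH } would give the child an otherwise empty environment.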
Now let’s create a second importable file in the modules directory:
// Define.
const report = (message) => {
  console.log(`in RIGHT with message "${message}"`)
}
// Export.
report
and load that twice to check that caching works:
import need from './need-path.js'
const imported = need('imported-right.js')
imported('called from test-import-right.js')
const alsoImported = need('imported-right.js')
alsoImported('called from test-import-right.js')
loading value for imported-right.js
trying /u/stjs/file-interpolator/modules/imported-right.js for \
imported-right.js
in RIGHT with message "called from test-import-right.js"
returning cached value for imported-right.js
in RIGHT with message "called from test-import-right.js"
Section 12.4: How can we interpolate pieces of code?
Interpolating files is straightforward once we have this machinery in place. We modify Cache.find to return a directory and a file path, then add an interpolate method to replace special comments:
class Cache {
  // ...
  interpolate (fileDir, outer) {
    return outer.replace(Cache.INTERPOLATE_PAT,
      (match, comment, filename) => {
        filename = filename.trim()
        const filePath = path.join(fileDir, filename)
        if (!fs.existsSync(filePath)) {
          throw new Error(`Cannot find ${filePath}`)
        }
        const inner = fs.readFileSync(filePath, 'utf-8')
        return inner
      })
  }
  // ...
}
Cache.INTERPOLATE_PAT = /\/\*\+(.+?)\+(.+?)\+\*\//g
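The pattern and the replacement callback can be exercised without any files. This sketch swaps the file reads for a Map from hypothetical fragment names to text, then evaluates the interpolated result the way the cache does:

```javascript
// Same pattern as Cache.INTERPOLATE_PAT: /*+ comment + filename +*/
const INTERPOLATE_PAT = /\/\*\+(.+?)\+(.+?)\+\*\//g

// Hypothetical in-memory stand-in for the fragment files on disk.
const fragments = new Map([
  ['top.js', 'topMethod (msg) { return this.bottomMethod(`(top ${msg})`) }'],
  ['bottom.js', 'bottomMethod (msg) { return `(bottom ${msg})` }']
])

// Replace each special comment with the text of the fragment it names.
const interpolate = (outer) => {
  return outer.replace(INTERPOLATE_PAT, (match, comment, filename) => {
    const key = filename.trim()
    if (!fragments.has(key)) {
      throw new Error(`Cannot find ${key}`)
    }
    return fragments.get(key)
  })
}

const outer = [
  'class Example {',
  '  /*+ top method + top.js +*/',
  '  /*+ bottom method + bottom.js +*/',
  '}',
  'Example'
].join('\n')

// The last expression of the interpolated text is the class itself.
const Loaded = eval(interpolate(outer))
console.log(new Loaded().topMethod('test')) // (bottom (top test))
```

Because . in the pattern does not match a newline, each special comment has to sit on a single line. Using a callback also means the fragment text is inserted as-is, whereas a replacement string would treat sequences such as $& and $1 specially.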
We can now have a file like this:
class Example {
  constructor (msg) {
    this.constructorMessage = msg
  }
  /*+ top method + import-interpolate-topmethod.js +*/
  /*+ bottom method + import-interpolate-bottommethod.js +*/
}
Example
and subfiles like this:
topMethod (msg) {
  this.bottomMethod(`(topMethod ${msg})`)
}
and this:
bottomMethod (msg) {
  console.log(`(bottomMethod ${msg})`)
}
Let’s test it:
node test-import-interpolate.js
(bottomMethod (topMethod called from test-import-interpolate.js))
When this program runs, its lifecycle is:
- Node starts to run test-import-interpolate.js.
- It sees the import of need-interpolate, so it reads and evaluates that code.
- Doing this creates a singleton cache object.
- The program then calls need('./import-interpolate.js').
- This checks the cache: nope, nothing there.
- So it loads import-interpolate.js.
- It finds two specially-formatted comments in the text…
- …so it loads the file described by each one and inserts the text in place of the comment.
- Now that it has the complete text, it calls eval…
- …and stores the result of eval (which is a class) in the cache.
- It also returns that class.
- We then create an instance of that class and call its method.
This works, but as we said in the introduction, we decided not to use it because it didn’t play well with other tools. No piece of software exists in isolation; when we evaluate a design, we always have to ask how it fits into everything else we have.
Section 12.5: What did we do instead?
Rather than interpolating file fragments, we extract or erase parts of regular JavaScript files based on specially formatted comments like the <fragment>...</fragment> pair shown below.
class Example {
  constructor (name) {
    this.name = name
  }
  // <fragment>
  fragment (message) {
    console.log(`${this.name}: ${message}`)
  }
  // </fragment>
}
The code that selects the part of the file we want to display is part of our page templating system. It re-extracts code for display every time the web version of this site is built, which ensures that we always show what’s in the current version of our examples. However, this system doesn’t automatically update the description of the code: if we write, “It does X,” then modify the code to do Y, our lesson can be inconsistent. Literate programming was invented to try to prevent this from happening, but it never really caught on; unfortunately, most programming systems that describe themselves as “literate” these days only implement part of Donald Knuth’s original vision.
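The extraction step itself can be sketched in a few lines. This is not the book’s templating code, only a minimal version that handles a single // <fragment> ... // </fragment> pair and keeps the lines between the markers; extractFragment is a hypothetical name, and the general case is left to the “Slicing files” exercise below:

```javascript
// Keep only the lines between `// <fragment>` and `// </fragment>`,
// dropping the marker lines themselves.
const extractFragment = (text) => {
  const lines = text.split('\n')
  const start = lines.findIndex(line => line.trim() === '// <fragment>')
  const end = lines.findIndex(line => line.trim() === '// </fragment>')
  if (start < 0 || end < start) {
    throw new Error('missing or misordered fragment markers')
  }
  return lines.slice(start + 1, end).join('\n')
}

const source = [
  'class Example {',
  '  constructor (name) {',
  '    this.name = name',
  '  }',
  '  // <fragment>',
  '  fragment (message) {',
  '    console.log(`${this.name}: ${message}`)',
  '  }',
  '  // </fragment>',
  '}'
].join('\n')
console.log(extractFragment(source))
```

Because the markers are ordinary comments, the full file still parses and passes style checks, which is exactly what the interpolation approach could not offer.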
Section 12.6: Exercises
Security concerns
- Write a function loadAndRun that reads a file, evaluates it, and returns the result.
- Create a file trust-me.js that prints “nothing happening here” when it is evaluated, but also deletes everything in the directory called target.
- Write tests for this using mock-fs.
Please be careful doing this exercise.
Loading functions
Write a function that reads a file containing single-argument functions like this:
addOne: (x) => x + 1
halve: (x) => x / 2
array: (x) => Array(x).fill(0)
and returns an object containing callable functions.
Registering functions
Write a function that loads one or more files containing function definitions like this:
const double = (x) => {
  return 2 * x
}
EXPORTS.append(double)
and returns a list containing all the loaded functions.
Indenting inclusions
Modify the file inclusion system so that inclusions are indented by the same amount as the including comment. For example, if the including file is:
const withLogging = (args) => {
  /*+ logging call + logging.js +*/
}
withLogging
and the included file is:
console.log('first message')
console.log('second message')
then the result will be:
const withLogging = (args) => {
  console.log('first message')
  console.log('second message')
}
withLogging
i.e., all lines of the inclusion will be indented to match the first.
Interpolating from subdirectories
Modify the file interpolator so that snippets can be included from sub-directories using relative paths.
Recursive search for inclusions
- Modify the file interpolator so that it searches recursively through all subdirectories of the directories on the search path to find inclusions.
- Explain why this is a bad idea.
Defining variables
Modify the file inclusion system so that users can pass in a Map containing name-value pairs and have these interpolated into the text of the files being loaded. To interpolate a value, the included file must use @@name@@.
Specifying markers
Modify the file inclusion system so that the user can override the inclusion comment markers. For example, the user should be able to specify that /*! and !*/ be used to mark inclusions. (This is often used in tutorials that need to show the inclusion markers without them being interpreted.)
Recursive inclusions
Modify the file interpolator to support recursive includes, i.e., to handle inclusion markers in files that are being included. Be sure to check for the case of infinite includes.
Slicing files
Write a function that reads a JavaScript source file containing specially-formatted comments like the ones shown below and extracts the indicated section.
const toBeLeftOut = (args) => {
  console.log('this should not appear')
}
// <keepThis>
const toBeKept = (args) => {
  console.log('only this function should appear')
}
// </keepThis>
Users should be able to specify any tag they want, and if that tag occurs multiple times, all of the sections marked with that tag should be kept. (This is the approach we took for this book instead of file interpolation.)