Can't Get (Directly) There From Here

One of the projects I’m contributing to these days is writing a first-year Computer Science textbook using Python. We’re using DrProject to manage it: after all, LaTeX files are really just another kind of source code, and what better way to keep track of who’s supposed to be doing what than ticketing?

Well, since you asked… The truth is, we’re storing to-do information in two ways: as tickets in DrProject, and as specially-formatted text in the LaTeX. All the big items use the former; the little notes to ourselves like, “This sentence is cheesy,” are inside the .tex files like this:

\FIXME{This sentence is cheesy.}

Embedding “tickets” in the source is a bad idea for several reasons. First, the embedded items are invisible to DrProject: they can’t be searched, ordered, assigned to particular users, and so on [1]. Second, whenever you store information in two places, you run the risk of duplication, contradiction, or omission. Right now, we have no way of knowing which of our FIXMEs are also recorded as tickets, and which aren’t; we could ask people to file a ticket each time they create a FIXME, and delete the FIXME when they close the ticket, but that’s a lot of extra work.

Despite these problems, embedding little notes in code is such a popular working practice that Eclipse and other IDEs have tools to collate and present markers of this kind. The reason is simple: embedding in code is easy. Even if you have Mylar installed, so that you can work with your ticketing system from within Eclipse [2], throwing a TODO comment or a FIXME macro into your source file disrupts your train of thought—your flow—much less than filing a ticket [3].

There’s another, subtler issue here as well. Suppose you did want to file a ticket to say that a particular sentence was cheesy, or that a particularly complex assignment statement should be refactored. Would you quote it in the ticket? There goes your flow, but what else can you do? You can’t point to it (e.g., by quoting the file name and line number), because the text or code in question might be reorganized between the time the ticket is created and the time someone gets to it. The only thing to do is to follow where literate programming led, and Javadoc half-heartedly followed: store the “documentation” with the “code”, and tell ourselves that it’s the least of the available evils.

But wait: our source files are under version control, aren’t they? And DrProject can see the version control repository. Why can’t DrProject scan the files in the repository, extract the FIXMEs and TODOs, and turn them into tickets? Better yet, why not have it look for FIXMEs like the one above and insert an automatically-generated ticket ID, i.e., turn the FIXME into something like this:

\FIXME[179]{This sentence is cheesy.}

Those “tickets” can then be managed like any others: if someone closes one in the database, DrProject can delete the corresponding line from the source file, and if someone deletes the line from the source file, DrProject can close the ticket in the database.
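To make that round trip concrete, here is a minimal Python sketch. It is not how DrProject works: the function names `stamp_ids` and `drop_closed`, the way IDs are handed out, and the assumption that a note never contains nested braces are all mine, for illustration only.

```python
import re

# Matches \FIXME{...} or \FIXME[123]{...}.
# Assumption: the note text contains no nested braces.
FIXME = re.compile(r"\\FIXME(?:\[(\d+)\])?\{([^{}]*)\}")


def stamp_ids(text, next_id):
    """Give every un-numbered FIXME a ticket ID; return (new_text, tickets)."""
    tickets = {}

    def stamp(match):
        nonlocal next_id
        ticket_id, note = match.group(1), match.group(2)
        if ticket_id is None:
            # Hypothetical: a real system would ask the ticket database for an ID.
            ticket_id = str(next_id)
            next_id += 1
        tickets[int(ticket_id)] = note
        return "\\FIXME[%s]{%s}" % (ticket_id, note)

    return FIXME.sub(stamp, text), tickets


def drop_closed(text, closed_ids):
    """Remove numbered FIXMEs whose tickets have been closed in the database."""

    def drop(match):
        ticket_id = match.group(1)
        if ticket_id is not None and int(ticket_id) in closed_ids:
            return ""  # removes only the macro, not the rest of the line
        return match.group(0)

    return FIXME.sub(drop, text)
```

Calling `stamp_ids(text, 179)` on a file containing the FIXME above would rewrite it with `[179]` and report it as a ticket; `drop_closed(text, {179})` would later strip it out again. Everything hard (commits, conflicts, people editing the ID by hand) is deliberately left out.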

What I’m really proposing is to treat information-in-the-database and information-in-the-repository on an equal footing. At some level of abstraction (which we have to define and implement), it shouldn’t matter how or where the ticket is stored. All that really matters is the information it contains, and the operations users can perform on it. If it’s easiest for them to enter that data by adding a line to their source code, great—we can handle that. If there’s enough data to justify them switching tools (e.g., a one-page description of how to reproduce a complicated synchronization bug), we can support that too.
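One way to picture that level of abstraction is a pair of stores that answer the same question in the same way. The sketch below is only that, a sketch: `Ticket`, `DatabaseTickets`, `SourceTickets`, and `all_tickets` are hypothetical names, the "database" is a plain dict, and none of it reflects DrProject's actual schema or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    ticket_id: int
    summary: str
    origin: str  # hypothetical field: "database" or the path of a source file


class DatabaseTickets:
    """Stand-in for the ticketing database: a dict of id -> summary."""

    def __init__(self, rows):
        self.rows = rows

    def tickets(self):
        return [Ticket(i, s, "database") for i, s in self.rows.items()]


class SourceTickets:
    """Tickets embedded in a source file as \\FIXME[id]{summary}."""

    PATTERN = re.compile(r"\\FIXME\[(\d+)\]\{([^{}]*)\}")

    def __init__(self, path, text):
        self.path, self.text = path, text

    def tickets(self):
        return [Ticket(int(i), s, self.path)
                for i, s in self.PATTERN.findall(self.text)]


def all_tickets(*stores):
    """One list of tickets, regardless of where each one is stored."""
    found = [t for store in stores for t in store.tickets()]
    return sorted(found, key=lambda t: t.ticket_id)
```

Code that searches, orders, or assigns tickets would work on the result of `all_tickets(...)` and never need to know which store a given ticket came from.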

It’s tempting, but it won’t work. The problem is that the editors people use when they’re working with source code are unstructured. The editor I’m using to create this posting knows about HTML; if I press the < key on my keyboard, it adds the string &lt; to the file. In contrast, the editor in Eclipse lets me put whatever I want in my Java files, even text that can’t possibly be legal Java. We would therefore have to trust users (a) to format their FIXMEs and TODOs exactly the right way when initially adding them to files, and (b) not to mess up any of the information the system added. Experience with first-generation CASE tools and similar systems proves (at least to me) that people will get both wrong often enough that they will find the system a hindrance rather than a help.
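To see how little it takes to fall outside "exactly the right way", here is a small Python sketch of a checker. The strict pattern and the name `suspicious_lines` are assumptions of mine, not part of any real tool; the point is only that a plain-text editor will happily accept every one of the variants it flags.

```python
import re

# The only two shapes a (hypothetical) scanner would accept.
STRICT = re.compile(r"\\FIXME(?:\[\d+\])?\{[^{}]*\}")

# Anything that still mentions "fixme" once the well-formed markers are removed.
LEFTOVER = re.compile(r"fixme", re.IGNORECASE)


def suspicious_lines(text):
    """Return (line_number, line) pairs that look like mangled FIXMEs."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        remainder = STRICT.sub("", line)
        if LEFTOVER.search(remainder):
            bad.append((number, line))
    return bad
```

A lower-case `\fixme{...}`, a stray space inside the ticket ID, or a missing closing brace would each be reported, and each is a mistake that is easy to make while typing quickly.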

Teaching Eclipse’s editor how to format a \FIXME[...]{...} in a .tex file is not the right answer: different issue tracking tools will have different conventions, and anyway, what do we do when someone wants to add \CODEREVIEW{...} or \QUESTION{...} or something else? The right answer is to allow developers to create custom micro-editors and bind them to particular flavors of micro-content. The document then becomes an assemblage of strongly-typed elements, the presence of which causes display/modify/diff/merge handlers [4] to be loaded and run.
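A toy version of that binding, in Python, might look like the following. It is a sketch under loud assumptions: the `handles` decorator, the `HANDLERS` registry, and the handler classes are all invented here, only display and diff are shown, and a real micro-editor would be an interactive widget rather than a string formatter.

```python
import re

HANDLERS = {}  # macro name -> handler object (hypothetical registry)


def handles(kind):
    """Class decorator that binds a handler to one flavor of micro-content."""
    def register(cls):
        HANDLERS[kind] = cls()
        return cls
    return register


@handles("FIXME")
class FixmeHandler:
    def display(self, body):
        return "[fix] " + body

    def diff(self, old, new):
        return None if old == new else "fix note changed: %r -> %r" % (old, new)


@handles("QUESTION")
class QuestionHandler:
    def display(self, body):
        return "[ask] " + body

    def diff(self, old, new):
        return None if old == new else "question changed: %r -> %r" % (old, new)


# Assumption: elements look like \NAME{body} with no nested braces.
ELEMENT = re.compile(r"\\([A-Z]+)\{([^{}]*)\}")


def render(text):
    """Show each element through its handler; leave unknown kinds untouched."""
    def show(match):
        handler = HANDLERS.get(match.group(1))
        return handler.display(match.group(2)) if handler else match.group(0)
    return ELEMENT.sub(show, text)
```

Adding \CODEREVIEW{...} would then mean registering one more handler class, not teaching the editor a new syntax; until someone does, `render` simply passes it through unchanged.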

So once again it comes back to extensible programming. We separate models, views, and controllers when we’re building tools for other people, but we still, in the early 21st Century, insist that our files be unstructured text. It’s easy to see the cost of changing—legacy tools would stop working, and we might have to (shock horror) put Vi or Emacs out to pasture. I think it’s time we started thinking about the cost of staying stuck in the 1970s; I think we ought to start paying attention to all the neat tools that aren’t feasible to build because we’re afraid of embracing the very future that we’ve dedicated ourselves to creating.

[1] OK, we could add another parameter to the \FIXME macro to record a user ID, but then we’d have to validate it. And what about priority? Should \FIXME have as many parameters as tickets have fields? Brr…

[2] DrProject doesn’t support Mylar yet, though there is a plugin for Trac. If anyone is looking for a challenging, useful CSC49X project…

[3] The problem isn’t the time it takes to fix whatever you’ve noticed; the problem is that you have to put aside whatever problem you were thinking about to do so. People talk about “pushing” and “popping” issues on a mental stack, but the brain doesn’t actually work that way: lots of studies have shown that it takes several minutes to get back in a flow state after any significant interruption.

[4] Display and modify should be obvious; diff and merge are needed so as not to discourage users from putting files containing such content under version control. (I would give you real cash money right now for an Excel merge-and-diff tool, and no, export as CSV and use text diff is not an answer.)