Yesterday, I asked if anyone was building something like Fossil using a NoSQL database. In response, someone named Pete (who didn’t leave a real email address) asked, “Why NoSQL? This is a perfect application for an SQL database.” Respectfully, I disagree, but my reason will take a little explaining.

Over the past 16 months, several waves of students have been working with me on a replacement for Trac called Basie. Like Trac, Basie is meant to be a minimalist software forge: it combines version control, ticketing, wikis, the like into one package, but is much simpler than the open source and commercial forges that Jordi Cabot and I surveyed. Unlike Trac, Basie is built using modern web tools (Django and jQuery), and supports multiple projects per forge and per-project mailing lists out of the box.

We’ve dealt with quite a few design challenges while building Basie, and have a few more piled up to worry about in January—see, for example, Ian Lienert’s post about deleting vs. hiding, or Andrew Schurman’s look at why integrating with IRC is hard (short answer: channel management). Many of the hardest challenges, though, have a common root cause: relational databases don’t support polymorphism. Take tagging, for example: in order to find all items tagged with “upgrade”, we have to issue and aggregate multiple queries, because the entities that have been tagged are stored in separate tables. Not only does this hurt performance and make the code (much) harder to understand, it also means makes plug-and-play extensibility a lot harder, since anyone who wants to add a new module to Basie has to either edit the tagging code to reference that new module, or wrestle with some not-yet-implemented registration and callback mechanism that moves the grief out of Django’s ORM (where it belongs) and into pure Python code.

We ran into the same problem building the status dashboard, where we wanted one module (the dashboard) to be able to ask questions of others without knowing exactly what those others were or how they were implemented. This is trivial in a programming language that provides polymorphism (as almost all modern languages do), but there’s no standard, straightforward way to do it with SQL.

Hence my interest in NoSQL databases. What they’re explicitly doing is setting aside the “rows and columns” model in favor of—well, in favor of a bunch of different things, depending on which one we’re talking about. But in doing that, they’re sort-of-accidentally making a lot of other innovations possible. After all, if PostgreSQL came out with some kind of polymorphism extension, I probably wouldn’t use it, because I wouldn’t want to be tied to any one relational database. If I choose to use MongoDB or CouchDB, though, I’m committing to a single-source solution anyway, so why not make full use of everything it offers? Simon Willison (who knows much more about all of this than I do) made a similar point in July when asked how hard it would be to get Django running on top of MongoDB:

I remain sceptical of projects that attempt to map Django's extremely relational ORM to non-relational backends. Why would you want to do this in the first place? Presumably because you want to use parts of the Django ecosystem - in particular the admin, generic views and pagination - with a different persistent store. I would argue that you don't want a ORM backend for MongoDB - instead, you want the admin, generic views and pagination to work with alternative storage mechanisms. Instead of depending directly on the ORM, they should make use of an abstract interface which can be mapped to the ORM but can also map to other types of persistent store.

See also this article from John Nunemaker.