DrProject Internals: Email

And now, DrProject’s email. This was the first completely new subsystem we added after we forked; running Mailman in parallel with Trac worked well enough while we were bootstrapping, but the fact that neither system knew about the other made both a pain to use. We couldn’t, for example, run a single search query that would hit both a project’s wiki and its mailing list archive; we also had to do a little dance to keep project and mailing list memberships in sync, and we never even tried to modify Mailman so that wiki-syntax shortcuts to tickets and pages in mail messages would be turned into links automatically.

The broad outline of our design was straightforward:

  1. We didn't want to compete with people's existing mail clients, so DrProject would provide relay and archiving only: there'd be no way to compose or receive messages from within DrProject, and no way to send mail to individuals (only to projects). That simplified things at least as much as supporting only reads simplified the Subversion repository viewer.
  2. We could safely assume that the Linux host running DrProject had a mail transfer agent (MTA) such as sendmail. We could further require that whoever installed DrProject be able to modify the MTA's configuration to route messages for particular addresses to a program we provided. In particular, we could tell the MTA to store all messages for A+B@host.name in a directory called /drproject/A/B. In this scheme, A identifies a particular instance of DrProject on the host, while B identifies the project within that instance. For us, A would be a course ID, like csc408, and B would identify the student team.
  3. Every time the DrProject CGI ran, it could check the spool directories for new messages. If it found any, it could copy them into the database, index them for searching, and forward them to the members of the project the message was for.
  4. The wiki parser could be modified to recognize @123@ as a shortcut to message #123. We put an '@' both before and after the number so that text like abc@123.yourhost.com wouldn't be mistaken for a message reference.
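To make the shortcut rule concrete, here is a minimal sketch in Python (DrProject's implementation language). It assumes a hypothetical `url_for_message` callback that maps a message number to its archive URL; DrProject's real wiki formatter is structured differently. Because the pattern needs an '@' on both sides of the digits, an address like abc@123.yourhost.com is left alone.

```python
import re

# '@', one or more digits, '@': the closing '@' is what keeps
# addresses such as abc@123.yourhost.com from matching.
MESSAGE_REF = re.compile(r"@(\d+)@")

def link_message_refs(text, url_for_message):
    """Replace each @N@ in `text` with an HTML link to message N.

    `url_for_message` is a hypothetical callback: int -> URL string.
    """
    def to_link(match):
        number = int(match.group(1))
        return f'<a href="{url_for_message(number)}">{match.group(0)}</a>'
    return MESSAGE_REF.sub(to_link, text)
```

For example, `link_message_refs("See @123@ and abc@123.yourhost.com", lambda n: f"/mail/{n}")` links the first reference and leaves the address untouched.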

Simple enough, but as always, there were a lot of sharp-toothed details lurking in the underbrush. First (and simplest), the program that the MTA invokes to copy messages into the spool directory and the DrProject CGI both had to lock files at the proper times, so that DrProject wouldn’t try to read a message while the MTA’s handler was still writing it.
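A minimal sketch of both halves, assuming a hypothetical layout in which each spool directory holds numbered `.msg` files plus a `.lock` file that both sides take with `fcntl.flock` (Unix only) before touching anything. The address-to-directory mapping follows the A+B to /drproject/A/B scheme described above; the function names are illustrative, not DrProject's.

```python
import fcntl
import os
from contextlib import contextmanager

LOCK_NAME = ".lock"   # hypothetical per-directory lock file
SUFFIX = ".msg"       # hypothetical suffix for spooled messages

def spool_dir(root, local_part):
    """Map the local part "A+B" of A+B@host.name to root/A/B."""
    instance, sep, project = local_part.partition("+")
    parts = (instance, project)
    if not sep or not all(parts) or any("/" in p or p in (".", "..") for p in parts):
        raise ValueError(f"not an instance+project address: {local_part!r}")
    return os.path.join(root, instance, project)

@contextmanager
def locked(directory):
    """Hold an exclusive flock on the directory's lock file (Unix only)."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, LOCK_NAME), "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # blocks until the other side is done
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

def _next_name(directory):
    # Zero-padded sequence numbers, so sorted() gives arrival order.
    numbers = [int(name[:-len(SUFFIX)]) for name in os.listdir(directory)
               if name.endswith(SUFFIX) and name[:-len(SUFFIX)].isdigit()]
    return f"{max(numbers, default=0) + 1:08d}{SUFFIX}"

def deliver(root, local_part, message):
    """MTA side: write one message (bytes) into the project's spool directory."""
    directory = spool_dir(root, local_part)
    with locked(directory):
        path = os.path.join(directory, _next_name(directory))
        with open(path, "wb") as f:
            f.write(message)
    return path

def collect(root, local_part):
    """DrProject side: read and remove every spooled message, oldest first."""
    directory = spool_dir(root, local_part)
    messages = []
    with locked(directory):
        for name in sorted(os.listdir(directory)):
            if name.endswith(SUFFIX):
                path = os.path.join(directory, name)
                with open(path, "rb") as f:
                    messages.append(f.read())
                os.remove(path)
    return messages
```

Locking the directory rather than each message file closes a gap: if the reader locked individual files, it could open a file the handler had just created but not yet locked, and see it empty.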

Second, we needed a way to prevent the project lists from being spammed. After some discussion, we decided to use whitelisting: every user would have to tell DrProject the addresses from which she wanted to be able to send mail, and select one of those for DrProject to forward mail to. The procedure we’re currently using is far from original:

  1. After logging in, the user goes to the preferences page and enters the email address she wants to register.
  2. DrProject stores that address in an UnconfirmedEmail table, then sends a message to that address requesting validation.
  3. Once the user gets the message and validates the address, it is added to the set of addresses associated with her account. One of those must always be marked for forwarding: all mail sent to projects of which she is a member will be forwarded to that address. She can turn forwarding on or off on a per-project basis, but we didn't see any reason to allow mail to different projects to be forwarded to different addresses.
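The three steps can be sketched as a small in-memory registry. This is a hypothetical stand-in: DrProject keeps this state in its database (the UnconfirmedEmail table among it), and the class, method, and field names here are assumptions. Because `register` refuses an address that is already pending, the sketch has exactly the loophole described in the next paragraph.

```python
import secrets

class AddressRegistry:
    """Hypothetical in-memory stand-in for unconfirmed and confirmed addresses."""

    def __init__(self):
        self.unconfirmed = {}   # token -> (user, address), like UnconfirmedEmail
        self.confirmed = {}     # address -> user; this is the sender whitelist
        self.forwarding = {}    # user -> the one address mail is forwarded to

    def _claimed(self, address):
        return address in self.confirmed or any(
            addr == address for _, addr in self.unconfirmed.values())

    def register(self, user, address):
        """Step 2: record the address; the returned token is mailed to it."""
        if self._claimed(address):
            raise ValueError(f"{address} is already registered or pending")
        token = secrets.token_urlsafe(16)
        self.unconfirmed[token] = (user, address)
        return token

    def confirm(self, token):
        """Step 3: the user validated; attach the address to her account."""
        user, address = self.unconfirmed.pop(token)   # KeyError if unknown
        self.confirmed[address] = user
        # The first confirmed address becomes the forwarding address, so a
        # user with any confirmed address always has exactly one marked.
        self.forwarding.setdefault(user, address)
        return user

    def set_forwarding(self, user, address):
        if self.confirmed.get(address) != user:
            raise ValueError(f"{address} is not a confirmed address of {user}")
        self.forwarding[user] = address

    def sender(self, address):
        """Whitelist check: the account allowed to post from `address`, or None."""
        return self.confirmed.get(address)
```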

There’s still a bit of room for abuse here: if I told DrProject that yourname@yourhost.com was my address, but never replied to the validation request, you wouldn’t be able to claim it. We figured that was a pretty minor issue, and that it could be resolved by divine (i.e., administrative) intervention, so we didn’t worry about it.

What we did have to worry about was exactly what constituted “membership” in a project for the purpose of message forwarding. Our authorization scheme doesn’t actually include a notion of “membership”; instead, every user has a role (possibly a default role) with respect to each project, and each role is a collection of capabilities. Should roles have a MEMBERSHIP attribute? Or should we infer “membership-for-the-purpose-of-mail-forwarding” from something else?

We went with the latter: if your role with respect to a project gives you MAIL_POST privileges, then messages to the project list are forwarded to you. MAIL_VIEW isn’t enough, since we may want to give anonymous users the ability to read the archives of “public” projects, but don’t want a special-case rule saying, “Forward to anyone with this capability unless they’re anonymous or nobody.”
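The forwarding rule reduces to a filter over users. A minimal sketch, assuming roles and capabilities are plain dictionaries and that `role_of` (hypothetical) already falls back to a default role; only the capability names MAIL_POST and MAIL_VIEW come from the text. It also honours the per-project on/off switch for forwarding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

@dataclass
class User:
    name: str
    forward_to: Optional[str]                       # the address marked for forwarding
    muted: Set[str] = field(default_factory=set)    # projects with forwarding turned off

def recipients(project: str,
               users: List[User],
               role_of: Callable[[str, str], str],
               capabilities: Dict[str, Set[str]]) -> List[str]:
    """Addresses to forward a project-list message to.

    `role_of(user_name, project)` returns the user's role (possibly a default
    role); `capabilities` maps role name -> set of capability names.
    """
    result = []
    for user in users:
        caps = capabilities.get(role_of(user.name, project), set())
        # "Membership" for forwarding purposes means holding MAIL_POST;
        # MAIL_VIEW alone (e.g. anonymous readers of public archives) is not enough.
        if "MAIL_POST" in caps and project not in user.muted and user.forward_to:
            result.append(user.forward_to)
    return sorted(result)
```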

It all worked well under test, but failed when we first deployed it last fall. The problem turned out to be some missing quotes in a shell script—the commands all worked when run directly from an interactive shell prompt, but failed when the script was invoked. Once that was fixed, we began noticing that messages would sometimes be delayed for hours—even days—before being delivered.

That one turned out to be a simple oversight. DrProject runs as a long-lived CGI process (we actually use SCGI); when it isn’t processing an HTTP request, it just sits and waits. That means it only looks for new mail when someone interacts with it over the web (e.g., files a ticket or views a wiki page). Messages sent to project lists were therefore piling up until someone next used the site, at which point they were all forwarded.

The solution we’re now using is a cron job that sends a dummy HTTP request to DrProject every two minutes or so. It was a simple thing to write, but we’re still unhappy with it, since it’s difficult for developers to test, and is yet another scraplet that administrators have to remember to deploy and restart. I’d like to fold the cron job into the SCGI process some day, but it’s well down my wish list.
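Folding the cron job into the SCGI process could look something like the sketch below: a daemon thread that wakes every two minutes and drains the spool. It assumes a hypothetical `check_mail` function that does what a request currently triggers, and that it uses the same locking as the request path, since the two now run concurrently.

```python
import logging
import threading

class MailPoller:
    """Call `check_mail()` every `interval` seconds on a daemon thread.

    `check_mail` is a hypothetical callable that drains the spool; it must
    be safe to run concurrently with request handling.
    """

    def __init__(self, check_mail, interval=120.0):
        self._check_mail = check_mail
        self._interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()      # wakes the thread out of wait() immediately
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self._stopped.wait(self._interval):
            try:
                self._check_mail()
            except Exception:
                # Keep polling even if one pass fails.
                logging.exception("mail check failed")
```

Using `Event.wait` rather than `time.sleep` lets `stop()` end the thread without waiting out a full interval.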

Later: how could I have forgotten the address rewriting problem? We’re currently hosting course-related instances of DrProject on Stanley, a medium-hefty server donated by the kind folks at the Jonah Group. For the first few weeks of term, mail forwarded by DrProject had instance+project@stanley.cs.toronto.edu as a return address. The problem was, the CS department’s mailer was rewriting this as instance+project@cs.toronto.edu. That makes perfect sense for mail from real people (you probably don’t care that the machine I compose my messages on is jalkelainen.cs.toronto.edu), but since the department’s mail server didn’t know about DrProject’s project-related addressing, anyone who just hit “reply” to a forwarded message got a bounce-back. The “solution” (and yes, I think the quotes are justified) is to take advantage of the fact that drproject.org is hosted at stanley.cs.toronto.edu, and use instance+project@drproject.org as the return address. It’s these kinds of integration issues that make real software hard…
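The post doesn't say which header carried the return address; this minimal sketch assumes it is Reply-To (the header most mail clients use when you hit "reply") and sets it on a copy of each forwarded message using the standard `email` package. The function name `forwarded_copy` is hypothetical.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

def forwarded_copy(original: EmailMessage, instance: str, project: str,
                   domain: str = "drproject.org") -> EmailMessage:
    """Copy `original`, pointing Reply-To at the project's list address.

    Using drproject.org instead of the host name keeps the department's
    mailer from rewriting the address into one it cannot route.
    """
    copy = message_from_bytes(original.as_bytes(), policy=default)
    del copy["Reply-To"]           # deleting a missing header is a no-op
    copy["Reply-To"] = f"{instance}+{project}@{domain}"
    return copy
```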