
Archive for the ‘DrProject’ Category

REST APIs for Batch Operations

September 15th, 2008

I have a question about the “right” way to design a REST API, and am hoping someone out there on the Interweb will point me in the right direction. The short version of the question is, “How should batch operations be structured?” The long version goes something like this:

Suppose your web application has to keep track of fruit flies. Each fruit fly has a unique (system-assigned) integer ID, a name (hey, even flies can be cuddly), and a genome (represented as a string of characters). If you only had to work with one fly at a time, the API might look something like this (with the data values formatted as XML, JSON, or what have you):

Operation | URL | HTTP Verb | Request Data | Response Code(s) | Response Data
Find out what flies exist | /api/fly | GET | (none) | 200 | {id, id, …}
Get a fly’s record | /api/fly/id | GET | (none) | 200 | {id, name, genome}
Create a fly in the database | /api/fly | POST | {name, genome} | 200 | {id}
Update a fly’s record | /api/fly/id | PUT | {new name and/or genome} | 200 | {id}
Delete a fly’s record | /api/fly/id | DELETE | (none) | 200 | {id}

I’ve left out error cases because they aren’t relevant to my question—at least, I don’t think they are.
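To make the table concrete, here is a minimal Python sketch of the single-fly API: a route table that maps each (verb, URL) row to a handler working on an in-memory dictionary. The dispatch function, the handler names, and the dictionary standing in for the database are all hypothetical, and, like the table, the sketch ignores error cases.

```python
import re
from itertools import count

# Hypothetical in-memory store standing in for the real database.
flies = {}
next_id = count(1)  # system-assigned integer IDs

def list_flies():
    return 200, sorted(flies)

def create_fly(body):
    fly_id = next(next_id)
    flies[fly_id] = {"name": body["name"], "genome": body["genome"]}
    return 200, {"id": fly_id}

def get_fly(fly_id):
    return 200, {"id": fly_id, **flies[fly_id]}

def update_fly(fly_id, body):
    # Change only the fields supplied ("new name and/or genome").
    flies[fly_id].update({k: body[k] for k in ("name", "genome") if k in body})
    return 200, {"id": fly_id}

def delete_fly(fly_id):
    del flies[fly_id]
    return 200, {"id": fly_id}

# One entry per row of the table: (HTTP verb, URL pattern, handler).
ROUTES = [
    ("GET", r"/api/fly", list_flies),
    ("POST", r"/api/fly", create_fly),
    ("GET", r"/api/fly/(\d+)", get_fly),
    ("PUT", r"/api/fly/(\d+)", update_fly),
    ("DELETE", r"/api/fly/(\d+)", delete_fly),
]

def dispatch(verb, path, body=None):
    """Return (response code, response data) for one request."""
    for route_verb, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_verb == verb and match:
            args = [int(g) for g in match.groups()]  # the fly ID, if the URL has one
            if body is not None:
                args.append(body)
            return handler(*args)
    return 404, None  # no matching route; other error cases are omitted
```

On a fresh store, dispatch("POST", "/api/fly", {"name": "Buzz", "genome": "ACGT"}) returns (200, {"id": 1}), and a later dispatch("GET", "/api/fly") returns (200, [1]).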

But now suppose that you want to do batch operations, i.e., that you want to create, read, update, or delete hundreds or thousands of flies at once. Your client (which may be a desktop application or something else that isn’t a browser) can POST data for lots of flies at once, but you do not want to handle the set of values like this:

result = OK
for chunk_of_data in HTTP_Request:
    start_database_transaction
    result = result and process(chunk_of_data)
    end_database_transaction
return result

The first reason you don’t want to do this is that it’s not atomic: if anything goes wrong partway through, you could have five hundred flies updated, and five hundred not. The second reason is that the process function is actually very slow: if you call it five hundred times, there’s a real risk of taking so long that the web server will time out the request. (Note: in reality, a lot more is going on inside process than just a few SQL queries—files are being opened, parsed, and closed, log entries are being created, etc., so “get a faster web server” is not a valid solution.)
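The atomicity problem is easy to reproduce. This Python sketch assumes the standard library's sqlite3 module and an in-memory table (DrProject's real storage and process function are not shown) and runs one transaction per chunk, as in the pseudocode above. The chunk with a missing genome violates a NOT NULL constraint and its own transaction rolls back, but the chunks on either side of it stay committed. The make_db and update_per_chunk names are hypothetical.

```python
import sqlite3

def make_db(n_flies):
    """Hypothetical fly table holding n_flies rows whose genome is 'A'."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fly (id INTEGER PRIMARY KEY, name TEXT, genome TEXT NOT NULL)")
    db.executemany("INSERT INTO fly VALUES (?, ?, ?)",
                   [(i, f"fly{i}", "A") for i in range(1, n_flies + 1)])
    db.commit()
    return db

def update_per_chunk(db, chunks):
    """One transaction per chunk: NOT atomic across the whole request."""
    ok = True
    for fly_id, genome in chunks:
        try:
            with db:  # commits this chunk on success, rolls back only this chunk on error
                db.execute("UPDATE fly SET genome = ? WHERE id = ?", (genome, fly_id))
        except sqlite3.IntegrityError:
            ok = False  # keep going, so later chunks are still committed
    return ok

db = make_db(4)
ok = update_per_chunk(db, [(1, "C"), (2, "C"), (3, None), (4, "C")])
genomes = [g for (g,) in db.execute("SELECT genome FROM fly ORDER BY id")]
print(ok, genomes)  # False ['C', 'C', 'A', 'C']
```

The request as a whole reports failure, yet three of the four flies have already been changed.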

The solution I’ve come up with is to make batch operations fundamental to the REST API, and to define the single-fly operations in terms of them. This leads to API entries like this:

Operation | URL | HTTP Verb | Request Data | Response Code(s) | Response Data
Update flies’ records | /api/fly | PUT | {{id_0, name_0, genome_0}, {id_1, name_1, genome_1}, …} | 200 | number_of_updates

with the obvious definition of a PUT to /api/fly/id as a multi-fly PUT with only one fly.
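A minimal sketch of that batch-first design, again assuming Python's sqlite3 and an in-memory table; put_flies, put_fly and make_db are hypothetical names. The whole batch runs inside one transaction, so either every record is updated or none is, and the single-fly PUT is just a batch of one. The response data is the number of rows updated, matching number_of_updates in the table. Error responses are left out, as in the post: a failure rolls back and propagates to the caller.

```python
import sqlite3

def make_db(n_flies):
    """Hypothetical fly table holding n_flies rows whose genome is 'A'."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fly (id INTEGER PRIMARY KEY, name TEXT, genome TEXT NOT NULL)")
    db.executemany("INSERT INTO fly VALUES (?, ?, ?)",
                   [(i, f"fly{i}", "A") for i in range(1, n_flies + 1)])
    db.commit()
    return db

def put_flies(db, records):
    """PUT /api/fly with [(id, name, genome), ...]: all-or-nothing."""
    with db:  # a single transaction for the whole batch; any error rolls it all back
        cursor = db.executemany(
            "UPDATE fly SET name = ?, genome = ? WHERE id = ?",
            [(name, genome, fly_id) for fly_id, name, genome in records])
    return 200, cursor.rowcount  # rows modified, summed over the batch

def put_fly(db, fly_id, name, genome):
    """PUT /api/fly/<id>: defined as a multi-fly PUT with only one fly."""
    return put_flies(db, [(fly_id, name, genome)])
```

Because the transaction boundary now wraps the request rather than each fly, a bad record anywhere in the batch leaves the table exactly as it was.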

This doesn’t feel right, but I’m not sure where I’ve gone wrong. My performance constraints (i.e., the need to support batch operations) aren’t going to go away, but the whole point of REST seems to be a fundamental one-to-one mapping between URLs and entities, which the batch API seems to violate. So, how do other people (or APIs) do this? And why?

DrProject

DrProject Status Update

July 20th, 2008

http://www.drproject.org hosts two projects: All, which is for people interested in announcements and general news, and DrProject itself, which is for developers (and those wishing to file tickets against our code). This message is going out to the list belonging to the former; if you’d like to join the latter, you can do so by following the Preferences link and requesting membership. (There are about half a dozen messages per day.)

Along with the usual bug fixes, we are working on some new features:

  1. Updating the administration panel to simplify workflow. Qiyu Zhu and Liz Blankenship have been making good progress, and this will definitely be in the end-of-August release.
  2. Integration with IRC. Kosta Zabashta presented this at DemoCamp a few days ago; we have some issues to work out with administering channels, but again, this will be in the end-of-August release.
  3. Status charts. Kosta has been working with Jeremy Handcock to integrate a few simple charts to show projects’ status. We’ll know by the end of July whether this will make it into the next release.
  4. A configurable ticketing system. This is the most ambitious of our current projects; Nick Jamil posted a video showing what it can do, and once Jeff Balogh finishes his Dojo-based drag-and-drop form editor, we’ll put another one up. This needs a lot of testing before we put it in a release, but Luke Petrolekas has already started, and if we don’t make the end-of-August release, we ought to have it in your hands by Christmas.

As always, if you need help getting DrProject installed, please mail help@drproject.org—we’d be happy to help you out.

Nick’s Last Day

July 17th, 2008

Tomorrow is Nick Jamil’s last day with us. He’s done a great job of building a variable-speed ticketing system for DrProject, but now he’s got to put all that energy into getting married :-) To wrap up, he has put two posts on his blog:

  1. The problems that join limits in SQLite are causing (and the ways he’s tried to get around them).
  2. A new-and-improved screencast of what the system can do.

It’s been a pleasure working with him; I hope I get the chance to do so again. Happy nuptials, dude.

A Little Warm…

July 16th, 2008

…but mostly fun: DemoCamp 18 was last night, and as Lillian’s review says, it went pretty well. The venue was too small, and as she said, some of the off-color humor was a little tiresome, but it was good to see such a strong turnout from U of T, and Kosta Zabashta (with Victoria Mui’s help) did a great job of showing off his integration of DrProject and IRC. Special thanks to the sponsors, and Willis Haviland Carrier for the luxury of modern air conditioning.

The next ’camp will probably be in September; I look forward to seeing you all there.

That’s Me On the Right

(Image from Thomas Purves.)

DemoCamp, DrProject

A Guide to Distributed Version Control Systems

July 6th, 2008

This guide at InfoQ is a nice counterpoint to my list of reasons for not switching to Git.

Why We’re Not Switching to Git

July 5th, 2008

I got the following a couple of days ago from a colleague in the US:

Our development teams are debating the use of GIT vs SVN. [Project name] has standardized on SVN, but some of the projects are considering GIT. Do you have an opinion on their relative merits, particularly for computational science and engineering applications?

It’s a timely question: two of my best students are keen to switch us from Subversion to Git, and claim the latter makes them much more productive. I’ve decided we’re not going to (not before the end of 2009, anyway), for four reasons:

  1. Documentation: there’s a ton of good stuff about SVN, but Git’s docs are still spotty in places. I’m reviewing an early draft of a book about it, and that’ll go a long way toward closing the gap, but it’s not going to be ready until early 2009.
  2. Supporting tools: we delayed switching from CVS to SVN until the Eclipse and Visual Studio plugins for the latter were solid, and we’re going to delay switching to something else (such as Git) until support for it is as ubiquitous, and as dependable. Again, that probably means some time in 2009.
  3. Reality check: we’ve already had one big snafu due to a developer (a bright one) creating lots of local branches, losing track of his work, then having the hard drive die. Fans of fully distributed version control don’t seem to take this into account when talking about relative productivity, and I think it’s easier to fall into bad habits when there isn’t the visibility of a central repository.
  4. Backing the wrong horse: Git seems to be the most popular VCS of its kind right now, but there are several others, and it’s not yet clear which is going to dominate. Having chosen Python for my web programming projects, only to watch Ruby on Rails grow by leaps and bounds, I’m willing to wait a while to see which horse is going to win the race before placing my bet.

Later: see also this guide to distributed version control systems.

Another DrProject Design Question

June 25th, 2008

We’ve hit another “what should it do?” question in DrProject, and I’d welcome opinions from readers. As I’ve mentioned previously, the new ticketing system for DrProject is going to be extensible. Each project’s tickets will initially contain just four fields: sequence number, date created, creator ID, and one line of text. The first three will be filled in automatically; the user will only have to type the fourth. From experience, a simple “to do list” like this is all student teams really want or need.

However, almost everyone wants to add “just one more field” to this design. Sometimes it’s a person responsible; other times it’s priority, due date, a larger text area for a detailed description of the problem, or attachments. The new ticketing system will therefore allow the developers in a project to change the ticket schema for that project using a drag-and-drop form editor.

(Not that it matters right now, but we’re not just building this to support teams’ chosen workflows. It will also give us a way to find out what those workflows actually are—if we deploy DrProject, wait a few months, then look at what people have chosen to record, it should give us some insight into how they’re thinking about their work.)

Now here’s the problem. DrProject currently offers two “personal” views called “All Projects” and “All Tickets”. The first shows a merge of the event logs from all the projects the user belongs to; the second shows all the tickets assigned to the user from across all the same projects. The question is, what should we show for “All Tickets” if every project’s ticketing schema can be different? To make this more concrete, imagine that project Telepathy’s schema has grown to:

(id INTEGER, created DATE, creator USER_ID, title TEXT, duedate DATE, priority ENUM(“hi”, “med”, “low”))

while project Antigravity’s schema has grown to:

(id INTEGER, created DATE, creator USER_ID, title TEXT, priority ENUM(“urgent”, “optional”), owner USER_ID)

Options we can see are:

  1. Only show the common fields. (Unlike user-added fields, the four basic fields can never be deleted from a project’s ticket schema, so we know they’ll always be there.)
  2. Show the union of all the columns. This is awkward for all the obvious reasons: it’ll be very wide, many tickets will have blanks in many columns, enumerations with the same name but different values will be a pain, etc.
  3. Have one table for each project, but put all the tables on the same HTML page. A basic version is easy to implement, but sorting and filtering would be difficult.

Anyone have strong preferences? Can anyone see anything better? The ticket for this problem is #1506, and as I said, input would be welcome.

Navigating IRC Logs, Nested Forms, Et Cetera

June 23rd, 2008

Feedback Time

June 11th, 2008

Daniel Servos is just about ready to put a text-based version of his Moodle stats plugin up for public comment—screenshots are already available. Your feedback would be greatly appreciated. I’ll be begging for feedback on other students’ projects as well just as soon as they have pixels on the screen.

Meanwhile, Nick and Jeff are making steady progress on DrProject’s new ticketing system. If this schematic is any indication, we’re going to need a lot of maintenance documentation…

Determining When You Should Stop Adding Features to a Version is Hard

June 10th, 2008