
Archive for September, 2008

Win a Trip to Boulder (and Get a Job)

September 25th, 2008

From http://boulder.me:

Want a FREE trip to beautiful Boulder, Colorado? The Boulder tech scene is growing like crazy. Twenty of our top tech startups have banded together to fly in one hundred top software developers, programmers and engineers from across the country, all expenses paid. You can apply to be one of the hundred.

I’ve just bought a house, and find myself humming the Backyardigans theme without realizing it, but if you’re young and mobile, this would be a great opportunity.  It’s a beautiful place to live, too…

Announcements

Change the Rules, Change the Outcome

September 24th, 2008

Via Andy Lumsdaine, some practical ideas about how to make open source development less unwelcoming for women. I’d be very interested in hearing from projects that have tried some of these out.

Equity

Another Use for Extensible Programming

September 19th, 2008

I’ve grumbled before about the fact that mass-market tools like Firefox and Microsoft Word allow people to mix pictures and text, but programmers’ editors (including IDEs) do not.  My standard answer when people ask why I’d want that is, “So that I can put before and after pictures of data structures for methods, just like I would in a textbook.”  Discussion about the Django port of DrProject has brought up another use case, though.  Suppose you need to initialize a set of objects for a test fixture, and their mutual reference graph contains cycles.  Using text, you have to do things like this:

left = Something(None)   # left has to exist before right can refer to it...
right = Something(left)
left.setPartner(right)   # ...so its partner is patched in afterward

It’s non-declarative, it’s error-prone, and worst of all, you have to relax the error-checking in classes so that they can be initialized in states you would never allow in the real program.

Now, wouldn’t it be great if you could just draw what you want?  Imagine an editor that would let you create two circles (one for left, one for right) and join them up with arrows labeled “partner” to show the final state you wanted.  A combination of clever compilation and reflection could then stitch everything together for you.  What’s even nicer, you wouldn’t have to worry about how to bootstrap the fixture into a legal state.
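Whatever the editor looked like, the drawing would eventually have to be turned into objects. Here is a minimal sketch of that last step, assuming the picture compiles down to a hypothetical declarative description: a dictionary of object names and classes, plus a list of labeled (source, attribute, target) edges. The `Something` class, its `check` method, and `build_fixture` are illustrative inventions, not an existing API. The builder allocates every object without running its constructor, wires the references together with `setattr`, and only then validates the finished graph, so the class can keep a strict constructor. (This one rejects a missing partner, which would make the step-by-step version fail at its first line.)

```python
class Something:
    """Hypothetical class whose invariant is that every instance has a partner."""

    def __init__(self, partner):
        if partner is None:
            raise ValueError("a Something must have a partner")
        self.partner = partner

    def check(self):
        # The same invariant __init__ enforces, applied to a finished object.
        if getattr(self, "partner", None) is None:
            raise ValueError("a Something must have a partner")


def build_fixture(nodes, edges):
    """Build a (possibly cyclic) object graph from a declarative description.

    nodes: dict mapping object name -> class
    edges: list of (source name, attribute name, target name) triples
    """
    # Phase 1: allocate every object without calling __init__,
    # so no constructor ever sees a half-built graph.
    objects = {name: cls.__new__(cls) for name, cls in nodes.items()}
    # Phase 2: stitch the references together by reflection.
    for source, attr, target in edges:
        setattr(objects[source], attr, objects[target])
    # Phase 3: validate each object once, in its final state.
    for obj in objects.values():
        obj.check()
    return objects


# The two circles and their "partner" arrows, as data.
fixture = build_fixture(
    nodes={"left": Something, "right": Something},
    edges=[("left", "partner", "right"), ("right", "partner", "left")],
)
print(fixture["left"].partner is fixture["right"])  # True
```

The constructor's check never has to be loosened for testing; the price is a separate `check` method that the builder can call on the completed graph.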

Oh, but that’s crazy talk, isn’t it?  Non-ASCII content in programs?  Why, next you’ll be wanting pictures in debuggers, too…

Extensible Programming

If It’s on the Web…

September 18th, 2008

…it must be (almost) real: our introduction to Computer Science using Python is now listed on Amazon.  Yay!


Learning

Risk Budget

September 18th, 2008

Ward Cunningham coined the phrase “technical debt” to describe the situation where poor design and/or implementation results in developers paying “interest” in the form of extra maintenance or other work that doesn’t add value for users. Inspired by that, I’ve started asking my students to think about the “risk budget” for their projects. Everyone is familiar with budgeting time: if you have X hours to do something, the times for the individual tasks making up that something had better not add up to more than X. If they do, you have to move the deadline back (i.e., get more time), cut features (i.e., reduce the time required), or get help (which is really just another way of getting more time).

Similarly, any given project can only “afford” to take on a certain amount of risk. Trying out one new tool is a good idea (you have to keep learning just to keep up), but three? Uh oh: thanks to network effects, the odds of one of them being broken or interacting badly with other things you’re using are more than three times as high as with one. The only way to compensate is to ask for time in the schedule to deal with, well, you’re not quite sure yet, but something’s bound to come up. Even the best managers find it hard to say “yes” to requests like that.
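The claim that three new tools are more than three times as risky as one can be made concrete with a rough model. Assume, purely for illustration, that each new tool independently has probability p of being broken and each pair of new tools independently has probability q of interacting badly (interactions with tools you already use are ignored here). With n new tools there are n(n-1)/2 pairs, so the chance that something goes wrong is 1 - (1-p)^n (1-q)^(n(n-1)/2). The rates below are made up.

```python
from math import comb


def risk(n, p, q):
    """Probability that at least one thing goes wrong with n new tools.

    Assumes (hypothetically) that each tool is broken with probability p,
    each pair of tools interacts badly with probability q, and all of these
    events are independent.
    """
    pairs = comb(n, 2)  # number of distinct pairs among the n new tools
    return 1 - (1 - p) ** n * (1 - q) ** pairs


p, q = 0.05, 0.05  # made-up rates, for illustration only
one, three = risk(1, p, q), risk(3, p, q)
print(f"one tool: {one:.3f}, three tools: {three:.3f}, ratio: {three / one:.2f}")
# one tool: 0.050, three tools: 0.265, ratio: 5.30
```

With q set to 0 the ratio comes out slightly under three (1 - 0.95^3 is about 0.143, versus 3 × 0.05 = 0.15); it is the pairwise term that pushes it past three.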

For example, some of my students are porting DrProject to Django this term, while another group is moving the Online Marking Tool (OLM) to Ruby on Rails. In both cases the platform is new to the students, and in both cases, one student in the group has argued that we should use Git instead of Subversion for version control. There’s no data showing that one makes programmers more productive than the other (strong opinions, yes, but data, no), so which one should we choose? The answer has to be Subversion, because it’s the one that minimizes the risk of the project failing. It may not be as shiny right now as the distributed version control systems all the cool kids are using, but moving to a new platform is risk enough. (In fact, since the Django port is using virtualenv, buildout, and svnmerge, all of which are new to most participants, we’re already over-budget on risk.)

So: what are you working on today? How much risk have you taken on? How does it compare to the risk in the last project you were part of, and how well did that go?

Learning

Startup Nation November 13-14

September 18th, 2008

“Canada’s Conference for Startups” — Jevon MacDonald has the details.  Hope to see lots of you there!

Announcements

I Used To Make Jokes…

September 16th, 2008

…about having a Cray of my own — now I almost could.

Uncategorized

Comments in JSON?

September 16th, 2008

Some of my students have discovered that JSON doesn’t support comments — they’re not in the syntax diagram on the json.org home page or the RFC, and various discussion threads bemoan their absence.  We’d like to use JSON both for data interchange and for specifying test fixtures; we could live without comments for the former if we had to, but it’s a real pain to (for example) have the encrypted version of a password in a test fixture, but not the plaintext password it was derived from.  One possibility is to add a “comment” field to every data type that needs commenting (e.g., the dictionary that represents a User object would have “comment” as one of its keys), but then the mapping from JSON to object or database entry is no longer 1-1.  How are other people dealing with this?
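One workaround that keeps the loaded data one-to-one with the objects is to let fixture files carry comments under reserved keys and strip those keys while parsing. A minimal sketch using Python’s standard `json` module, assuming a hypothetical convention that any key beginning with `//` is a comment; `object_pairs_hook` is called for every JSON object, nested ones included, so the comments are gone before anything else sees the data. The file itself stays valid JSON.

```python
import json

COMMENT_PREFIX = "//"  # hypothetical convention: keys starting with this are comments


def drop_comments(pairs):
    """object_pairs_hook: build a dict from (key, value) pairs, skipping comment keys."""
    return {key: value for key, value in pairs if not key.startswith(COMMENT_PREFIX)}


def load_fixture(text):
    """Parse JSON text, removing comment keys from every object, however deeply nested."""
    return json.loads(text, object_pairs_hook=drop_comments)


fixture_text = """
{
  "users": [
    {
      "//": "plaintext password is 'hunter2'",
      "name": "alice",
      "password": "<hash of hunter2 goes here>"
    }
  ]
}
"""

print(load_fixture(fixture_text))
# {'users': [{'name': 'alice', 'password': '<hash of hunter2 goes here>'}]}
```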

Uncategorized

Life? Don’t Talk to Me About Life…

September 16th, 2008

From Jon Pipitone, unprompted, very early this morning:

Sigh

Uncategorized

REST APIs for Batch Operations

September 15th, 2008

I have a question about the “right” way to design a REST API, and am hoping someone out there on the Interweb will point me in the right direction. The short version of the question is, “How should batch operations be structured?” The long version goes something like this:

Suppose your web application has to keep track of fruit flies. Each fruit fly has a unique (system-assigned) integer ID, a name (hey, even flies can be cuddly), and a genome (represented as a string of characters). If you only had to work with one fly at a time, the API might look something like this (with the data values formatted as XML, JSON, or what have you):

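Operation                    | URL         | HTTP Verb | Request Data             | Response Code(s) | Response Data
-----------------------------+-------------+-----------+--------------------------+------------------+--------------
Find out what flies exist    | /api/fly    | GET       | (none)                   | 200              | {id, id, …}
Get a fly’s record           | /api/fly/id | GET       | (none)                   | 200              | {id, name, genome}
Create a fly in the database | /api/fly    | POST      | {name, genome}           | 200              | {id}
Update a fly’s record        | /api/fly/id | PUT       | {new name and/or genome} | 200              | {id}
Delete a fly’s record        | /api/fly/id | DELETE    | (none)                   | 200              | {id}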

I’ve left out error cases because they aren’t relevant to my question—at least, I don’t think they are.

But now suppose that you want to do batch operations, i.e., that you want to create, read, update, or delete hundreds or thousands of flies at once. Your client (which may be a desktop application or something else that isn’t a browser) can POST data for lots of flies at once, but you do not want to handle the set of values like this:

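def handle_batch(http_request):
    result = True
    for chunk_of_data in http_request:
        # One transaction per chunk: earlier chunks stay committed
        # even if a later one fails.
        start_database_transaction()
        # `and` short-circuits: after the first failure, later chunks are skipped.
        result = result and process(chunk_of_data)
        end_database_transaction()
    return result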

The first reason you don’t want to do this is that it’s not atomic: if anything goes wrong partway through, you could have five hundred flies updated, and five hundred not. The second reason is that the process function is actually very slow: if you call it five hundred times, there’s a real risk of taking so long that the web server will time out the request. (Note: in reality, a lot more is going on inside process than just a few SQL queries—files are being opened, parsed, and closed, log entries are being created, etc., so “get a faster web server” is not a valid solution.)
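The atomicity half of the problem can be handled by wrapping the whole batch in a single transaction rather than one per chunk, as the loop above does. A minimal sketch using Python’s built-in `sqlite3` module; the `fly` table layout and the `update_flies` name are assumptions for illustration, and the slow `process` step is reduced to a single UPDATE. Using the connection as a context manager commits if the block finishes and rolls back if it raises, so a failure partway through leaves no flies updated. It does nothing about the timeout, which depends on how long the real processing takes.

```python
import sqlite3


def update_flies(conn, records):
    """Apply a batch of (id, name, genome) updates atomically.

    Returns the number of flies updated. If any record fails, the whole
    batch is rolled back and the exception propagates.
    """
    updated = 0
    with conn:  # one transaction: commit on success, roll back on exception
        for fly_id, name, genome in records:
            cur = conn.execute(
                "UPDATE fly SET name = ?, genome = ? WHERE id = ?",
                (name, genome, fly_id),
            )
            if cur.rowcount != 1:
                raise KeyError(f"no fly with id {fly_id}")
            updated += 1
    return updated


# Hypothetical schema and sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fly (id INTEGER PRIMARY KEY, name TEXT, genome TEXT)")
conn.executemany("INSERT INTO fly (name, genome) VALUES (?, ?)",
                 [("buzz", "ACGT"), ("fuzz", "TTGA")])
conn.commit()

# The second record names a fly that doesn't exist, so neither update sticks.
try:
    update_flies(conn, [(1, "buzzy", "ACGA"), (99, "ghost", "GGGG")])
except KeyError:
    pass
print(conn.execute("SELECT name FROM fly ORDER BY id").fetchall())
# [('buzz',), ('fuzz',)]
```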

The solution I’ve come up with is to make batch operations fundamental to the REST API, and to define the single-fly operations in terms of them. This leads to API entries like this:

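Operation             | URL      | HTTP Verb | Request Data                                            | Response Code(s) | Response Data
----------------------+----------+-----------+---------------------------------------------------------+------------------+------------------
Update flies’ records | /api/fly | PUT       | {{id_0, name_0, genome_0}, {id_1, name_1, genome_1}, …} | 200              | number_of_updates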

with the obvious definition of a PUT to /api/fly/id as a multi-fly PUT with only one fly.
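At the request-handling level, the design described above might look roughly like this. It is a framework-free sketch: the handler names, the in-memory `FLIES` store, the regex routing, and the 404 for an unknown id are hypothetical stand-ins for whatever the real application uses. The batch handler checks every record before changing anything, so it is all-or-nothing, and it returns `number_of_updates` as in the batch table; the single-fly PUT simply wraps its one record in a list.

```python
import re

# Hypothetical in-memory store standing in for the real database: id -> record.
FLIES = {
    1: {"name": "buzz", "genome": "ACGT"},
    2: {"name": "fuzz", "genome": "TTGA"},
}


def put_flies(records):
    """PUT /api/fly: update many flies at once, all or nothing.

    records is a list of dicts with "id", "name", and "genome" keys.
    Returns (status code, number_of_updates).
    """
    # Check the whole batch before modifying anything, so a bad id
    # anywhere in the batch leaves every fly untouched.
    if any(rec["id"] not in FLIES for rec in records):
        return 404, 0
    for rec in records:
        FLIES[rec["id"]] = {"name": rec["name"], "genome": rec["genome"]}
    return 200, len(records)


def put_fly(fly_id, record):
    """PUT /api/fly/<id>: a batch PUT containing exactly one fly."""
    return put_flies([{**record, "id": fly_id}])


def dispatch(verb, url, body):
    """Route PUT requests to the batch or single-fly handler."""
    if verb == "PUT" and url == "/api/fly":
        return put_flies(body)
    match = re.fullmatch(r"/api/fly/(\d+)", url)
    if verb == "PUT" and match:
        return put_fly(int(match.group(1)), body)
    return 404, 0  # anything else is outside this sketch


print(dispatch("PUT", "/api/fly/1", {"name": "buzzy", "genome": "ACGA"}))  # (200, 1)
```

Because the single-fly handler adds no logic of its own, the two code paths cannot drift apart.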

This doesn’t feel right, but I’m not sure where I’ve gone wrong. My performance constraints (i.e., the need to support batch operations) aren’t going to go away, but the whole point of REST seems to be a fundamental one-to-one mapping between URLs and entities, which the batch API seems to violate. So, how do other people (or APIs) do this? And why?

DrProject