
REST APIs for Batch Operations

September 15th, 2008

I have a question about the “right” way to design a REST API, and am hoping someone out there on the Interweb will point me in the right direction. The short version of the question is, “How should batch operations be structured?” The long version goes something like this:

Suppose your web application has to keep track of fruit flies. Each fruit fly has a unique (system-assigned) integer ID, a name (hey, even flies can be cuddly), and a genome (represented as a string of characters). If you only had to work with one fly at a time, the API might look something like this (with the data values formatted as XML, JSON, or what have you):

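Operation                    | URL         | HTTP Verb | Request Data                | Response Code(s) | Response Data
-----------------------------+-------------+-----------+-----------------------------+------------------+-------------------
Find out what flies exist    | /api/fly    | GET       |                             | 200              | {id, id, …}
Get a fly’s record           | /api/fly/id | GET       |                             | 200              | {id, name, genome}
Create a fly in the database | /api/fly    | POST      | {name, genome}              | 200              | {id}
Update a fly’s record        | /api/fly/id | PUT       | {new name and/or genome}    | 200              | {id}
Delete a fly’s record        | /api/fly/id | DELETE    |                             | 200              | {id}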

I’ve left out error cases because they aren’t relevant to my question—at least, I don’t think they are.

But now suppose that you want to do batch operations, i.e., that you want to create, read, update, or delete hundreds or thousands of flies at once. Your client (which may be a desktop application or something else that isn’t a browser) can POST data for lots of flies at once, but you do not want to handle the set of values like this:

result = OK
for chunk_of_data in HTTP_Request:
    start_database_transaction
    result = result and process(chunk_of_data)
    end_database_transaction
return result

The first reason you don’t want to do this is that it’s not atomic: if anything goes wrong partway through, you could have five hundred flies updated, and five hundred not. The second reason is that the process function is actually very slow: if you call it five hundred times, there’s a real risk of taking so long that the web server will time out the request. (Note: in reality, a lot more is going on inside process than just a few SQL queries—files are being opened, parsed, and closed, log entries are being created, etc., so “get a faster web server” is not a valid solution.)
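To make the atomicity half of the problem concrete, here is a minimal sketch in which the whole batch runs inside one transaction, so either every fly is updated or none is. It assumes a SQLite table named `fly` and a helper called `update_flies`, both invented for illustration; it does nothing about the slowness of `process`.

```python
import sqlite3

def update_flies(conn, flies):
    """Apply every (id, name, genome) update in one transaction, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for fly_id, name, genome in flies:
                cur = conn.execute(
                    "UPDATE fly SET name = ?, genome = ? WHERE id = ?",
                    (name, genome, fly_id),
                )
                if cur.rowcount != 1:
                    # Hypothetical failure rule: an unknown id aborts the batch.
                    raise LookupError(f"no fly with id {fly_id}")
    except LookupError:
        return False
    return True

# Hypothetical schema and sample data, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fly (id INTEGER PRIMARY KEY, name TEXT, genome TEXT)")
conn.executemany(
    "INSERT INTO fly VALUES (?, ?, ?)",
    [(1, "Buzz", "ACGT"), (2, "Zip", "TTGA")],
)
conn.commit()
```

If any update in the list fails, the earlier updates in the same call are rolled back, which is the behaviour the per-chunk loop above cannot give you.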

The solution I’ve come up with is to make batch operations fundamental to the REST API, and to define the single-fly operations in terms of them. This leads to API entries like this:

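Operation             | URL      | HTTP Verb | Request Data                                                | Response Code(s) | Response Data
----------------------+----------+-----------+-------------------------------------------------------------+------------------+------------------
Update flies’ records | /api/fly | PUT       | {{id_0, name_0, genome_0}, {id_1, name_1, genome_1}, …}     | 200              | number_of_updates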

with the obvious definition of a PUT to /api/fly/id as a multi-fly PUT with only one fly.
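As a rough illustration of that layering, the sketch below defines the single-fly update as a batch of one. The in-memory `flies` dictionary and the function names `put_flies` and `put_fly` are assumptions made for the example, not part of any real implementation, and no HTTP layer is shown.

```python
# Hypothetical in-memory store standing in for the database.
flies = {
    1: {"name": "Buzz", "genome": "ACGT"},
    2: {"name": "Zip", "genome": "TTGA"},
}

def put_flies(records):
    """Batch PUT to /api/fly: update every record or none; return the count."""
    missing = [r["id"] for r in records if r["id"] not in flies]
    if missing:
        # Validate everything first so a bad id leaves the store untouched.
        raise KeyError(f"unknown fly ids: {missing}")
    for r in records:
        flies[r["id"]].update({k: v for k, v in r.items() if k != "id"})
    return len(records)

def put_fly(fly_id, fields):
    """Single PUT to /api/fly/id: a batch of exactly one."""
    return put_flies([{"id": fly_id, **fields}])
```

Here `put_fly(1, {"name": "Buzzy"})` goes through exactly the same code path as a thousand-fly update, so there is only one place where the update rules live.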

This doesn’t feel right, but I’m not sure where I’ve gone wrong. My performance constraints (i.e., the need to support batch operations) aren’t going to go away, but the whole point of REST seems to be a fundamental one-to-one mapping between URLs and entities, which the batch API seems to violate. So, how do other people (or APIs) do this? And why?


  1. Glenn
    September 15th, 2008 at 15:29 | #1

    You should post this on Stackoverflow.com

  2. September 15th, 2008 at 15:51 | #2

    Hi there,

    What about a URI like “/batch/fly”. A URI just refers to a resource and a batch of flies could be considered a resource in and of itself.

    Even if the interface supports only PUT and returns links to the newly created fly resources, that still qualifies as a resource for batches of flies.

    Just a thought. Hope it helps. Good luck! :)

  3. September 15th, 2008 at 16:05 | #3

    Hi Greg,

    I’m not sure I fully understand the proposed batch API. How will clients know which flies were updated? What happens if an update fails? How does it solve the problem of possible request timeouts?

    Regardless of the architectural style (REST, XML/RPC, etc.), have you considered an asynchronous API? From what I’ve read in your post, it sounds like you might be better off that way. The client PUTs a list of flies, it receives a requestId, and then the client polls the server to check the status of the batch request. When the batch request status is complete, the client can get a report that indicates what fly updates succeeded/failed (if it cares about such things).

    So the API would look something like this:

    PUT /api/flies {{id_0, name_0, genome_0}, {id_1, name_1, genome_1}, …}
    RESPONSE {requestId}

    GET /api/batchRequest/requestId
    RESPONSE {status}, i.e., IN_PROGRESS, COMPLETE, FAILED, etc.

    GET /api/batchReport/requestId
    RESPONSE {report}

    Of course you could maintain your original single-fly synchronous API if there are situations where you need it.
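
The submit-then-poll flow suggested in this comment might be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the in-memory `_requests` table, and `run_pending`, which stands in for whatever background worker would really do the slow processing.

```python
import itertools

_ids = itertools.count(1)
_requests = {}  # request_id -> {"flies": [...], "status": ..., "report": ...}

def submit_batch(flies):
    """PUT /api/flies: queue the work and return a request id immediately."""
    request_id = next(_ids)
    _requests[request_id] = {"flies": flies, "status": "IN_PROGRESS", "report": None}
    return request_id

def get_status(request_id):
    """GET /api/batchRequest/requestId"""
    return _requests[request_id]["status"]

def get_report(request_id):
    """GET /api/batchReport/requestId (None until the batch has finished)."""
    return _requests[request_id]["report"]

def run_pending(process):
    """Stand-in for a background worker: process every queued batch."""
    for req in _requests.values():
        if req["status"] != "IN_PROGRESS":
            continue
        report = {}
        for fly in req["flies"]:
            try:
                process(fly)
                report[fly["id"]] = "ok"
            except Exception as exc:
                report[fly["id"]] = f"failed: {exc}"
        req["report"] = report
        req["status"] = "COMPLETE"
```

Because `submit_batch` returns before any fly is processed, the HTTP request cannot time out on a slow `process`; the cost is that this sketch reports per-fly success and failure rather than making the batch atomic.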

  4. September 15th, 2008 at 18:02 | #4

    This may be obvious, but the poster child for RESTful APIs is the Atom Publishing Protocol (http://bitworking.org/projects/atom/rfc5023.html). While that spec doesn’t explicitly say how to PUT multiple items, you can extrapolate from its lists, which GET multiple items. Simply provide a URL that expects a list of items rather than single items. It can then return a list of the item IDs/URLs for GET or POST (if editable) as needed.

  5. September 15th, 2008 at 18:39 | #5

    REST creator Roy T. Fielding recently wrote on his blog (http://roy.gbiv.com/untangled/):

    “Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.”

    URLs such as /fly/1, /fly/2, etc. that have a one-to-one mapping from URL to entity are good as a starting point because they are widely understood, but don’t think that creating another resource to handle batch input for transactional and performance reasons in any way violates REST principles.

  6. rgz
    September 15th, 2008 at 19:33 | #6

    This shows what’s wrong with REST: a web API is not like a library but rather a resource like a database server. While I urge you to forgo REST purity, I suggest you forgo it with style. Considering you update objects by PUTting to “/api/fly/id”, it makes sense to batch update by PUTting to “/api/fly/id1,id3,id5-id8,id9”, basically adding range semantics to REST.

    What about it?
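
For what it is worth, the server side of such a range URL would need to expand the last path segment into individual ids. A minimal sketch, assuming plain non-negative integer ids and a made-up helper name `parse_id_spec`:

```python
def parse_id_spec(spec):
    """Expand a spec such as "1,3,5-8,9" into a sorted list of distinct ids."""
    ids = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"empty range: {part!r}")
            ids.update(range(low, high + 1))  # ranges are inclusive
        else:
            ids.add(int(part))
    return sorted(ids)
```

Malformed input such as an empty segment raises `ValueError`, which a real handler would presumably turn into a 400 response.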

  7. September 15th, 2008 at 20:18 | #7

    This might be a clash between Rails and what I think I may know about REST, but…

    Single
    /api/fly/id, GET
    /api/fly/new, POST
    /api/fly/id, DELETE

    Batch:
    /api/flies, GET, {id, id, …}
    /api/flies, POST,

    The plurality of the name gives it context. It also forces the caller to think about the usage and the access method.

    -adam

  8. September 15th, 2008 at 20:48 | #8

    REST + Batch don’t play well right now.

  9. September 15th, 2008 at 21:20 | #9

    I don’t have anything technical to add, but I thought I’d drop a comment anyway just to say that your post made me smile because my roommate studies fruit flies. If you ever need to catch some, I’ve learned some of her lab techniques for escapees, and I could demonstrate with photos. :-)

  10. September 16th, 2008 at 18:25 | #10

    This was a hotly debated/heavily discussed topic between me and a former officemate while doing some SOAP/WSDL API design. Mike wrote it up quite well: http://www.michaelgilfix.com/techblog/2007/02/25/web-service-interface-design
    The API design pattern was used in a high-throughput SOAP-over-HTTP building block underneath telecommunications software.

    - Rhys

  11. L. Daniel Burr
    September 16th, 2008 at 21:05 | #11

    I know the REST camp debates this issue frequently, but in my own applications I just made the decision to support a “transaction” resource; I POST to obtain a transaction id; I PUT the set of operations to be processed in said transaction. Done.

    Admittedly, this doesn’t work for everyone, but it has worked like a charm for my own use cases, and it certainly can work well for the fruit-fly example above.
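
The “transaction” resource from the last comment could look something like the sketch below. The class name, the `/api/transaction` URLs in the docstrings, and the shape of each operation are all assumptions for illustration; the comment itself does not specify them.

```python
import itertools

class TransactionStore:
    """Hypothetical in-memory model of a transaction resource over flies."""

    def __init__(self, flies):
        self.flies = flies
        self._open = set()
        self._ids = itertools.count(1)

    def create_transaction(self):
        """POST /api/transaction: open a transaction and return its id."""
        txn_id = next(self._ids)
        self._open.add(txn_id)
        return txn_id

    def put_operations(self, txn_id, operations):
        """PUT /api/transaction/id: apply all operations, or none of them."""
        self._open.remove(txn_id)  # KeyError for an unknown or already-used id
        # Work on a copy so a failure partway through changes nothing.
        working = {fly_id: dict(rec) for fly_id, rec in self.flies.items()}
        for op in operations:
            if op["action"] == "update":
                working[op["id"]].update(op["fields"])  # KeyError if id is missing
            elif op["action"] == "delete":
                del working[op["id"]]  # KeyError if id is missing
            else:
                raise ValueError(f"unknown action: {op['action']}")
        self.flies = working
        return len(operations)
```

In this sketch a transaction id is single-use: a failed PUT consumes it, and the client has to POST for a new one before retrying.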
