Monthly Archives: May 2011

My New Job

In all the excitement, I forgot to mention that I started a new job three weeks ago: I’m now working for Side Effects Software, makers of a world-class visual effects tool called Houdini. It’s been fun so far—I’m learning lots (you can tell because I occasionally use language that’s not suitable for home).

Coming Up Next (We Hope)

The Architecture of Open Source Applications was always meant to be the start of something, not the end. We’d really like to collect more descriptions of complex systems’ architectures and the lessons to be learned from them, but to do that, we need your help. If you are, or know, the key designers or developers associated with the projects listed below, please let them know that we’d like to hear from them. Where we don’t have an application, just a category, please suggest a particular project and make an introduction, and if there’s something missing that you think would teach people a lesson that would otherwise go untaught, please let us know that too.

  1. GDB (or any other industrial-strength debugger).
  2. Gecko, WebKit, or another HTML rendering engine.
  3. A JITting JavaScript implementation.
  4. BZFlag or some other real-time multiplayer game (we have two turn-based games in vol 1).
  5. The Thunderbird desktop email client.
  6. Moodle.
  7. Inkscape and/or The Gimp.
  8. OpenOffice Calc or Gnumeric (i.e., a spreadsheet).
  9. Vim or Emacs (old-school text editor).
  10. The Arduino IDE (which is written in Processing).
  11. Something (anything) for small-memory/small-power devices.
  12. Puppet.
  13. A penetration testing toolkit.
  14. OpenSSH (please please please oh please).
  15. OpenStreetMap.
  16. GnuPlot or matplotlib.
  17. nginx (or another modern lightweight web server).

What are we missing? What would be an opportunity to describe and explain design principles that we haven’t already covered? Remember, it doesn’t have to be a beautiful architecture to be instructive… But please note, we’re looking for things whose designs can be described in essays—there are entire books on the Linux kernel.

Later: as per another post, the best way to get something included in volume 2 is to offer to write a chapter yourself. If you don’t know enough to do that, please take a few moments to collect the names and email addresses of people who could and forward them to me.

You Doesn’t Exist

The title of this post is not grammatically incorrect, and therein lies a story. I’ve had more than a dozen emails since The Architecture of Open Source Applications was announced saying, “You should do a chapter on [name of application goes here].”  My stock reply is now, “Yes, you should.” Most people don’t respond, but the handful who do have all said, “I didn’t mean you, I meant someone.” To which I’ve been tempted to reply, “I agree—please mail her/him and let me know what s/he says.” I’ve never actually sent that, though; I figure anyone who doesn’t get the point of the first message won’t get it if it’s made a second time.

So here it is spelled out: there is no “you”. There is no “someone”. There’s me, and Amy Brown, and the individuals who volunteered to write chapters and then actually delivered on their promise, and there’s you, you the person who sent that email, you the person who is reading this, you the person who could volunteer just like we did and put your head down and take time away from your friends, your family, your thesis, your volleyball team, or your favorite TV series and write something. That isn’t the only way that open source and open content happen, but that’s where it started, and that’s how things ranging from this book to Wikipedia came to be. If you want a chapter on The Gimp or Arduino, then write it. If you don’t know enough to do that, then find some people who could and say, “I’d be happy to draw the diagrams and proofread and translate from your preferred format into others and whatever else it takes if you’ll turn what you know into prose.”

I think that making something where nothing existed before is the greatest adventure there is. If you’d like to give it a try, please get in touch.

So What’s It Like Publishing a Book Yourself?

Several authors and would-be authors have asked us what it’s like doing a book with a print-on-demand publisher like Lulu. Overall, we’re pretty pleased: you have complete control over content and schedule, and since traditional publishers are pushing the work of publicizing technical books onto authors anyway, the only thing you really lose is professional copy-editing. You can always hire a freelancer to do that, though (which is what traditional publishers mostly do); if you’re interested, Amy Brown (my co-editor) now has some time…

There are some caveats, though. First, you’re entirely responsible for creating electronic versions of your book, and it’s much harder than I had expected.  A couple of volunteers have been working steadily since Monday to produce e-pub and .mobi versions of AOSA, and they still aren’t right—tools like Calibre, kindlegen, and eCat do a mediocre job quickly, but doing something that looks professional turns out to be much more difficult than it should be in the early 21st Century.

Second, while Lulu and other print-on-demand services cross-list at Amazon, Barnes & Noble, and other outlets, the royalty structure is—well, if you have a look at the http://aosabook.org home page, you’ll see the breakdown.

A third complaint is that Lulu won’t ship to a post office box: they say, “Because our expediter won’t,” but that just raises the question of why they don’t use a competent expediter.  It doesn’t affect most people, but it’s a real inconvenience for those it does. Please let them know if it’s a problem for you (or if you bump into Bob Young, ask him why they haven’t fixed this yet).

Finally, we have run into cases where it’s clear that Lulu’s right hand doesn’t know what its left hand is doing. Right now, for example, we’re trying to reconcile a complaint from Lulu about missing fonts in our PDF with their simultaneous report that our uploaded file is fit for use. I’ve had similar problems with all four of the traditional tech publishers I’ve worked with over the last 20 years, though, so while Lulu isn’t better, it isn’t really worse.

What’s There Instead

“The real grand challenge for software engineering research is relevance.”

I’m not particularly observant at the best of times—it’s one of the reasons I don’t drive—but even so, I’ve been kicking myself for not noticing something about The Architecture of Open Source Applications in time to include a note in the introduction. There are over 100 diagrams in the book, only half a dozen of which use any standard modeling notation like UML. Putting it another way, experienced software architects don’t believe that a standard modeling notation is the best way to communicate their ideas roughly 95% of the time. It’s not ignorance: I’m positive that everyone who helped write a chapter for the book would immediately recognize UML, and could draw a more-or-less legal class diagram or sequence diagram if asked to. And it wasn’t laziness, either, or a side effect of them dumbing down their work for a lay audience.

Of course, that observation raises a question: what do they use instead? What exactly do experienced designers describe when they describe the “architecture” of their software? I think that AOSA is an unprecedented opportunity to find out. A careful analysis of what its contributors do and don’t say (the kind of qualitative study typified by several of the chapters in Making Software) could tell us what modeling notations for software architecture need to include for software architects to find them compelling.

It would be a hell of a paper, but I’m not an academic any more, and other things are more pressing. If you’d like to take a crack at it, please let me know—I’d be happy to introduce you to our contributors.

It’s Not Theory vs. Practice, It’s Two Solitudes

In Canada, the phrase “two solitudes” refers to the lack of communication—and the lack of interest in communicating—between Anglophones and Francophones. I think of that phrase every time someone uses the phrase “theory versus practice” when talking about academia and industry. Having worked in both, I don’t think that’s the real dividing line: lots of academics actually do build things (look at Berkeley DB and SnowFlock), and the people writing optimizing compilers for IBM, or doing machine learning for Google, are as lost in the theoretical stratosphere as anyone. Instead, I think of industry and academia as two branches of an extended family that send each other Christmas cards, and occasionally show up for each other’s weddings or funerals, but aren’t in day-to-day or even year-to-year contact [1].

I think this divide is unhealthy, and while I failed to bridge it personally, I’m hoping that the two books I’ve worked on in the past year will encourage others to do so. The second one to come out, The Architecture of Open Source Applications, has been getting some attention among practitioners since its launch this week. It’s too early to tell if academics will pay attention to it [2], but I’m hoping that someone at IEEE Software or Communications of the ACM will think it’s worth bringing to the attention of their audience [3].

The first of the two, Making Software, didn’t draw nearly as much attention (i.e., it wasn’t Slashdotted), but I think it’s a natural and necessary complement to AOSA. MS is a summary of what we actually know about how software is developed: what studies have been done, what they found, what conclusions we can draw from them, and why anyone should believe any of it. If AOSA is “what practitioners have built”, MS is “what researchers know”; my aim in doing both back to back was to give each of those two communities something they could give the other, something that would give them an excuse to sit down together and catch up with how Aunt Yena’s sciatica is doing and oh, isn’t little Zuffi just the cutest baby you ever saw?

In my dreams, what happens next is that people use these books as an opportunity to reach out to one another. I’m still digesting notes from yesterday’s ICSE panel session on “What Industry Wants From Research” [4], but it’s clear that a lot of researchers would love to talk to practitioners about what problems really matter, and what would count as answers. At the same time, I think researchers could get some useful reality checks, and maybe even some redirection, by looking at what practitioners choose to describe when asked to describe the most important features of their applications.

As a first step, if you’re in academia, think about going to OSCON or Agile this year, telling the people there what you do, and listening to what they talk about when they talk about the things that they think are important. If you’re not an academic, but planning to go to either of those conferences, why not call up one of your old professors and invite them to join you? Or flip through Making Software and ask the author(s) of one of the chapters you find interesting to try it out. If nothing else, you’ll get a great t-shirt out of it…

[1] And yes, most people who get an undergraduate degree in CS do go out into industry, but for most of them, it’s a one-way trip, and very little of what they do or learn ever filters back to campus. To continue my analogy, they’re the young ‘uns that leave the old country to go work for Uncle Willi in Chicago, but stop writing home after mummi and vati pass away.

[2] I’m still baffled that there isn’t a “news for software engineering researchers” blog along the lines of Lambda the Ultimate to help people stay up to date with things like this. If I had more energy, I’d start one; if you have more energy than me, please do so.

[3] What I’d really like, of course, is for people to start using it as a textbook in advanced undergraduate software courses, but since those courses mostly don’t exist, that’s probably a vain hope…

[4] See Jorge Aranda’s post for a thoughtful summary of the answers that he, Daniela Damian, Marian Petre, and Peggy Storey uncovered before the panel session by interviewing industry practitioners.

How We Got Here, and Where We’re Going

I got my first programming job in the summer of 1982, rewriting an RSA encryption library in C for Prof. Selim Akl at Queen’s University. One of the older students eventually took pity on me and gave me a copy of Kernighan and Plauger’s Software Tools. My first reaction was, “Fortran? What’s that got to do with anything?”. But then I got past the first few pages and realized that this was exactly what I’d been looking for. Except for Wirth’s Algorithms + Data Structures = Programs, most of the other programming books I could find talked about the specifics of particular languages or systems, rather than about how to design programs or what good designs looked like. The few that did raise their sights only got as far as programming style, but with an English teacher for a father, I’d already internalized good variable names and Goldilocks modules (“not too big, not too small”).

My complaint wasn’t a new one, of course. Over the years, lots of people have pointed out that we only teach students how to write programs, not how to read them, and never show them the great programs of the past. Lions’ Commentary on Unix, Tanenbaum’s description of Minix—if you wanted to see how good programmers built things that were more than a couple of pages long, the list was pretty short.

Fast forward to 2006, when I was asked to teach a course at the University of Toronto called “CSC407: Software Architecture”. It had been created by a professor who, like me, had come to the department after many years in industry (and who, like I would later, gave up on academia after a few years and went back to the real world). I taught the course three times, then told the department to cancel it because the raw material needed to teach it properly simply didn’t exist. I must have reviewed a dozen textbooks with “Software Architecture” in their titles, but they all seemed to follow the same pattern:

  • Gosh, good architecture is really important, isn’t it?
  • So here’s some fuzz about high-level design principles.
  • And N kinds of diagrams you can use to describe architectures.
  • Um…that’s it.
  • Oh, wait, we should include some examples. Well, here’s pipe-and-filter (not an actual pipe-and-filter system, of course, just, you know, pipe-and-filter in general). And client/server, and model-view-controller, and (optionally) peer-to-peer, though again, not really truly actual existing systems.

Yes, I’m exaggerating a bit—Gorton’s Essential Software Architecture and Reekie & McAdam’s A Software Architecture Primer were both useful exceptions—but there really was a mile of clear blue water between what was in the books, and what students actually wanted and needed to know. That’s why I organized Beautiful Code: I wanted examples of good design that I could put in front of students, and more importantly, some informed discussion of why those designs were good.

BC did well, but once the dust had settled, I realized it still wasn’t exactly what I was after. I had asked contributors, “What is the most beautiful piece of software you’ve ever seen, and what makes it beautiful?” They had answered in a lot of different ways, many of which had nothing to do with “architecture” (a term I was still struggling to define). And so, last year, while at PyCon, I stood up and said, “OK, let’s fix this.” This time, though, I was more specific than I had been with BC. I told people, “Imagine that a new developer has joined your team. You have one hour to explain its architecture to them—what would you say?” That turned out to be as good a definition of “architecture” as anything else: it’s what you draw on the whiteboard when you’re telling the new guy how things fit together and why they are the way they are.

Fast forward another few weeks. Having googled my fingers to nubbins finding email addresses for people who might be able to contribute, I had enough volunteers to make the project viable. Twelve months later, after some ups and downs with publishers and a lot of help from Amy Brown, we had a book—one that’s pretty close to what my 19-year-old self wanted twenty-nine years ago.

I now realize, though, that The Architecture of Open Source Applications should be the start of something, not its culmination. There are a lot of other interesting software systems out there crying out to be described, and a lot of people who would benefit from reading those descriptions. Some of those systems are pretty crufty (yes, GDB, I’m looking at you), but that doesn’t mean they should be ignored. Every lurking horror in your favorite program was put there for a reason that made sense to someone once upon a time. We might do things differently today, but if we don’t analyze, critique, and learn from what came before, we will almost certainly do no better.

So: if you would like to make the world a better place while doing something of lasting value, and if you know enough about the innards of some reasonably well known, reasonably complicated open source application, please get in touch—we’d be happy to welcome you aboard.

“The Architecture of Open Source Applications” is Now Available

It has been slightly over a year in the making, but it’s finally here: The Architecture of Open Source Applications has been published.  You can buy the book directly from Lulu.com at http://www.lulu.com/product/paperback/the-architecture-of-open-source-applications/15819207, or view the contents online at http://aosabook.org.

My thanks to all the people who contributed to the book, and especially to Amy Brown, my tireless and diligent co-editor. We hope you enjoy it.

Why Does Flask Lose My Post Data When It Redirects?

I’m trying to decide which Python web programming framework to use in the Software Carpentry course. Simplicity is the main criterion: in fact, other than “still a live project”, it’s the only criterion. Flask is a strong contender despite the fact that its documentation is written for people who already understand web programming—I can fill that in. What I’m struggling with now, though, is that the same “hide the details” approach that makes it approachable is also making debugging difficult.

For example, here’s a simple application:

from flask import Flask, render_template, request
DEBUGGING = True
app = Flask(__name__)

def load_data():
    return [
        [True, "luke", "2011-05-03", "luke", "Figure this out"],
        [True, "luke", "2011-05-03", "yele", "Figure this out"],
        [False, "luke", "2011-05-03", "gvwilson", "Figure this out"]
    ]

@app.route('/show/', methods=['POST', 'GET'])
def show():
    data = load_data()
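    # getlist() returns every submitted 'item' value (an empty list if none);
    # request.form['item'] would return only the first value, as a string,
    # so len() of it would count characters rather than ticked boxes.
    items = request.form.getlist('item')
    count = str(len(items)) if items else "no items"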
    return render_template('show.html', data=data, count=count)

if __name__ == '__main__':
    app.run(debug=DEBUGGING)

which I put in show.py. Here’s the corresponding template, which I put in templates/show.html:

<html>
<body>
  <p><strong>count: {{ count }}</strong></p>
  <form name="todo" action="/show" method="POST">
  <table border="1">
    <tr>
      <th colspan="6">Active</th>
    </tr>
    <tr>
      <th>Select</th>
      <th>Number</th>
      <th>Creator</th>
      <th>Created</th>
      <th>Owner</th>
      <th>Task</th>
    </tr>
    {% for item in data %}
      {% if item[0] %}
        <tr class="active">
          <td><input type="checkbox" name="item" value="{{ loop.index0 }}"/></td>
          <td>{{ loop.index0 }}</td>
          <td>{{ item[1] }} </td>
          <td>{{ item[2] }} </td>
          <td>{{ item[3] }} </td>
          <td>{{ item[4] }} </td>
        </tr>
      {% endif %}
    {% endfor %}
    <tr>
      <th colspan="6">Completed</th>
    </tr>
  </table>
  <p><input type="submit" value="make active" /></p>
  </form>
</body>
</html>
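
A side note on the checkboxes, as a minimal sketch: every checkbox in the template above is named item, so ticking more than one sends the key item several times in the request body. The snippet below uses only the standard library's urllib.parse (not Flask's own form handling) to show what such a body decodes to. The helper name ticked_items and the sample bodies are made up for illustration.

```python
from urllib.parse import parse_qs


def ticked_items(body):
    """Return every value submitted under the key 'item' (hypothetical helper)."""
    # parse_qs maps each key to a list of all the values sent for it.
    return parse_qs(body).get("item", [])


if __name__ == "__main__":
    for body in ["item=0", "item=0&item=1", ""]:
        values = ticked_items(body)
        print(repr(body), "->", values, "count:", len(values))
```

The point is that a handler which wants the number of ticked boxes has to look at the whole list of values; the length of a single value such as "0" is only the number of characters in it.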

I run the application:

$ python show.py
 * Running on http://127.0.0.1:5000/
 * Restarting with reloader...

and then point my browser at http://127.0.0.1:5000/show/ (note the trailing slash). As expected, Firebug tells me there has been one GET request, with no parameters.

I then tick off the first of the checkboxes and click “Submit”. According to Firebug, this causes a POST with “item=0” as the submitted data (good), which immediately redirects (code 301) to a GET to the same URL, but without any of the posted data. Now, according to the section on “Unique URLs / Redirection Behaviour” in the Flask documentation, if the URL specified in app.route has a trailing ‘/’, then accessing the URL without the trailing slash automatically redirects to the URL with the trailing slash. But I’m submitting a URL with a trailing slash, so there shouldn’t be a redirect, and even if there is, why is the posted data being thrown away?
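
One possible explanation, offered as a hedged sketch rather than a diagnosis: the template's form posts to /show (no trailing slash) while the route is registered as /show/, which would trigger the redirect the documentation describes, and many HTTP clients re-issue a POST as a body-less GET when they follow a 301. The stand-alone script below does not use Flask at all. It imitates the redirect rule with a hypothetical standard-library server (Handler, post_without_slash, and the throwaway port are all invented for this illustration) and lets urllib play the part of the browser.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # (method, path, body) of every request the server receives


class Handler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the app: redirects /show to /show/."""

    def _handle(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length).decode("ascii")
        seen.append((self.command, self.path, body))
        if self.path.endswith("/"):
            reply = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)
        else:
            # Imitate the documented rule: missing slash -> 301 to the slash form.
            self.send_response(301)
            self.send_header("Location", self.path + "/")
            self.send_header("Content-Length", "0")
            self.end_headers()

    do_GET = do_POST = _handle

    def log_message(self, *args):
        pass  # keep the demonstration quiet


def post_without_slash():
    """POST item=0 to /show and return what the server saw."""
    del seen[:]
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # An empty ProxyHandler keeps any system proxy settings out of the way.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = "http://127.0.0.1:%d/show" % server.server_port
        data = urllib.parse.urlencode({"item": "0"}).encode("ascii")
        opener.open(url, data=data).read()
    finally:
        server.shutdown()
        server.server_close()
    return list(seen)


if __name__ == "__main__":
    for method, path, body in post_without_slash():
        print(method, path, repr(body))
```

Run as a script, this should show a POST to /show carrying item=0 followed by a GET to /show/ with an empty body, which matches what Firebug reports. Whether that is really what is happening in the Flask application is an assumption; pointing the form's action at /show/ would be one way to check.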

I’m clearly doing something wrong, and if it wasn’t a warm Saturday afternoon, I’d probably have figured it out by now. (But please don’t let that stop you from sending me tips :-) . What I’d really like to know, though, is how the hell to explain whatever is going on to someone who’s new to web programming. I’ve tried teaching raw CGI programming to grad students in science and engineering—it usually fails because there are too many low-level details to master. I’d hoped that high-level frameworks like Flask and web2py would make this stuff more accessible, but the law of leaky abstractions may put them out of practical reach as well: if things that look like they ought to work don’t, and there’s no easy way for someone who is just learning this stuff to debug it, we’re dead in the water.