Archive for December, 2005

My File System

December 30th, 2005

Here’s a tree view of my computer’s file system, courtesy of SequoiaView. The big yellow block is M4A music files; the big green one is MP3s; the gray rectangle in the upper right-hand corner is pagefile.sys; and the gold-gray-and-blue fish scales in the middle near the bottom are the Windows system DLLs. Pretty.

Uncategorized

External Programming Interfaces

December 29th, 2005

This article, by Eamonn McManus, is a nice little summary of API design principles. It contains a bit of motherhood and apple pie—nobody would ever set out to make an API difficult to learn or hard to use, for example—but the specifics are good (particularly the discussion of why interfaces are often the wrong thing to use). The article contains a link to this essay at the NetBeans site, which talks about some of the same ideas in more detail.

Together, the articles got me thinking: why don’t we ever talk about or document a module’s external programming interface (which I hereby dub “XPI”)? By this I mean the classes, methods, and system calls that the module depends on; more particularly, their semantics. Design by contract allows a piece of code to specify what it provides; allowing that code to specify what it requires using similar pre- and post-conditions would be a big help in managing the asynchronous evolution of libraries that makes postmodern programming so hard.
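
To make the idea concrete, here is a minimal sketch of what a declared XPI might look like in Python. Everything in it is hypothetical: the `XPI` table, the `check_xpi` function, and the three sample requirements are invented for illustration, and no existing tool or library is being described. Each entry pairs a plain-language statement of what the module relies on with an executable check of that behaviour.

```python
import math

# Hypothetical XPI: the behaviours this module requires from its dependencies,
# each written as a description plus a check that returns True if it holds.
XPI = {
    "math.sqrt returns the non-negative root":
        lambda: math.sqrt(9.0) == 3.0,
    "sorted() is stable":
        lambda: sorted([(1, "b"), (0, "z"), (1, "a")], key=lambda pair: pair[0])
        == [(0, "z"), (1, "b"), (1, "a")],
    "str.split() with no argument collapses runs of whitespace":
        lambda: "a  b\tc".split() == ["a", "b", "c"],
}


def check_xpi(xpi):
    """Return the descriptions of every requirement that does not hold."""
    broken = []
    for description, holds in xpi.items():
        try:
            ok = holds()
        except Exception:
            ok = False  # a dependency that raises has also broken the contract
        if not ok:
            broken.append(description)
    return broken


broken = check_xpi(XPI)
```

Running `check_xpi` after upgrading a library would then list the assumptions that the new version no longer satisfies, which is the kind of early warning the paragraph above is asking for.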

Extensible Programming

$67 million a year

December 28th, 2005

The US Dept. of Energy has just announced the next round of funding for SciDAC, its flagship supercomputing program. US$67 million per year for three to five years. Supercomputing Online reports:

Research proposals funded under the SciDAC program will help create a comprehensive, scientific computing software infrastructure that integrates applied mathematics, computer science and computational science in the physical, biological and environmental sciences for scientific discovery on petascale computers.

My bet is that, once again, most projects will depend on heroic effort, rather than good development techniques, to reach their goals. I’m also willing to bet that anyone who wants to use most of the software these projects create will have to put in heroic effort of their own to get it built and deployed. I (obviously) believe that a little bit of training would go a long way, but I’m not optimistic that the people who need it most will listen: as is so often the case, those who know they need it are already halfway home, while those who need it most don’t even know what they’re missing.

Software Carpentry

New Year’s Schedule for Software Carpentry

December 27th, 2005

I’m teaching a cut-down version of Software Carpentry at the IASSE in two and a half weeks. I’ll have students half days for the weeks of January 16 and 23, and full days for the week of February 6. That’s only 20 lectures (rather than 26), so the question is, what to cut? The answer has wider implications, since this will be the version of the course I take to the AAAS workshop.

My plan is:

Date    Lecture              Notes
Jan 16  Introduction         Revised to be a forward summary of the whole course.
Jan 17  Shell 1
Jan 18  Shell 2
Jan 19  Version Control
Jan 20  Make                 Revised so that it doesn’t depend on Python.
Jan 23  Python 1             Basic features.
Jan 24  Python 2             Strings and lists.
Jan 25  Python 3             Functions and Libraries.
Jan 26  Testing              Basic concepts.
Jan 27  Mini-Project 1       Build something useful with Python.
Feb 06  Python 4             Dictionaries and exceptions.
        Debugging            Deepened to include material from Zeller.
Feb 07  Python 5             Object-oriented programming.
        Unit Testing         Use the unit test framework to show what good OO design looks like.
Feb 08  Coding Style         Update to include an actual Python style guide.
        Reflection           Complete rewrite: exec, eval, sub-processes, etc.
Feb 09  Regular Expressions
        XML and DOM
Feb 10  Development Process  Describe how a good shop actually works (with nods to XP and RUP).
        Teamware             Based on Trac.

Client-side and CGI web programming, security, and databases have disappeared completely; the three lectures on process have been folded into one; and there’s no end-of-course summary. I’m comfortable with those changes, but four things still bother me. First, I don’t like the amount of time spent teaching Python-the-language. I’d rather spend those hours showing students how to use Python to automate development activities, but you can’t cut trees ’til you have an ax.

Second, there’s no place in this new scheme for a lecture based on Paul Dubois’s CiSE article on maintaining correctness. There really ought to be: it shows the jigsaw puzzle of which many good practices are pieces.

Third, I’d like a second project lecture, showing students part of the build system for the course notes. This would let them see regular expressions and DOM in action, and would tie together many of the earlier ideas on automation. It’s this or teamware, though, and I think the latter is more important. Having made that decision, I’m wavering on whether to pull out the material on regular expressions and DOM.

Finally, everything I have to say about the development process is now squeezed into a single hour. It makes sense in this case, since IASSE students will get several more courses on the subject, but it’s definitely underweight for the AAAS workshop.

So: in order to pull this off, I’m going to have to revise one lecture per day from January 2 onward (including diagrams). I’ll post the new materials here until they’re polished, at which point I’ll swap them into the standard location. I’ll blog each time a lecture goes up: timely feedback would be greatly appreciated.

Software Carpentry

Visual Studio vs. Eclipse

December 26th, 2005

A nice list of some things that Eclipse does better than Visual Studio (which is still my favorite IDE). I got it from Mike Gunderloy’s always-excellent Larkware blog; if anyone sees a follow-up post describing things that VS does better than Eclipse, please let me know.

(And note in passing how few of these things are debugging aids—looks like there’s lots of room to turn the ideas in Andreas Zeller’s book into plugins, if anyone wants to make a name for themselves.)

Uncategorized

Review: Why Programs Fail

December 24th, 2005

2005 was an excellent year for books. Not only were there a lot of good ones, some covered topics that hadn’t been covered before (at least, not well or recently). Fogel’s Producing Open Source Software, Doar’s Practical Development Environments, Feathers’ Working Effectively with Legacy Code, Thompson and Chase’s Software Vulnerability Guide…

…and now Zeller’s Why Programs Fail. Zeller is the creator of DDD (http://www.gnu.org/software/ddd/), a graphical front end for the GNU debugger that uses box-and-arrow displays and charts (among other things) to show what programs are doing. Now a professor at Saarland University in Germany, Zeller is still building and studying new tools to help developers figure out what’s going wrong in their programs, where, and why.

This well-written, copiously illustrated book is, in many ways, a status report from the front lines. After setting the scene in Chapter 1, Zeller dives straight into the first debugging tool every developer should use: a bug tracker. He explains what good bug reports ought to contain, and how to manage them as their numbers grow.

He then gives us a chapter each on “Making Programs Fail”, “Reproducing Problems”, and “Simplifying Problems”. These chapters set the tone for the rest of the book: instead of high-level handwaving, we get a detailed look at what particular tools do, how, and (most importantly) why. He describes, for example, how to decouple a program from external components, and looks at the pros and cons of replay debugging. Later, the chapter on simplification presents an automatic divide-and-conquer tool that can strip a test case down to its essentials.
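
To give a flavour of the divide-and-conquer idea, here is a simplified sketch in the spirit of delta debugging. It is my own illustration rather than the code from the book: the function name `ddmin` is borrowed from the delta debugging literature, while the `fails` callback and the sample inputs are assumptions made for the example. The function repeatedly splits a failing input into chunks and keeps any chunk, or any input with one chunk removed, that still fails.

```python
def ddmin(items, fails):
    """Shrink `items` to a smaller list for which `fails` still returns True."""
    assert fails(items), "the starting input must fail"
    n = 2  # how many chunks to split the input into
    while len(items) >= 2:
        chunk = -(-len(items) // n)  # ceiling division
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            # Everything except the i-th chunk.
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):
                items, n, reduced = subset, 2, True
                break
            if fails(complement):
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(items):
                break  # already at single-item granularity: nothing more to drop
            n = min(n * 2, len(items))  # try finer-grained chunks
    return items


# Invented example: the "program" fails whenever both 3 and 7 are in the input.
result = ddmin(list(range(10)), lambda xs: 3 in xs and 7 in xs)
```

Here `result` ends up as `[3, 7]`: removing either element makes the failure go away, so the test case cannot be simplified further.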

Subsequent chapters are equally technical. Chapter 7, for example, looks at how dependency analysis and program slicing can be used to isolate faults. Chapter 8 discusses both interactive debuggers and logging frameworks; Chapter 11 looks at automatic anomaly detectors that compare execution traces from successful and failing tests, and Chapter 14 discusses cause-effect chains and backward reasoning tools like IGOR.

There are a lot of things to like in this book: the clarity, the references, the “How To” markers and end-of-chapter summaries. What I enjoyed most, though, was the feeling I got of watching over Zeller’s shoulder as he works through the jumble of parts on his workbench, sorting things into categories and figuring out how they all fit together. Most of us spend more time trying to figure code out than we do writing new code. Despite that, debuggers have always been poor cousins to editors, compilers, and other development tools. If, ten years from now, they have caught up with their peers (if the new approaches that Zeller describes have matured enough to be taken for granted), I think much of the credit will go to this book.


Andreas Zeller: Why Programs Fail: A Guide to Systematic Debugging. Morgan Kaufmann, 2006, 1558608664, 448 pages.

Books

Procrastination: One of the Few Things in Life Nicer Than Toast

December 23rd, 2005

I finished rewriting the build system for the Software Carpentry course notes yesterday. Doing so was an extended form of procrastination: the system I built over the summer and used through the fall was adequate, but I wanted to clean a few things up, and then, well, I might as well make it easier for other instructors to add site-specific content, and make tables inclusions instead of inlining them, and mumble mumble mumble type type type…

Of course, none of this has actually advanced the content of the course one whit. I have over seventy tickets to close, ranging in size from making sure that a particular Make example does what I claim to rewriting the lecture on security. And diagrams: no one was happy with the isometric ones created this term (not least because they’re kind of fuzzy), so I have over a hundred diagrams to re-do. In a perfect world, they’d be ready before I teach at the IASSE in mid-January. In this universe, I’ll be happy if they’re in place for the Essential Software Skills for Research Scientists workshop at the AAAS Annual Meeting on February 17.

We all do this. We all fold laundry instead of paying bills, or invent an antigravity drive when we’re supposed to be studying for an Economics final. (OK, maybe that was just me.) But it seems particularly common among software developers, many of whom would rather spend two hours creating a new (not better, just new) serialization class hierarchy than take five minutes to center-align the titles at the top of the product’s help page. One of the characters in Mark Costello’s Big If (reviewed here) is a prime example: his company desperately needs him to add some new monsters to a video game, so he spends a week adding shadows to clouds.

But back to the build system… What I have is a set of XML files marked up with a homegrown tag set, and what I want is some HTML pages. The files are organized into several directories: the main page is in the root, while all of the lectures are in lec/, and site-specific content is in sub-directories underneath sites/. Each directory that contains source XML files may also contain img/, inc/, and tbl/ sub-directories; in turn, each of those has one sub-directory for each of the source files, which holds images, sample code inclusions, and tables.

The build system consists of the following tools:

  1. A 500-line Makefile in the root directory that drives everything else. Roughly half of those lines are comments (which can be extracted and formatted as a wiki page to create on-line documentation). This Makefile includes another file called config.mk, in which users must specify the lectures they want to include in the course.
  2. A Python script called linkages.py that scans the source files and builds a data structure that records such things as the order of lectures, where glossary terms are defined, the two-part numerical IDs of figures and tables, and so on. linkages.py writes this data structure directly to a file called tmp/linkages.tmp.py, which other tools then import. Persisting the data structure directly saved me from having to mess around with parsers or serializers. The clever bit (ahem) is that I only write it out if (a) the file doesn’t already exist, or (b) the contents have changed. That way, if I change a source file in a way that doesn’t affect cross-linkages, Make doesn’t do a lot of unnecessary rebuilding.
  3. Once the linkages file is up to date, preprocess.py kicks in. This script creates copies of the source files under the tmp/ directory (preserving the directory structure), and adds information to those copies to make XSLT’s job easier. Among other things, it:
    • adds a unique file ID, and the path to the root of the build, to the lecture’s root element;
    • copies content from table files into the lectures;
    • adds citation information to bibliography references;
    • does multi-column layout of lengthy tables;
    • inserts figure and table counter values (the “4.2” in “Figure 4.2”);
    • fills in cross-references between source files;
    • replaces the <lecturelist/> element with a point-form list of links to lectures;
    • fills in the <figlist> and <tbllist> tags with lists of figures and tables respectively;
    • links terms in the glossary back to their first uses;
    • inserts included program source files;
    • links to external references;
    • adds “previous” and “next” linkage information to lectures;
    • generates a syllabus; and
    • adds tracing information, such as file version numbers and the time the files were processed.

    Each stage ought to be a filter of its own, and in fact I wrote them all that way to begin with. However, launching fifteen or more copies of the Python interpreter for each source file made the build rather slow; doing the piping internally reduced the time per source file from eight or nine seconds to less than a second.

  4. util/individual.xsl is an XSL script that translates the filled-in XML lecture file into HTML. This script handles the outer skeleton directly, handing specific tasks like the bibliography and special lists to other XSL files that it includes.
  5. A Python script called util/unify.py and an XSL script called util/unified.xsl work together to create a single-page version of the whole course. unify.py stitches the filled-in lecture files together; unified.xsl then applies the same transformations as individual.xsl, but formats hyperlinks differently (since they’re all in-file).
  6. I use another Python script called validate.py to check the internal consistency of the source files. Do any of them contain tabs or unprintable characters? Do all the required images, source files, and tables exist? I run this before checking in changes; it catches something about one time in five.
  7. And then there are the minor tools:
    • util/fixentities.py replaces character entities with character codes (to work around a problem with Expat);
    • util/wiki.py extracts specially-formatted comments from Makefiles and XSL files, and docstrings from Python, to create wiki documentation pages; and
    • util/revdtd.py reverse engineers the actual DTD of either the source files, their filled-in counterparts, or the generated HTML files.
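
The write-only-if-changed trick described in item 2 can be sketched as follows. This is a minimal illustration, not the actual linkages.py: the function names `write_if_changed` and `save_linkages`, the `linkages = ...` file layout, and the sample data are all assumptions made for the example.

```python
import os
import tempfile


def write_if_changed(path, text):
    """Write `text` to `path` only if the file is missing or its contents differ.

    Returns True if the file was written (so its timestamp changed).
    """
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as reader:
            if reader.read() == text:
                return False  # leave the timestamp alone so Make sees no change
    with open(path, "w", encoding="utf-8") as writer:
        writer.write(text)
    return True


def save_linkages(path, linkages):
    """Persist a dict as Python source (hypothetical format)."""
    return write_if_changed(path, "linkages = " + repr(linkages) + "\n")


# Invented demonstration in a scratch directory.
scratch = tempfile.mkdtemp()
target = os.path.join(scratch, "linkages.tmp.py")
first = save_linkages(target, {"order": ["intro", "shell1"]})   # file is new
second = save_linkages(target, {"order": ["intro", "shell1"]})  # same contents
```

Because the second call leaves the file untouched, anything that Make builds from it still looks up to date, which is what avoids the unnecessary rebuilding.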

It’s a lot of code; it was a lot of work; I’m pleased with how smoothly it all runs; and most of the time I spent building it should probably have gone into upgrading the actual content of the course. But small(ish) tasks are seductive: you can start work at 8:30, confident that you’ll have something to show (even if only to yourself) by noon. Editing course notes, well, the payoff is usually a long way away, and may not come at all: people who read through the first, flawed, version of the notes probably aren’t going to come back and tell you how much better the second version is.

That last observation is the key ingredient of my cure for procrastination: find some partners. I am always more productive when I’m working with people than I am on my own. Not only does a small team wander down fewer blind alleys than someone working alone, team members can keep each other honest, and give each other feedback and encouragement. They can also appreciate just how big an accomplishment it is to have replaced all the a’s and b’s in twenty-eight short examples of list manipulation with the names of minerals, beetles, and mathematicians.

It’s now ten to eleven, and I’ve managed to fend off productivity for almost an hour. Should I look on eBay for a WACOM Cintiq 17SX that I can afford? It’d make drawing diagrams much more fun. Or maybe I should try Nose: Miles Thibault says it’s much friendlier than the unit testing framework in the Python standard library. Hm… A cup of tea will probably help me decide. A cup of tea, and a slice of toast with strawberry jam…

Software Carpentry

Insanity vs. Stateful Programming

December 22nd, 2005

From Elizabeth Keogh:

When you do the same thing again and again, and expect a different result, that’s insanity.

When you do the same thing again and again, and get a different result, that’s stateful programming.

Uncategorized

Documents vs. Conversations

December 22nd, 2005

Amateurs playing chess think in terms of positions; sharks care more about combinations of moves. Amateurs think, “I’m going to build a pawn wall, get my bishops onto good squares, and castle so that my king is somewhere safe.” Sharks think, “I’m going to advance this pawn so my rook can get to that square to cover an advance by my queen.” Since the board is constantly in motion, one piece per turn, the latter style of thinking almost always wins.

I’m starting to think that we’ve been thinking like amateurs when it comes to software requirements. We’ve been trying to create requirements documents, and then connect them to designs, code, tests, and so on. But real requirements are rarely static; they’re never all present and accounted for at one point in spacetime [1].

What happens if we think about requirements conversations instead? What if we stop trying to say “X must Y” and start saying “Having read P and Q, R believed at time T that X must Y”? This shifts the focus from absolute facts (which implicitly assume that omniscience is possible) to relative beliefs (which is all we really have anyway). It also makes the temporal and causal aspects of “requirements” explicit: you believe something at a particular time because of something you read, heard, or thought of at some earlier time.
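
As a rough sketch of what such a record might look like, the snippet below stores each belief with its holder, its time, and the things it was based on, then walks backward to recover the causal chain. The `Belief` class, the `antecedents` function, and all of the sample names and statements are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Belief:
    """'Having read P and Q, R believed at time T that X must Y.'"""
    holder: str          # R
    when: datetime       # T
    statement: str       # "X must Y"
    sources: tuple = ()  # P and Q: earlier Beliefs, or plain strings such as message IDs


def antecedents(belief):
    """Return every earlier Belief that this one rests on, oldest first."""
    found = []

    def visit(current):
        for source in current.sources:
            if isinstance(source, Belief) and source not in found:
                visit(source)
                found.append(source)

    visit(belief)
    return sorted(found, key=lambda b: b.when)


# Invented sample conversation.
first = Belief("Alan", datetime(2005, 3, 14), "login must use HTTPS", ("msg-1041",))
second = Belief("Beth", datetime(2005, 4, 2), "sessions must expire after 30 minutes",
                (first, "msg-1077"))
third = Belief("Carol", datetime(2005, 12, 20), "logout must clear the session cookie",
               (second,))
history = antecedents(third)
```

Here `history` holds Alan’s belief followed by Beth’s: the chain of earlier statements that Carol’s requirement depends on, in the order they were made.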

Many successful open source groups already work this way. Their “specs” are mailing list threads, and the comment streams attached to feature requests and bug reports. It ought to be chaotic, but as Karl Fogel describes in his recent excellent book Producing Open Source Software (reviewed here), in practice it is often very efficient.

So, what would a conversation-centric requirements management tool look like? My first guess would be a search engine that paid close attention to chronological order, reply-to headers, and the like. I’d want it to detect, highlight, and stitch together relevant subsections of composite items (for example, to notice that only the middle third of the message Alan sent last March was about authentication). The goal would be to allow a developer to put her cursor over a method or test case, right click, and bring up a list of links to the things she needed to read to understand what the code was (supposed to be) doing [2]. I’d also want to be able to drive the tool in the other direction, and ask, “Which bits of this project depend on what was said on this topic before last week’s mailstorm?”
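
The simplest piece of this, following reply-to headers back through a thread, can be sketched with Python’s standard email module. The function name `reply_chain` and the three sample messages are invented for illustration; a real tool would also have to cope with missing headers, quoted text, and messages that change topic part way through.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def reply_chain(raw_messages, message_id):
    """Follow In-Reply-To headers from `message_id` back to the thread's root.

    Returns (date, sender, subject) tuples, root message first.
    """
    by_id = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        by_id[msg["Message-ID"]] = msg

    chain, seen = [], set()
    current = message_id
    while current in by_id and current not in seen:  # `seen` guards against loops
        seen.add(current)
        msg = by_id[current]
        chain.append((parsedate_to_datetime(msg["Date"]), msg["From"], msg["Subject"]))
        current = msg["In-Reply-To"]  # None when the message starts a thread
    chain.reverse()
    return chain


# Invented sample data.
MAILBOX = [
    "Message-ID: <1@example.org>\nDate: Mon, 14 Mar 2005 09:00:00 -0500\n"
    "From: alan@example.org\nSubject: Authentication\n\nUse HTTPS for login.\n",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "Date: Tue, 15 Mar 2005 10:30:00 -0500\n"
    "From: beth@example.org\nSubject: Re: Authentication\n\nWhat about sessions?\n",
    "Message-ID: <3@example.org>\nIn-Reply-To: <2@example.org>\n"
    "Date: Wed, 16 Mar 2005 08:15:00 -0500\n"
    "From: alan@example.org\nSubject: Re: Authentication\n\nExpire them.\n",
]
chain = reply_chain(MAILBOX, "<3@example.org>")
```

Starting from the last message, `chain` contains all three messages in the order they were sent, which is the raw material a developer would need to read to see how the decision was reached.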

Automating this completely, with no extra human input, is a non-starter, as it would require software that understood natural language. A more realistic tool could combine AI techniques [3], human tagging, sheep entrails, or anything else. The key requirements are:

  1. The extra effort required from stakeholders must be small.
  2. The payoff must be immediately obvious.
  3. It must mine conversations in the form they actually take, including email, bug reports, wiki pages, code comments, test case names, and so on.

Any takers?


Coincidentally, Jon Udell just posted this piece on scannable conversation summaries, which includes links back to his earlier discussion of heads, decks, and leads.


[1] This is true even in those rare cases when requirements actually have been fixed and finalized. Since our short-term memory is limited, we can only ever hold part of even a medium-sized spec in our minds at once. Wandering around a fixed spec is, I believe, no different from standing still and watching one part of it evolve.

[2] Note that conversation-centric development is orthogonal to the question of agile vs. design-first development. In my experience, for example, it’s equally hard to trace cause-and-effect after the fact in programs developed using XP and RUP. What both lack is a methodical way to connect tests and methods back to pronouncements: neither user stories on 3×5 cards, nor use cases hyperlinked to sequence and class diagrams, come with (for example) a canned query that will pull up the relevant antecedent conversation.

[3] Prof. Jane Hayes has been using Information Retrieval (IR) algorithms to match requirements to code, and Bin Liang (an undergraduate student at the University of Toronto) investigated IR’s effectiveness with test cases in the Fall of 2005. Both, however, assume a static requirements document, rather than a dynamic conversation.

Research

Choosing Sides

December 21st, 2005

Bruce Schneier, on revelations that President Bush authorized the NSA to engage in domestic spying:


Any debate over laws is predicated on the belief that the executive branch will follow the law.

Vladimir Bukovsky (who spent 12 years in Soviet prisons for human rights activities), on the ineffectiveness of torture:


…why would democratically elected leaders of the United States ever want to legalize what a succession of Russian monarchs strove to abolish?

Canadians are a month away from choosing a new government. Most don’t care; most think, “It couldn’t happen here,” or, “There’s nothing I can do about it anyway.” I think they’re wrong. I think the time has come to choose sides, and I’m on Maher Arar’s. Please, tell the candidates in your riding that you want a full inquiry, not the whitewash job we were given this year.

In Germany they first came for the Communists, and I didn’t speak up because I wasn’t a Communist.

Then they came for the Jews, and I didn’t speak up because I wasn’t a Jew.

Then they came for the trade unionists, and I didn’t speak up because I wasn’t a trade unionist.

Then they came for the Catholics, and I didn’t speak up because I was a Protestant.

Then they came for me—and by that time no one was left to speak up.

Uncategorized