Archive for February, 2008

Building Filters

February 15th, 2008

I decided earlier this week that the time had come to convert the Software Carpentry notes to a wiki to make it easier for other people to contribute.  My decision was motivated partly by thinking about converting DrProject to use Markdown syntax for its wiki, and partly by the realization that I’m not going to have time in the next ten months to fix all the typos people keep pointing out, add new content, bring the examples up to date with Python 3000, and so on.

The first step was to pick a wiki syntax.  That was easy: there are Markdown processors for Perl, PHP, and Python, several wikis support them, and my hands are going to be learning those typing rules anyway.  The second step was to convert the existing notes, which are marked up in a homegrown XML format.  This seemed like a good candidate for a classic Unix read-process-print-repeat filter, and sure enough, a few hours later, I have something working.  I took notes as I did it; I’m posting them here as a record of how a moderately experienced developer tackles a routine problem.

  1. Copy fifteen lines of code from one of the filters I use to turn the .swc XML files into HTML; this gives me something that parses XML to create an xml.dom.minidom tree in Python.
  2. Write a recursive function that takes an output stream and a DOM node as inputs, and writes a representation of the latter to the former.  If the node is a TEXT node, print its content to the stream; if it’s an ELEMENT, switch on the tag, then recurse on its children.  If it’s anything else, print a warning to standard error and halt.
  3. Fill in that switch (which in Python is a chain of if/elif/elif/… statements).  Initially, each branch’s body is just ‘pass’; the ‘else’ clause prints the tag’s name with stars around it.  After running the 34 .swc files through this a couple of times, I have branches to fill in for all the tags I’m using.  (No, there isn’t an up-to-date DTD…)
  4. After typing the shell commands to loop over all the .swc files a couple of times, I double back and put them into a Makefile.  I use a pattern rule to say that %.txt depends on ../lec/%.swc; I’m not embarrassed about hard-coding paths, because this tool is only going to be used in this context.  I also define a ‘clean’ target that gets rid of all the generated .txt files and other shrapnel.
  5. Start filling in the branches.  Some are easy: the ‘<t>’ (text) tag has no analog in Markdown, while ‘<b1>’, ‘<b2>’, and ‘<b3>’ (bullets at different levels) are just appropriate levels of indentation plus a star. Then I hit ‘<em>’ (emphasis), which requires a closing tag after the children. No problem: I define a list variable called ‘stack’, append the text of the closing tag(s), then print those items in reversed order after iterating over the children.
  6. Next are cross-references.  The .swc file has ‘<scref id="intro"/>’, which in HTML is converted to ‘<a href="intro.html">Introduction</a>’: the word “Introduction” is taken from a lookup table that’s built by a preprocessor that scans all of the .swc files and archives things like page titles, bibliography citations, glossary terms, and so on. I could either modify my existing script to read all the .swc files at once, extract this information, then process them, or write a separate preprocessor.  Since I already have a preprocessor that does 200% of what I need (i.e., everything I’ll need for this conversion, plus more), I copy that and chop out the bits that I don’t need. Note that I don’t have to think about the format for this extra information: the preprocessor builds it as a dictionary of dictionaries, then prints that object to a file.  The SWC-to-DOM program then uses ‘eval’ to load that data (which is a legal Python expression).
  7. OK, now I have cross-references, glossary items, and bibliography citations; that just leaves inclusions. The .swc files use ‘<inc path="…">’ to include code fragments, and ‘<tbl path="…">’ to include tables.  (I chose to do the former so that all my code examples would still be runnable from the command line; I can’t remember why I chose to do the latter, but it was overkill.)  Code files are just lines of text; that’s easy.  Table files are marked up with ‘<tbl>’, ‘<row>’, ‘<col>’, and so on; after putting the XML reading code into a function (which I should have done off the bat), and adding a few more branches to the big switch statement, they’re taken care of too.
  8. I’m now generating text files that look fine to me.  What will Markdown think of them?  I add five lines to my Makefile to convert .txt files to .html using markdown.py, and… Oh.  OK, the whitespace in the .txt files I’m generating is confusing Markdown.  And my code fragments need to be indented. And I’d forgotten that Markdown doesn’t directly support tables (they’re an add-on).  Mutter mutter fix fix fix… There.  Half a dozen fixes to the SWC-to-Markdown script, and a little postprocessing to strip off extraneous newlines (it turns out to be easier to do this at the end than to keep track during translation of whether it needs to be done), and voila: the HTML is almost right.  The few places where it isn’t are things I’ll take care of by hand, like double escaping of accented characters in people’s names.
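
The recursive filter described in steps 2 through 6, plus the newline cleanup from step 8, might look roughly like the sketch below. It is not the actual script: the root tag ‘lec’, the contents of the lookup table, the four-space bullet indent, and the function names ‘convert’ and ‘swc_to_markdown’ are all assumptions made for illustration. It also swaps in ‘ast.literal_eval’ for the ‘eval’ mentioned above, since the table is a plain literal.

```python
import ast
import io
import re
import sys
import xml.dom.minidom

# Hypothetical lookup table in the "dictionary of dictionaries" shape the
# post describes; a real preprocessor would print this text to a file.
LOOKUP_TEXT = "{'titles': {'intro': 'Introduction'}}"

# The post uses eval; ast.literal_eval is substituted here because it only
# accepts Python literals.
LOOKUP = ast.literal_eval(LOOKUP_TEXT)

# Bullet tags mapped to nesting depth (assumed: four spaces per level).
BULLETS = {"b1": 0, "b2": 1, "b3": 2}


def convert(out, node):
    """Write a Markdown-ish rendering of a DOM node to the stream `out`."""
    if node.nodeType == node.TEXT_NODE:
        out.write(node.data)
        return
    if node.nodeType != node.ELEMENT_NODE:
        sys.stderr.write("unexpected node type %s\n" % node.nodeType)
        sys.exit(1)

    stack = []  # closing text, written in reverse order after the children
    tag = node.tagName
    if tag in ("lec", "t"):  # 'lec' is an assumed root tag; both add nothing
        pass
    elif tag in BULLETS:
        out.write("    " * BULLETS[tag] + "* ")
        stack.append("\n")
    elif tag == "em":
        out.write("*")
        stack.append("*")
    elif tag == "scref":
        ref = node.getAttribute("id")
        out.write("[%s](%s.html)" % (LOOKUP["titles"][ref], ref))
    else:
        out.write("*%s*" % tag)  # flag tags that have no branch yet

    for child in node.childNodes:
        convert(out, child)
    for closing in reversed(stack):
        out.write(closing)


def swc_to_markdown(xml_text):
    """Parse one document, convert it, and tidy surplus newlines at the end."""
    doc = xml.dom.minidom.parseString(xml_text)
    buf = io.StringIO()
    convert(buf, doc.documentElement)
    text = re.sub(r"\n{3,}", "\n\n", buf.getvalue())
    return text.strip("\n") + "\n"


if __name__ == "__main__":
    sample = "<lec><b1>See <scref id='intro'/> for <em>details</em>.</b1></lec>"
    sys.stdout.write(swc_to_markdown(sample))
```

Keeping the closing text in a list and writing it after the recursion is what lets a single function handle both tags that wrap their children and tags that only need a prefix.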

So what are the takeaways?

  1. Real programming involves a lot of opportunistic bricolage (a fancy way of saying “re-using bits and pieces that are lying around, or can be torn out of wherever they are and re-purposed”). You can only do this effectively if you keep track of what you have, know your way around the standard libraries, and so on, but hey, if 15-year-old DJs can keep thousands of tracks at their fingertips for sampling, you ought to be able to as well.
  2. I have no idea whether a read-process-print-repeat filter was the “best” way to solve this problem or not, and I don’t care. I could immediately see how to fit my problem into that model, and I have enough practice writing such filters that I was confident I’d be able to deal with anything unexpected that came up.  I could have done some up-front design, realized that I was going to have to deal with cross-references, and put together the tool that parses all of the files to extract link endpoints before doing anything else, but in this case, doing things in the “wrong” order probably didn’t cost me any time. The more experienced you are, the more often you can work this way; remember, though, that experience comes from making mistakes…
  3. My tool only solves the first 99% of the SWC-to-Markdown conversion problem. If I was going to release it to the world, I’d do the last 1%, and the X% after that (docs, an Egg for distribution, a page at the Cheese Shop, etc.). However, the Software Carpentry notes are the only .swc files in the world, so this is definitely the point of diminishing returns; the little bits that are left will be easy enough to fix up by hand.

Teaching

Rationalizing the Admin Interface

February 15th, 2008

Anyone who has ever worked with me knows that I should not be allowed to design user interfaces. Nature, nurture—dunno why, but anything that I find intuitive and pleasing leaves most people queasy and confused.

Which is why I’m appealing for help. DrProject’s browser-based administration interface is invaluable, but we’re finding the workflow frustrating. For example, in order to add a new user, make her a member of the ‘All’ and ‘fribble’ projects, and turn on mail forwarding for her for both of those lists, I have to:

  1. Go to the ‘add user’ page.
  2. Fill in her user ID, default email address, real name, and affiliation.
  3. Submit.
  4. Go to the ‘list users’ page (the refreshed ‘add user’ page tells me her user ID has been added, but that ID isn’t a hyperlink to a page where I can administer her information, and even if it was, what would I do if I was adding a bunch of people at once, which the ‘add user’ page also supports?).
  5. Scroll down to her ID.
  6. Click on it to bring up a page where I can edit her personal settings.
  7. Add her to the ‘All’ project as a ‘viewer’ and submit.
  8. Add her to the ‘fribble’ project as a ‘developer’ in the refreshed page and submit.
  9. Scroll down to the bottom of the refreshed page, tick the boxes to turn on mail forwarding for her for ‘All’ and ‘fribble’, and submit.

Other tasks are similarly arduous. The screens in question are below the cut; if you have suggestions for redesign, I’d love to hear them.

Uncategorized

Grumpy Minds Think Alike

February 14th, 2008

Ned Gulley pointed me at this talk by fellow-Mathworkser Steve Eddins that hits a lot of the same core ideas as Software Carpentry.  Time to convert SC to a wiki, I think, and start pushing it again…

Software Carpentry

Google HOP Wraps Up

February 12th, 2008

Google’s Highly Open Participation project (a high school equivalent of Summer of Code) has just wrapped up, and the ten grand prize winners have been announced.  Congratulations to everyone involved—it was a great idea, and I hope Toronto can be involved next time around.

Uncategorized

Hotwire Shell

February 12th, 2008

Via Jeff Balogh, a pointer to Hotwire Shell, a free object-oriented hypershell inspired by PowerShell that runs on Linux, and is being ported to Windows and Mac OS X.  The principal author seems to be Colin Walters; I’ll post more info as I get it.

Extensible Programming

Reviewing Markdown

February 11th, 2008

I was talking to my software engineering class about code reviews last week, while simultaneously thinking about how to replace the wiki parser in DrProject. To kill two birds with one stone, I decided to review a wiki parser written in Python and keep notes on what I was thinking as I went through it. I’ve posted the result as a separate page on this web site; I’d be interested in comments and feedback.

Later: Diomidis Spinellis posted a similar think-aloud back in 2004 that’s worth reading.

DrProject, Teaching

Rebooting Environmentalism

February 11th, 2008

Interesting piece on the changing environmental movement from Ross Robertson, via Ned Gulley.

Uncategorized

Yet More Weight Behind OpenID

February 7th, 2008

This morning the OpenID Foundation announced that Google, IBM, Microsoft, VeriSign, and Yahoo! have joined the board. We really need to get this into DrProject — I’m definitely proposing it as a Google Summer of Code project this year. (If it’s good enough for Estonia, it’s good enough for me ;-) ).

DrProject

The First Check

February 6th, 2008

I just got sales figures for Beautiful Code from O’Reilly: since its release last summer, it has raised more than US$38,000 for Amnesty International. My thanks once again to the authors, and to everyone at O’Reilly who helped make the project a reality — and, of course, to everyone who bought the book as well.

Beautiful Code

Another Reminder

February 6th, 2008

I had another reminder today of why I like to teach. I loaned a Ruby on Rails book to a student before Christmas; this morning, he sent me a link to the kanji flash card game he’d built. Neat.

Uncategorized