Archive for December, 2008

Leslie Hawthorn Writes About Summer of Code

December 18th, 2008

Leslie Hawthorn (who has made thousands of students’ lives better by keeping Google Summer of Code on track for the last four years) has written a thoughtful article about the program.  Worth reading…

Uncategorized

Scott Leslie on “Just Share Already”

December 18th, 2008

Good expression of the frustration someone who wants to “just do it” feels when dealing with people who want to plan/organize/discuss it.

Uncategorized

Segaran on the Excluded Middle

December 18th, 2008

Nice post from Toby Segaran (author of a very good book called Programming Collective Intelligence) that discusses the “excluded middle” of technical books—worth reading if you’re thinking about writing anything for the geek market.

Books

How Far We Got

December 18th, 2008

There’s a quote (attributed to various people—I’d welcome a pointer to the original) to the effect that if you show me your code, I don’t know what you’re doing, but if you show me your data structures, I’ll understand.  To figure out just how far our students got rebuilding DrProject on top of Django this term, I asked one of them to generate a schema diagram for the database tables.  The result, included below, was created by running the following commands in a virtual environment:

$ svn checkout http://django-command-extensions.googlecode.com/svn/trunk/ django-extensions
$ cd django-extensions
$ python setup.py install
$ django graph_models -ag > schema.dot
$ dot -o schema.png -Tpng schema.dot

Basie 0.1 schema

(Note: I moved the three tables that now float in the bottom middle down from the upper right corner to make the diagram more printable.)

Basie

Entry-Level Code Review Procedures?

December 18th, 2008

Since September, half a dozen students at four universities have been rebuilding DrProject (our lightweight classroom-friendly replacement for Trac) on top of Django (a Rails-like web programming framework written in Python). What’s made this project different—and IMHO better—is the use of code reviews. Blake Winton, a local Python hacker, reviewed every single commit that came into the SVN repository in the first couple of weeks of the project. Thanks to his example, the students started reviewing each other’s work as well (Jeff Balogh, a two-time Google Summer of Code veteran who’s starting a full-time job with Mozilla in January, being the most prominent culprit).

It made a huge difference to productivity and code quality, and we’d like to do it again next term, but we are wondering how best to implement it. We managed reviews this term by having each commit diff echoed to a mailing list; a self-appointed reviewer would reply to the email with comments, the author of the diff would reply, others would chip in, etc. I thought it worked pretty well (especially relative to the near-zero setup cost), but some students said in the post-mortem that some commits fell through the cracks, while others said they found it hard to track what was going on, since code review threads often turned into design discussions without a signal going to the larger group.

So, my question is, what could/should we do next term without either a big investment in infrastructure, or weeks of retraining? (I have no objection in principle to doing either, but since students only work 10 hours a week on this project, and usually have four other courses on the go, I have to focus on the absolute smallest thing that could possibly work.) One suggestion has been to prepare a diff and send it to the reviewers’ mailing list before committing, so that reviews happen before code goes into the repo. Another is to pseudo-randomly assign commits to other team members for review (so that nothing gets dropped on the floor), and to use a “three strikes” rule to promote discussion from the review list to the dev list. What would you suggest? What do you think would work for a larger group (say, a class of 50 students, working in teams of 5, each team doing an 8-hour-a-week term-long project in parallel)?
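To make the second suggestion concrete, here is a minimal sketch of what pseudo-random assignment and a “three strikes” rule might look like. Everything in it is an assumption for illustration: the function names, the team member names, the choice to hash the revision number, and especially my reading of “three strikes” as “a review thread with three or more replies gets promoted to the dev list”. It is not something we have built or tried.

```python
import hashlib


def assign_reviewer(revision, author, team):
    """Pick a reviewer for a commit, never the commit's own author.

    Hashing the revision number makes the choice look random but stay
    repeatable, so anyone can recompute who owns a given review.
    (Hypothetical helper, not part of DrProject or Basie.)
    """
    candidates = sorted(member for member in team if member != author)
    if not candidates:
        raise ValueError("need at least one team member besides the author")
    digest = hashlib.sha256(str(revision).encode("utf-8")).hexdigest()
    return candidates[int(digest, 16) % len(candidates)]


def should_promote(replies, limit=3):
    """Assumed 'three strikes' rule: promote a thread once it has `limit` replies."""
    return len(replies) >= limit


# Example with made-up names: every commit gets exactly one reviewer.
team = ["alice", "bob", "carol"]
for revision, author in [(101, "alice"), (102, "bob"), (103, "carol")]:
    print(revision, author, "->", assign_reviewer(revision, author, team))
```

Because the reviewer depends only on the revision number and the team list, the assignment could be computed by whatever script already echoes commit diffs to the mailing list, without storing any extra state.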

Basie, Teaching

Three Reasons to Distrust Microarray Results

December 10th, 2008

Interesting post:

…the paper actually demonstrated that it is possible to distinguish microarray experiments conducted on one day from experiments conducted another day. That is, batch effects from the lab were much larger than differences between patients who did and did not respond to therapy…  As is so often the case, data were mislabeled. In fact, 3/4 of the samples were mislabeled.

Software Carpentry

Random Library Entries

December 10th, 2008

As I noted back in May, I’m using LibraryThing to keep track of my reading these days.  The “Library” tab on my site now displays a random selection of books that I’ve enjoyed; hope you enjoy ‘em too.

Books

Monkeys, Bananas, and a Fire Hose

December 8th, 2008

This story has been repeated in so many places that it has to be an urban myth, but all good myths contain a grain of truth. I was reminded of that this morning, when I found the new U of T phonebook in my mailbox.  That’s right, a printed-on-paper phonebook: cheap paper, but a glossy cover. I’m guessing at least $2 per copy to produce.  Multiply that by 5000 copies (I’ll bet the real number is closer to double that), and double it again for human labor and recycling cost, and you’ve got 1/3 of a person’s salary at a time when the university is facing a serious financial crunch, all for something that would have been out of date even before the presses started rolling. *sigh*

Uncategorized

Adam’s review of “Clean Code”

December 8th, 2008

Adam Goucher has posted a review of Robert Martin’s Clean Code. It’s much more detailed than mine, but reaches the same conclusion: good book, worth reading.

Books

How Scientists Manage Code

December 7th, 2008

The latest issue of Computing in Science & Engineering has a paper by David Matthews, Steve Easterbrook, and me titled “Configuration Management for Large-Scale Scientific Computing at the UK Met Office”. It describes how the folks who do climate modeling in the UK built a configuration management tool on top of Trac to handle their million-line code base.

Research