Three Years Old
Today marks the third anniversary of this blog. As I near my thousandth post, I figured I’d bore you all with a look back.
DrProject
Unsurprisingly, DrProject has been the blog’s single biggest topic (under several different names):
- 2004-06-22: the first mention of Helium (a Java-based predecessor of DrProject that we intended to build from scratch).
- 2004-07-26: my first gripe about Python having too many half-finished web frameworks. I've taken some flak for saying this, but I still think that by failing to unite behind one, the Python community gave the game away to Rails. Speaking of which, this post, from August 10, is my first mention of Ruby, and this one, from December 14, is my first mention of Trac, the system from which today's DrProject is descended.
- 2004-08-04: Helium was over-engineered in many ways; this post discussed some of the tricky problems created by the user-and-project model I foisted on it.
We abandoned Java and the Hippo code base early in 2005. The main reasons were (a) the XML configuration files that tools like Hibernate depend on aren’t debuggable, and (b) it took students so long to set up a working environment that they couldn’t make much progress in a term. This post from September 14, 2004, was the first sign of trouble; things went downhill from there.
But back to DrProject: Chris Lenz did some cleanup on Trac for us in the spring of 2005, in exchange for which I mentored a Google Summer of Code project for him (where “mentored” means “stayed out of his way and was awed by his productivity”). The project picked up speed after that, thanks in no small part to generous donations from the Jonah Group and Perforce:
- 2005-08-22: after a summer of hard work, our fork of Trac (which at the time we were calling Argon) was up and running, but very slow. The cause turned out to be the fifty million calls to a Unicode end-of-line marker Python was making each time we ran the CGI. Those N² algorithms will get you every time…
- January 2006: Sean Dawson, Jason Montojo, and Chris Lenz started working full-time on DrProject. A new version went live on January 13.
- 2006-02-20: we switched to Kid as a templating engine. Progress was slow (as was DrProject itself for a while), but steady.
- 2006-03-29: Sean Dawson wrote a piece about processes hanging in DrProject. It turned out to be a taste of things to come…
- 2006-04-16: Igor Foox, Greg Lapouchnian, and Pat Smith produced the first screencast of DrProject in action. Greg won a Google Summer of Code award that year, and built us a web administration interface.
- 2006-06-26: 98% of the tickets for DrProject 1.0 were closed. I thought we were almost done…
- …but then Billy Chun figured out why DrProject was running so slowly, and it turned out we weren't done after all.
- The first screenshots of Version 1 went on the web on July 15; we did the final release on July 17, and updated the screencast on August 14.
- 2006-10-23: I started a series of posts about DrProject's internals, largely to get them clear in my own head. I later used these posts in my Software Architecture course. They got me wondering whether users would prefer an IDE-based interface to DrProject.
- 2007-01-07: the first release candidate for DrProject 1.2 went up on the web. It had grown quite complex, but we still managed to get the final release out on February 6.
- 2007-03-08: after a meeting with people from several local companies, I posted a proposal for a new ticketing system. A month later, DrProject got its first ticket spam.
Software Carpentry
Improving scientists’ software development skills is the reason I got interested in software engineering in the first place:
- 2004-12-30: the Python Software Foundation gave me a grant to develop an open source course on software development for scientists and engineers called Software Carpentry.
- 2005-07-08: I put an alpha version of the notes on the web. (Calling them "alpha" is pretty generous…)
- 2005-07-29: Nature ran a short blurb about the course.
- 2005-08-22: Andy Lumsdaine decided to offer Software Carpentry at Indiana University.
- 2005-09-14: the Toronto edition of the course met for the first time. (This was actually the second time I'd taught the course at U of T, but the first time it was on the books and for credit.) There were initially 93 people in the course (!), of whom about half stuck with it to the end.
- 2005-12-09: American Scientist ran an article I wrote about the motivation for the course called "Where's the Real Bottleneck in Scientific Computing?". It was later republished in German in Spektrum magazine.
- 2006-02-17: I ran a workshop about the course at the annual meeting of the AAAS.
- 2006-03-26: I got angry about a report from Microsoft Research called 2020 Science that gushed about the future of scientific computing with no more than a passing mention of the problem of making sure scientists' programs actually work.
- 2006-04-28: all the minor corrections to the notes were finished.
- 2006-06-25: the course notes moved to their permanent home on one of Enthought's servers.
- 2006-07-14: Version 2.0 of the course notes went up on the web.
- 2006-08-04: HPCWire interviewed me about the course.
- 2006-08-17: I gave a talk at SciPy'06 on selling Python to scientists and engineers. (Hint: don't. Instead, sell them solutions to their problems that happen to use Python, then wait.)
- 2006-10-31: a result published in Nature was retracted because the code used to produce it had been flaky. Five months later, scientists from the Scripps Research Institute had to retract five papers published in various prestigious journals because of a sign error in a computer program. Stories like these are making the course easier to sell…
- 2006-11-28: Computing in Science and Engineering ran an article I wrote about the course.
- 2007-04-02: Titus Brown got a contract to teach the course at Lawrence Livermore National Laboratory.
49X
I’ve been supervising undergraduate student projects under the CSC494 and CSC495 headings since 2002. We’ve accomplished quite a bit:
- 2004-08-28: my student teams scored 5 (or 6) out of 12 on the Joel Test. Today, the good ones score 8.
- The summer student teams have always worked well together. As I noted on 2004-09-14, social activities are a large part of the reason.
- Managing student projects via auto-generated blogs was a revelation when I first tried it in January 2005. Even then, I was wondering how to integrate instant messaging into software engineering as well.
- I used to be very sceptical about Extreme Programming and other agile methodologies. Discussions with Peter Hansen and others have since convinced me that they're valuable tools in some contexts, and worth teaching.
- The Psiphon project made the Globe and Mail in February 2006.
- I've always run post-mortems on 49X projects, and learned a lot from the results.
Research Projects
- I first thought of Bayesian filtering to detect duplicate posts in newsgroups in October 2004; Helen Bretzke and Jonathan Lung looked into it in 2005-06, but it didn't seem to lead anywhere.
- Quantifying the learning curve for tools and languages (2005-01-02): this was briefly fashionable in the 1970s and 1980s (look up a conference called "Empirical Studies of Programmers"); I think it'd be worth revisiting now that software engineering has a clearer idea of how to do empirical studies.
- What does an entry-level requirements tool look like? I started wondering back in May 2005, prompted partly by discussions with Steve Easterbrook and Jorge Aranda, and partly by the realization that none of the widely-used software project management portals (like SourceForge) offer any help with requirements. In December 2005, I believed a (the?) solution would be to treat requirements as conversations to be searched, rather than documents to be assembled, but I'm still open to suggestions; Jorge, Steve, and I are now studying how small companies actually manage requirements in the real world in order to get more ideas. The simplest so far is to allow hierarchical organization of tickets.
- Many unhappy experiences have convinced me that the world really needs someone to figure out how a "debugger" for configuration files ought to work. Do this, and every Ant, Apache, Hibernate, and Tomcat user in the world will be your new best friend. (I keep coming back to this topic; one attempt to build a UML model debugger is now on SourceForge.)
- A library's API is the set of functions it allows others to call. Why don't we also care about the functions a library (or application) needs to call (which I dubbed its XPI in December 2005)?
- Your version control system is only as good as your diffs, but sadly, after forty years, the only things we can diff reliably are plain ol' text files. There's research waiting to happen here…
- Or if that seems too mundane, how about looking at ways to express temporal information about variables as part of their types?
- Or maybe you'd like to measure the value of modeling? (Hint: I don't think it's that useful…)
Books
- Data Crunching appeared on Amazon in April 2005. Jon Udell liked it; so did Focus on Java and StickyMinds.
- I first asked the web for pointers to good programmers in April 2006. At SIGCSE'07, Grady Booch let the cat out of the bag by telling the world about Beautiful Code. I posted a table of contents on March 27.
Miscellaneous
- I first mentioned code review, and the problem of making it a normal part of the undergraduate curriculum, on 2004-06-26. I tripped over the issue again in March 2005. In 2006, Jennifer Campbell and I got a grant from U of T to hire two students full-time to develop OLM. The first screencast of their work went up on the web on 2006-08-29, and we've been using OLM in courses ever since.
- Women are under-represented in computing, and the situation is steadily deteriorating. Michelle Levesque and I looked at why the gender ratio in open source is so much worse than in the industry as a whole (2004-10-08).
- I was HP's rep on the Groovy JSR for three months in the summer of 2004 before giving up. These posts from September 7 and September 11 explain why. Groovy 1.0 finally appeared in January 2007; interest was…muted.
- I really liked Joel Spolsky's explanation of Unicode. I've asked twice (2004-11-22 and 2006-04-04) for someone to write a similar explanation of calendars, time zones, and the like, but to no avail.
- My post on interviewing at Google from 2005-01-19 is still the most popular on the blog. Michelle Levesque's description of life at Google from October 2006 explains why ;-).
- I first realized in May 2005 that JavaScript has a real chance to become the dominant scripting language. I haven't put any money into it, though… ;-)
- I started redesigning U of T's software engineering courses in July 2006. I posted the new syllabus in March 2007.
- I've been a mentor for Google's Summer of Code every year that it's run. I think it's a great program, but as I said in August 2006, I think it could be better.
- Todd Veldhuizen's paper Software Libraries and the Limits of Reuse: Entropy, Kolmogorov Complexity, and Zipf's Law keeps prompting new ideas. In January 2007, for example, I wondered whether his ideas imply an intrinsic tradeoff between abstraction and debuggability.
- I wrote an article on extensible programming for ACM Queue in 2004. In March 2007, I saw the first signs that it was actually happening.
And Then
Our daughter Madeleine was born on March 31, 2007. Suddenly, all my other projects seemed a lot less important.