Currently Juggling
I keep telling my students not to over-commit themselves. It’s a shame I don’t take my own advice. Here’s what I’ve currently got on the go:
Software Carpentry teaches basic software development skills to scientists and engineers. I have 80% of the funding I need to spend a year upgrading its content and delivery. I hope to raise the last 20% of the money in the next few weeks. If I can pull it off, the major challenges will be:
- Learning how to create effective online course material: there’s lots of handwaving out there about wikis in the classroom, but nothing substantive about instructional design for mature learners using present-day internet technologies.
- Assessment. We don’t know how to measure the productivity of programmers, or the productivity of scientists; trying to gauge this course’s impact on the productivity of scientific programmers will therefore be something of a challenge. (One of the reasons I left industry for academia in 2006 was to figure out how to do this, but my attempts to find research funding all failed.)
- Mechanics. Site5 only allows one shell account per domain, which makes it difficult to open up the project’s Subversion repository to other contributors. And I’ll have to choose a format for the lecture notes: LaTeX, plain HTML, S5, one of the many wiki formats… And figure out a better way to create and manage images and video. And pick a bibliography format. And…
A professional Master’s degree in Computer Science at the University of Toronto to complement the department’s existing research Master’s. The program consists of five regular graduate courses, a course each on business skills and professional communication, and an eight-month industrial internship in which students have to show that they can translate theory into practice. We are now accepting applications for September 2010 entry, so if you’d like to learn leading-edge ideas from some of the best researchers in the world, please check it out.
Basie, our replacement for Trac, built on Django and jQuery, is coming along nicely, but I don’t know what will happen to it once I leave U of T. A few non-students are now involved in its development, but we aren’t big enough to bid for our own Google Summer of Code students. If anyone would like to get involved, please give me a shout. (I’d particularly like to hear from ex-project students—it would be nice to have an excuse to stay in touch.)
UCOSP stands for “undergraduate capstone open source projects”. Since September 2008, undergraduates from several universities in Canada and the US have been taking part in joint capstone projects in order to learn first-hand what distributed development is like. Each team has students from two or three schools, and works for a term under the supervision of a faculty or industry lead on an open source project. We’re currently trying to find $35,000 to hire a half-time administrator to run the program from September 2010 so that we can scale up from the present 45 students/term to 80, 90, or more. Again, if you’re interested, please give me a shout.
CSC302 is my regular undergraduate software engineering course. This term, six teams of students are porting Django to Python 3, adding pivot tables to Gnumeric, parallelizing parts of ILUTE, upgrading PyLint, pluginifying Selenium, and extending SpatiaLite. It could be the last regular course I teach at the University of Toronto; it has been a bit bumpy, but I’m glad the students are getting to work on real things.
Grad student supervision: Alecia, Zuzel, and Mike all have topics nailed down, and Jason is writing up. I plan to spend one morning a week in the department working with them from now through next January; I’m looking forward to seeing what they produce.
The Cowichan Problems. This one goes back to the mid-1990s, when I first realized that human performance was at least as important to overall productivity in computational science as machine performance. The idea is to use a suite of fairly simple applications, all stitched together, to benchmark the usability of parallel programming systems. A couple of undergrads updated the code last year; I’m hoping to revisit it as part of my work on Software Carpentry.
Book #1, called What Really Works?, is a Beautiful Code-style book that presents evidence-based results in software engineering. Where do bugs actually come from? Does pair programming get the job done faster? Can code metrics predict post-release fault rates? Are some programming languages intrinsically more productive than others? Each of our authors will explore one such question in a chapter-length essay; contributions are now coming in, and we’re still on track to have the book on the shelves this summer. (I’ve been talking about this subject and this book for a few months now; if you’re interested, you can view the slides.)
Book #2 is yet another collection, this time exploring the architecture of open source applications. As I said in my lightning talk at PyCon, the aim isn’t really to explain the internals of Hadoop, Parrot, and Mercurial (though I think that’s worth doing). The real aim is to teach people how to think about software architecture by showing them how architects think. We’re hoping to have chapters in for review by November, and the book out this time next year.
Book #3 is an illustrated children’s book about the universe, life, science, and global warming. I’ve had some good feedback from the editor who handled my last children’s book, but most of the work is still in front of me.
Projects I’m not working on:
Government 2.0: I enjoyed working on open data/open government projects with my students last term, but I couldn’t find any faculty at U of T willing to keep it going. I could have found Gov 2.0 stuff for CSC302, but I thought open source work would be better for them.
Two novels and half a dozen short stories. I enjoy writing fiction, but it feels like an indulgence, and I keep pushing it aside to do “serious” stuff. I’m sure that when I’m seventy I’ll regret having done that, so I hope to spend one hour a day writing fiction once I start full-time on Software Carpentry.
Jazz: I haven’t touched my sax since this time last year—it may be vanity, but I’d rather not play at all than play badly. Maybe when my daughter’s a little older…
Exercise: yeah… exercise. Maybe I’ll get my bike back on the road this week…
The professional CS Master’s sounds great! If I were just coming out of undergrad I would be seriously interested in that. Also, your CSC302 projects for this term sound really cool.
Measuring software and scientific productivity should really be considered “grand challenges” in our field. (Just look at the promotion and tenure process at universities to see how all of academia struggles with the problem of assessing scientific productivity).
Given how non-controversial the material is from a software engineering point of view, I don’t know how important a rigorous assessment of the impact on productivity is. You’d probably have a hard time publishing the results of an empirical study that showed that using version control makes software developers more productive. David Parnas has noted that electrical engineering didn’t need empirical studies to show them that circuit notations were useful.
From a Software Carpentry assessment point of view, it’s much easier to assess “process conformance”: how many people actually end up using this stuff X number of months (years?) after the course is taught.
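A process-conformance figure like this is easy to compute once follow-up data exists. Here is a minimal Python sketch, assuming a hypothetical survey format in which each attendee reports how many months have passed since the course and which practices they still use. The `FollowUp` record, its field names, and the sample responses are invented for illustration; they are not real course data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FollowUp:
    """One attendee's follow-up survey answers (hypothetical format)."""
    attendee: str
    months_after_course: int
    practices_in_use: set[str] = field(default_factory=set)


def adoption_rate(responses: list[FollowUp], practice: str, min_months: int) -> float:
    """Fraction of attendees surveyed at least `min_months` after the course
    who report still using `practice`."""
    eligible = [r for r in responses if r.months_after_course >= min_months]
    if not eligible:
        raise ValueError("no responses old enough to evaluate")
    adopters = sum(1 for r in eligible if practice in r.practices_in_use)
    return adopters / len(eligible)


# Invented example responses, purely to show the call.
responses = [
    FollowUp("a", 6, {"version control", "unit testing"}),
    FollowUp("b", 7, {"version control"}),
    FollowUp("c", 2, {"unit testing"}),  # surveyed too soon; excluded at 6 months
    FollowUp("d", 9),
]
rate = adoption_rate(responses, "version control", min_months=6)  # 2 of 3 eligible
```

Choosing the follow-up window (the "X number of months") is the real design decision; running the same function at several windows would show how quickly adoption decays.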
On the other hand, evidence of improved productivity might be useful as an evangelization tool to convince other scientists to adopt these methods…
@Laurie Does that mean you’re not interested now? Cuz, you know, it would be fun to have you back…
@Lorin The material may be uncontentious from an SE point of view, but claiming that doing these things makes scientists more productive—in particular, that investing 120-150 hours to master these skills will pay off—is a harder claim to back up. We certainly got it wrong in some of the early runs of the course, and there’s nothing but my personal experience to back up the claim that the current version is better.
To play devil’s advocate for a moment, and counter David Parnas’s example of EEs and circuit diagrams, look at how _few_ developers in industry use UML. I don’t think this is ignorance: almost everyone with a CS degree has been exposed to modeling notations. I think UML doesn’t actually help people enough with analysis or explanation to make creating and maintaining UML diagrams a good investment of time. So if we (the SE community) can get something like that wrong, why should scientists trust our claims about version control without supporting evidence?
As you said, this is the real grand challenge…
I think that where there’s a clear overlap between what the SE academic community recommends and what the SE practitioner community actually does, our claims about the value of these techniques have the most force. Admittedly, this is a pretty small set, but I would include “version control” and “automated tests” (i.e., a suite of tests you can run at the push of a button) in it. And, indeed, more sophisticated scientific software projects do use version control and automated testing, and those can be pointed to as examples that work for scientists; you just have to make the argument that they scale down.
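For readers who haven’t seen one, a suite of tests you can run at the push of a button can be very small. The sketch below uses Python’s standard `unittest` module; `mean_of_positive` is a hypothetical stand-in for a piece of real analysis code, not something from any particular project.

```python
import unittest


def mean_of_positive(values):
    """Hypothetical analysis routine: average of the strictly positive values."""
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("no positive values to average")
    return sum(positives) / len(positives)


class TestMeanOfPositive(unittest.TestCase):
    def test_ignores_zero_and_negative_values(self):
        self.assertEqual(mean_of_positive([1.0, -2.0, 0.0, 3.0]), 2.0)

    def test_float_result_within_tolerance(self):
        # 0.1 + 0.2 is not exactly 0.3 in binary floating point,
        # so compare to 7 decimal places instead of exactly.
        self.assertAlmostEqual(mean_of_positive([0.1, 0.2]), 0.15)

    def test_rejects_input_with_no_positive_values(self):
        with self.assertRaises(ValueError):
            mean_of_positive([-1.0, 0.0])
```

Saved as, say, `test_analysis.py`, the whole suite runs with `python -m unittest` from the same directory; that single command is the “button”.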
However, you’re absolutely right that there are many advocated technologies for which we don’t really know whether they make people more productive. UML is a good example: some people use it, but not many. Formal methods are an (extreme) example of this: it’s hard to name many companies that use them in practice.
On the other hand, there are techniques that we know are effective from evidence, but people don’t use them, either because they don’t know about them, or they’re hard. Inspections are the poster child here, estimation techniques are probably another one. I’d advocate the following strategy:
* If the SE academic community advocates something that is in widespread use in the practitioner community in general, and in the high-end scientific SE community in particular, it can be taught as “something that we know from experience works”. (e.g., version control, unit testing)
* If the SE academic community advocates something that is not in widespread use, but we have good results from empirical studies (e.g., inspections), then it can be taught as “something we know from research works”.
Otherwise, if it’s advocated by the SE academic community, but not many people are using it and there isn’t strong evidence for it, I wouldn’t teach it to the scientists.
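The three-way rule above is compact enough to write down as a decision function. This is only a restatement of the comment’s reasoning in Python; the `Verdict` names and the reduction to two yes/no inputs are a simplification introduced here, since both inputs are really judgment calls.

```python
from enum import Enum


class Verdict(Enum):
    TEACH_FROM_EXPERIENCE = "something we know from experience works"
    TEACH_FROM_RESEARCH = "something we know from research works"
    LEAVE_OUT = "don't teach it to the scientists"


def triage(widely_used_in_practice: bool, strong_empirical_evidence: bool) -> Verdict:
    """Classify a practice that the SE academic community advocates.

    widely_used_in_practice: in widespread use by practitioners generally
        and by high-end scientific software projects in particular.
    strong_empirical_evidence: backed by good results from empirical studies.
    """
    if widely_used_in_practice:
        return Verdict.TEACH_FROM_EXPERIENCE
    if strong_empirical_evidence:
        return Verdict.TEACH_FROM_RESEARCH
    return Verdict.LEAVE_OUT


# Inspections: not widely used, but backed by empirical studies.
inspections = triage(widely_used_in_practice=False, strong_empirical_evidence=True)
# UML: some use, not widespread, and no strong evidence either way.
uml = triage(widely_used_in_practice=False, strong_empirical_evidence=False)
```

The order of the checks matters: widespread adoption short-circuits the evidence question, so a practice in broad use is taught on the strength of experience alone.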
More interestingly, there are technologies that have seen widespread adoption in the absence of empirical evidence that they improve productivity. The two examples that come to my mind are dynamic languages (e.g., Python) and OO. But I think it’s reasonable to include these, because widespread community adoption of a technology suggests that it is useful, even if that evidence is not as strong as a systematic study.
Don’t get me wrong, though: I think it’s a great idea to evaluate the effectiveness of the course by measuring its productivity impact. Heck, I’d love to be involved in the design of the assessment. Even if all of the course material were backed up by strong empirical evidence, that’s no guarantee that the students will be able to assimilate and apply it effectively. To carry out that assessment, you need to solve the open problems mentioned previously (measuring software productivity and measuring scientific productivity), as well as measuring the effectiveness of a teacher plus curriculum, which is also an open problem that people are still trying to figure out.
The best thing to do is probably to use multiple metrics, and include some subjective measures in there. Do they feel more productive afterwards? We know that people are not good at judging their own productivity in an absolute sense (Dunning-Kruger effect), but maybe they are good at judging changes in their own productivity over time? This hypothesis could be tested in a separate study under more controlled conditions. Hmmm…..
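One way to test that hypothesis would be to collect, for each participant, both a self-rated change in productivity and an independently measured change, then check whether the two move together. The sketch below computes a Pearson correlation coefficient for that purpose. The choice of Pearson correlation, the rating scale, and the sample numbers are all assumptions made for illustration; a real study would also need an adequate sample size and a significance test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)


# Invented data: self-rated change on a -3..+3 scale, and the measured
# fractional reduction in time to complete some benchmark task.
self_rated = [2, 1, 0, 3, -1]
measured = [0.20, 0.05, 0.00, 0.35, -0.10]
r = pearson(self_rated, measured)  # near +1 would support the hypothesis
```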
(This comment was a little more rambling-y than I anticipated when I started.)
Re: “Site5 only allows one shell account per domain, which makes it difficult to open up the project’s Subversion repository to other contributors.”
Let me be the first to suggest BitBucket! (Wait, am I _actually_ the first to suggest a DVCS? Whoa. Wolever and Balogh must be slipping.)
Seriously, though, if you go to one of the DVCS hosts, you can take contributions from anyone in the world, and even if your code weren’t under a permissive license, you could still use BitBucket and have Software Carpentry be your one private project, thus restricting the set of people allowed to contribute.
I’d be happy to walk you through getting it all set up, or through my own setup if you wanted to see what it’s like before doing a test run.
Also, this is a good week to start biking again.
Later,
Blake.
@Blake Wolever actually got there first — and yes, I could use Google Code, GitHub, etc., but a lot of scientists can’t (their work or their data isn’t open), so if I can’t figure out how to set this up for my own project, what am I supposed to tell them to do? And yes, this is a good week to start biking — I’m hoping to buy a new machine today.