Monthly Archives: January 2005

Puppy-Driven Computing

From Todd Veldhuizen:

Babbage never succeeded in having his analytical engine finished. However, a father-and-son pair of Swedes, Georg and Edvard Scheutz, succeeded in building a working difference engine in 1843. It was clear that the drive toward computer miniaturization had already begun: Babbage’s design was intended to be driven by a steam engine, but the Scheutzes’ “…does not require great power, and the machine could be kept in motion by a little dog, of the kind that is used in England as motive power for roasting-spits.”

The mind, it boggles…

Source: M. Lindgren. Glory and failure: the difference engines of Johann Müller, Charles Babbage and Georg and Edvard Scheutz. Stockholm papers in history and philosophy of technology 2017, Dept. of Technology and Social Change, Linköping University, Linköping, Sweden, 1987.

PyWebOff at PyCon / Extensible Programming Mailing List

*ahem* I have two announcements:

1) Miles Thibault, a former 49X student at the University of Toronto, has set up a mailing list devoted to extensible programming. Everyone with an interest in next-generation programming systems is invited to join in.

2) Michelle Levesque, another former 49X student, will be presenting her comparison of Python web programming frameworks at PyCon 2005.

Thanks, Miles, and congratulations, Michelle.

Contributing to Open Source

When I first encountered open source years ago the idea of contributing really excited me. Unfortunately, I found little success in my attempts to help out. I have always wondered why that was, and now I think I know.

I first ran into Biopython when I started working in the Botany Department at U of T. Perl is the language of (no) choice for bioinformatic projects simply because the BioPerl modules are so mature. Perl had always left a bad taste in my mouth, though, so I found myself looking for alternatives. When I investigated Python, I found that a Biopython project had already started and was playing catch-up to BioPerl. The problem, of course, was that Biopython was still quite a bit behind its older brother. I decided to bite the bullet and use it in my work anyway.

As time progressed, I realized that I was often circumventing Biopython due to its incompleteness. In one case, I found the Biopython architecture for calling external applications essentially unusable, because there was no way to find out the application’s return code. I ended up calling the application myself and then tricking Biopython into using the data as if I had used the framework. Eventually I began to think my solution was unmaintainable and set out to fix the problem in Biopython. It turned out that the fix was a lot simpler than I thought and a heck of a lot easier than maintaining my hack. A few posts to the Biopython list later, my 10-line patch to provide this functionality had been entered into CVS.
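For readers wondering what "finding out the application's return code" looks like in practice, here is a minimal sketch using Python's standard `subprocess` module. It is not Biopython's API and not the patch described above; `run_tool` is a hypothetical helper, and the child Python process only stands in for a real command-line tool.

```python
import subprocess
import sys


def run_tool(args):
    """Run an external program; return (return_code, stdout, stderr).

    Hypothetical helper for illustration only, not part of Biopython.
    """
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in for a real tool: a child Python process that prints
# something and then exits with status 3.
fake_tool = [sys.executable, "-c",
             "import sys; print('partial output'); sys.exit(3)"]

code, out, err = run_tool(fake_tool)
if code != 0:
    # The caller can now decide whether the captured output is trustworthy.
    print(f"tool failed with return code {code}")
```

A wrapper that hands back only the output hides exactly the piece of information needed to tell a clean run from a failed one.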

What was most interesting to me about the process is that it was so easy. My previous attempts to contribute to open source generally ended with me reading Slashdot. No doubt this was partly because I had set my sights a little high, but I think it was also because I did not have a problem that I really needed to solve. In this case, I had to figure out my application’s return code, and I did. So if someone asked me now “How do I get involved in open source?”, I might suggest that working on problems in the software you use daily is a better use of time than checking SourceForge’s help wanted list.

Why I Think XP Works

I gave a talk at PyGTA last night on what we’re doing to integrate Python into the undergraduate curriculum at the University of Toronto, and what I’ll be doing with my PSF grant to promote Python in science and engineering. I was pleased at how many people turned out, given the weather, and thought everything went well…

…someone asked what I thought about Extreme Programming (XP). It was late, I was tired, so I ran off at the mouth a little. (I do that.) Here’s what I should have said:

  • I’ve never used XP on a real project, so take my opinions with a large grain of salt.
  • That said, I’m sceptical of the claims made about it, partly because they fly in the face of my personal experience, but also because the Scot in me instinctively mistrusts anything that’s hyped as hard as XP has been.
  • Most importantly, I don’t think that the particular practices that make up XP (pair programming, on-site customer, stand-up meetings, etc.) are the real reason it’s successful. I say this because teams that adopt diametrically opposed methodologies, like CleanRoom, also see their productivity go up. One possible explanation is that common practice is the worst of all possible worlds, and any change at all would be an improvement. (There are days when I believe this.) A more likely explanation is that what really matters is deciding that you want to be a better programmer. If you make a sincere commitment to that, then exactly how you get there is a detail. It’s kind of like dieting: whether you choose Atkins, South Beach, macrobiotic, seasonal, or fruitarian is secondary to being sincere about eating better and exercising more.

This hearkens back to a point I made a couple of times during my talk. It’s easy to make students jump through hoops in a course. What’s hard is convincing them that jumping through those hoops after the course is over really will make their lives better. The best way I’ve found so far is to bring in experienced programmers who are doing exciting things, and have them say, “Comments, version control, test-driven development…” I’d be interested in hearing what other people do to make the case.

Interviewing at Google

A web-friend of mine just interviewed for a tech lead position at Google. Here’s a (slightly tidied up and anonymized) version of their experiences:

Most of my work, at least at the start, should be in “production software”–googlese for the software that helps keep Google’s amazingly huge distributed system running smoothly and seamlessly, and is mostly Python though with ample helpings of C++ here and there and a little bit of Java where integration is needed with some Java-centric application server (e.g. to serve google-ads on sites using such servers).

Plenty of “sideshows” doing such things as statistical analysis and data mining on the huge wealth of data Google collects, maybe giving [name deleted]’s team a hand in data-quality assurance, etc, etc. Plus, every Google techie is supposed to use 20% of his or her time working on his or her own pet projects which might become Google’s Next Big Thing; that’s how Gmail was born.

The selection process is grueling: multiple rounds of phone interviews where they ask you (depending on the fields of expertise you claim) everything from what’s 2^10, to how you would tweak bits in C to find out if a machine’s stack grows up or down in memory, all the way to having you “program on the phone”… then all of a sudden they rush you to Silicon Valley and you get a long full day of nonstop interviewing. I didn’t quite ace mine because I hadn’t thought of cramming on TCP/IP fundamentals, so I didn’t remember which bits are on in the three packets of the handshake (it’s SYN, SYN+ACK, ACK; I could have worked it out, but not jetlagged and after about 6 hours of interviews ;-).
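As a small aside on the handshake question, the sketch below spells out which flag bits are set in each of the three packets. The bit values are the standard ones from the TCP header; the table layout and the `flag_names` helper are made up for illustration.

```python
# Flag bit values from the TCP header.
SYN = 0x02
ACK = 0x10

# The three packets of the handshake, in order.
HANDSHAKE = [
    ("client -> server", SYN),
    ("server -> client", SYN | ACK),
    ("client -> server", ACK),
]


def flag_names(flags):
    """Render a flags byte as text, e.g. 0x12 -> 'SYN+ACK' (illustrative helper)."""
    names = []
    if flags & SYN:
        names.append("SYN")
    if flags & ACK:
        names.append("ACK")
    return "+".join(names)


for direction, flags in HANDSHAKE:
    print(f"{direction}: {flag_names(flags)} (0x{flags:02x})")
```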

I made up for that when they had me program at the whiteboard a C++ implementation of unbounded precision multiplication; I did a test-driven implementation of the trivial routine with std::vector<digit> containers, then did some handwaving about the Karatsuba algorithm (far too hard to implement standing up at a whiteboard, of course ;-) and could sense I had struck lucky… The guy interviewing me at that time had never really done unbounded precision computation work (at least not implementation of high-quality libraries for it), so by just opening the door a crack to the huge body of mathematics that underlies that field (in which I had the good fortune to dabble a bit, a byproduct of my interests in combinatorial arithmetic) I had apparently exceeded expectations.
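To give a feel for "the trivial routine", here is a rough sketch of schoolbook multiplication on digit lists, written in Python rather than the C++ used at the whiteboard. It is not the interviewee's code: the least-significant-digit-first layout, base 10, and every function name are assumptions made for this illustration. Python's built-in integers, which are already unbounded, serve as the oracle for the checks at the end.

```python
def to_digits(n):
    """Non-negative int -> list of base-10 digits, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, 10)
        digits.append(d)
    return digits


def from_digits(digits):
    """Inverse of to_digits."""
    value = 0
    for d in reversed(digits):
        value = value * 10 + d
    return value


def multiply(a, b):
    """Schoolbook O(len(a) * len(b)) product of two digit lists."""
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            total = result[i + j] + da * db + carry
            carry, result[i + j] = divmod(total, 10)
        result[i + len(b)] += carry
    # Drop high-order zeros (stored at the end), keeping at least one digit.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result


# Checks in the test-first spirit of the story.
for x, y in [(0, 0), (7, 8), (12345, 6789), (10**30 + 1, 99**20)]:
    assert from_digits(multiply(to_digits(x), to_digits(y))) == x * y
```

Karatsuba replaces the four half-size products of a divide-and-conquer multiply with three, which is where the handwaving would begin.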

Lots of back-of-envelope computation and the like, too. A friend of mine thought he was doing well in his second Google phone interview when asked to sketch a way to compute bigram statistics for a corpus of a hundred million documents—he had started discussing std::map<std::string> and the like, and didn’t get why the interviewer seemed distinctly unimpressed, until I pointed out even if documents are only a couple thousand words each, where are you going to STORE those two hundred billion words—in memory?! That’s a job for an enterprise-scale database engine!
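The rough sizing in that story can be written down in a few lines. The document count and words per document come from the text; the bytes-per-word figure is an assumption added here to turn a word count into storage.

```python
documents = 100_000_000        # "a hundred million documents"
words_per_document = 2_000     # "a couple thousand words each"
bytes_per_word = 6             # assumption: about 5 letters plus a separator

total_words = documents * words_per_document
raw_text_bytes = total_words * bytes_per_word

print(f"{total_words:,} words")
print(f"{raw_text_bytes / 1e12:.1f} TB of raw text")
```

Under these assumptions the raw text alone is on the order of a terabyte, before any per-bigram counters are stored, which is why an in-memory map is a non-starter.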

So, at least as far as the interviewing process goes, it seems designed for people with a vast array of interests related to programming, computation, modeling, data processing, networking, and good problem-rough-sizing abilities—I guess Google routinely faces problems that may not be hugely complex but are made so by the sheer scale involved. I can just hope the actual day-to-day work is as interesting, fascinating and challenging as the interviews were—but from all I hear, it probably is. And they have bar-quality espresso machines in rest areas… ;-)

Extensible Programming Slashdotted (Unfortunately)

My ACM Queue article on extensible programming systems just got slashdotted. Once again, it’s clear that most of the posters haven’t bothered to read the article: even the headliner seems to think that I believe programmers will all be typing XML tags five years from now.

The article’s real point is that the next revolution in programming will not come from aspect-oriented languages or new ways of expressing concurrency; it will come from extensible languages—syntactically and semantically extensible, just like Common Lisp and Scheme. These languages will require us to turn programming tools into extensible frameworks, which will in turn finally force us to adopt the model/view separation that we’ve been telling the rest of the world to use for the last twenty years. For historical and marketing reasons, those models will probably be stored as XML, but programmers won’t look at the tags any more often than they look at assembly code.

Lots of people are already working on systems of this kind; just this week, for example, I came across the Proxima editor, a generic presentation-oriented editor for structured documents which does many of the things I think the next generation of general-purpose editors will have to do. Who knows? Maybe the Slashdot article will turn up a few more like this… And maybe Toronto will win the Stanley Cup this year.

Postscript: I was very pleased to see Jon Udell include a mention of the article in his blog. Some of the links from his post look very interesting; looks like I’ve got some more reading to do…

Next-Generation Communication and Software Engineering

A couple of years ago (summer of 2003, actually), I noticed something that’s been nagging me ever since. When I log into a computer, the first thing I fire up is email. When my students log in, the first thing they run is an instant messaging client. They keep IM running all the time, just as I keep my email client up, and whenever they run into something they don’t understand, they’re more likely to ask one of their buddies than to ask Google.

Now, I don’t know how this is going to change the way people develop software, but I know it will, just as email did. Just today, for example, I:

  • asked a guy in Germany whether the documentation for his open source library was accurate;
  • searched the archive of the Python Developers’ List to remind myself of something I worked on in 2000;
  • sent a stack trace to another developer in Boston;
  • arranged meetings with three different teams;
  • apologized for missing a meeting with a fourth team;
  • triggered a complete system build by sending email to a daemon process running on a machine in B.C.; and
  • received a reply telling me that some code I’d checked in didn’t compile on BSD (missing header files…).

None of this would have happened if my generation didn’t take email for granted: it might have been technically feasible, but we just wouldn’t have thought to do it. The teenagers and twenty-somethings in my class take IM for granted; the only thing standing in the way of them being really creative with it is that dinosaurs like me still make the rules. What will software development projects feel like ten years from now, when they’re in charge? If anybody knows of creative uses of IM in software development, I’d be grateful for pointers.

Python, Typing, and the Scientific Spirit

There’s been a minor blog storm over the last few weeks about Guido van Rossum’s proposal to add optional type declarations to Python [1]. Guido believes it will help catch errors before code is run (or in sections of code that aren’t exercised by unit tests), but other people say no, the extra clutter and complexity will spoil Python’s clean lines.

The problem is, neither side has any real data to back up their arguments. Will optional static typing catch 1% of errors? 10%? 50%? 90%? And how cost-effective will it be? If it takes twice as long to write code, but 50% of errors that would otherwise not show up until run-time are caught on load, is that a net win or not?

Four and a half years ago, when Python’s developers were arguing over the syntax for multi-list iteration, I ran an experiment to find out how well users would understand some of the proposals. In the same spirit, I’d like to see the advocates and opponents of optional static typing put their heads together and design an experiment to gauge its costs and benefits. I’d be very happy to run that experiment here in Toronto, and I’m sure others would do the same in their user communities. Best case, the results convince all but a few die-hards that it is or isn’t worth doing. Worst case, figuring out how to tell if optional static typing is a win or not will clarify the debate, and we’ll all have the satisfaction of knowing that we at least tried to be scientific about programming language design.

[1] See these two articles and many follow-ups on the Daily Python URL.