Archive for January, 2010

Quiet Time

January 12th, 2010

Two grad students in our department bought white lab coats over Christmas. When they wear them, it means that they’re “doing science”, and are not to be interrupted. I’d buy one too, except almost all my interruptions are auto-generated: no matter how hard I try, I just… can’t… stop myself from checking email, reading blogs, et cetera.

Which is what makes the Web 2.0 Suicide app so interesting. You give it your credentials for Facebook, Twitter, and what-not, and it erases all the content you’ve created, unfriends/unfollows everyone, then sets your password to something random so that you can’t even log in again. Sure, you could create another account, but will you? Or will you, I dunno, go read a book or see a movie or talk to someone instead?

I’m not ready to go that far (translation: I’m too weak), but what if it weren’t quite so drastic? What if, instead of wiping everything out, it just reset your password and didn’t send you the new one for some specified period of time? If I could block myself for a few hours at a stretch, or maybe even a day or two, I’d get a lot more done. Or start juggling again, one or the other…

Uncategorized

Google and China

January 12th, 2010

“We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all. We recognize that this may well mean having to shut down Google.cn, and potentially our offices in China.”

From http://googleblog.blogspot.com/2010/01/new-approach-to-china.html.

Uncategorized

Two Thumbs Up, One Thumb Down

January 11th, 2010

Three recent papers:

  1. Sahoo, Criswell, and Adve: “Towards Automated Bug Diagnosis: An Empirical Study of Reported Software Bugs in Server Applications”. Looked at bugs reported in six large web server apps, and discovered that most could be reproduced deterministically by replaying just a few recent inputs (in most cases, just one). Over 60% of bugs resulted in silent data corruption, which means that adding more internal consistency checks and assertions would help weed them out; only a handful were non-deterministic or timing based. The upshot is that keeping a log of the last few requests, and saving it when a bug crops up, has a good chance of helping developers localize the bug quickly. Excellent piece of empirical research.
  2. Guitart, Torres, and Ayguadé: “A survey on performance management for internet applications”. Summarizes published results on request scheduling, admission control, dynamic resource management, service degradation, and other approaches, both empirical and theoretical. A good map of the terrain.
  3. Demsky and Lam: “Views: Object-Inspired Concurrency Control”. The idea is that developers define one or more views that describe which method(s) of a class can safely be run concurrently with which others; a clique-finding algorithm then automatically generates the locks and locking calls required to ensure safety. Nice idea, but as with so many papers in programming languages, there’s no empirical validation: do real programmers find this comprehensible? Does it make coding easier than [name of alternative goes here]? Does it lower error rates? Etc. Most papers on tools and methods at ICSE now include some kind of empirical study, even in their early stages; here’s hoping the practice spreads to programming language design.
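
The "keep a log of the last few requests" idea from the first paper is easy to sketch. What follows is a minimal illustration, not anything taken from the paper: the class name `RecentRequestLog`, the `handle_request` wrapper, and the JSON dump format are all made up for the example, and a real server would also have to deal with concurrency and request bodies that aren't plain strings.

```python
import json
import time
from collections import deque


class RecentRequestLog:
    """Remember only the most recent requests (hypothetical helper)."""

    def __init__(self, capacity=5):
        # A deque with maxlen silently discards the oldest entry when full.
        self._recent = deque(maxlen=capacity)

    def record(self, request):
        self._recent.append({"time": time.time(), "request": request})

    def snapshot(self):
        """Return the remembered requests, oldest first."""
        return [entry["request"] for entry in self._recent]

    def dump(self, path):
        """Save the remembered requests so a developer can replay them."""
        with open(path, "w") as out:
            json.dump(list(self._recent), out)


def handle_request(request, log, handler, dump_path):
    """Record the request, run the handler, and save the log on failure."""
    log.record(request)
    try:
        return handler(request)
    except Exception:
        log.dump(dump_path)
        raise
```

Since most of the bugs in the study needed only a handful of inputs to reproduce, a small `capacity` is probably enough; the right number for any particular application is a guess until someone measures it.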

Research

More Public Embarrassment About Workflows

January 10th, 2010

Thanks to everyone for the comments on my recent post about web workflows and public embarrassment. I have two of my own to add:

  1. I’ve looked at tools like Selenium, but making them do what I want is more effort than it’s worth. For example, when a new student joins UCOSP, I have to go to the Google map that shows where participants are from and add her (or him) to the pointer for her (or his) school. That means parameterization, ‘switch’ logic, and pattern matching on strings, all of which I’d be happy to do in Python (but only if there was a sandbox in which to test my script, which web applications don’t provide).
  2. I’ve run into a similar frustration (at least, it feels similar to me) with iTunes. Djole’s Indiscretion is one of my favorite albums, but when I import it, iTunes decides that it’s actually a mid-80s recording of the Brandenburg Concertos. I presume this is because some identifier in the album data is being mis-matched to a database, but what actually bothers me is how hard it is to override. There doesn’t seem to be a way to say “this whole album is actually over there, you silly mis-interpreted assemblage of bytecodes.”

What ties these two cases together is the notion of computational thinking. Almost by definition, novices (in any domain) don’t know enough to have “gut instincts” about how easy things ought to be, or to come up with plausible diagnoses when things go wrong. Someone with a few years of experience, on the other hand, can look at most problems and say, “OK, it should be easy to do,” and have some notion of what the fix could be. I personally believe that the only way to develop those instincts for computational tasks is to actually program; I don’t believe that anything worth calling “computational thinking” can be acquired in any other way. (See here and here for earlier discussion of this point.)

Uncategorized

Code vs. Messages

January 10th, 2010

I used to keep track of the number of lines of code I’d written per day, and (when I was working on a book) the number of words written or deleted as well. These days, I count emails sent and received, which is a pretty good reflection of how I’ve “matured”. Here’s what 2009 looked like:

[Figure: emails sent and received in 2009]

Uncategorized

Projects This Term

January 7th, 2010

Along with the cross-country capstone projects I’m coordinating this term, I’m also setting up six projects for the students in my CSC302 software engineering course (the first four of which I mentioned in an earlier post):

  1. Adding pivot tables to Gnumeric.
  2. Upgrading PyLint.
  3. Converting the Selenium IDE to a plugin architecture.
  4. Improving the SpatiaLite GIS extensions for SQLite.
  5. Porting Django to Python 3.
  6. Helping with ILUTE (the Integrated Land Use and Transportation Engineering tool).

10-11 students will be working on each; it promises to be an exciting term.

Teaching

A Broken Pledge

January 7th, 2010

Well, it lasted six months and a bit: after promising not to fly for a year, I broke down and got a plane ticket yesterday to get to Atlanta in February for PyCon 2010. I could have gone by bus or train, but it’s 22 hours or more each way. By comparison, a train from London to Bologna (almost exactly the same distance) takes 14 hours and change and, even with a sleeper, costs less than half as much as the equivalent ticket on VIA + Amtrak. I can’t justify the carbon load of two high-altitude flights, but I can’t justify two extra days away from my family either… *sigh*

Uncategorized

Aranda on SEMAT

January 7th, 2010

The January 2010 issue of PragPub has an article by Jorge Aranda critiquing the SEMAT initiative. Good on ya!

Uncategorized

Changing Gears

January 7th, 2010

As some of you already know, my contract with the University of Toronto runs out this spring, and I have decided not to seek renewal. I’ve learned a lot in this job, and had a chance to work with some great people, but it’s time for new challenges.

What I’d most like to do next is spend a year working full-time on the Software Carpentry course—of all the things I’ve done, it’s the one that I think has the most potential to make scientists’ lives better. My goal is to raise approximately CDN$25,000 from each of half a dozen sponsors so that I can reorganize and revamp the content, add screencasts and video lectures, and generally drag it into the 21st Century. An abbreviated proposal is included below the cut—if you or anyone you know would be interested in discussing possibilities, please give me a shout.

Software Carpentry

The Design of Fossil

January 7th, 2010

Partly in response to my post about building something Fossil-like on a NoSQL data store, Richard Hipp has written a brief discussion of Fossil’s design that tackles two questions:

  1. Why is Fossil based on SQLite instead of a distributed NoSQL database?
  2. Why is Fossil written in C instead of a modern high-level language?

His answer to the first is that Fossil is a NoSQL database—its use of SQLite to store metadata and other stuff is an implementation detail. His answer to the second is, “Fossil does use a modern high-level language for its implementation, namely SQL.” He goes on to say:

Much of the “heavy lifting” within the Fossil implementation is carried out using SQL statements. It is true that these SQL statements are glued together with C code, but it turns out that C works surprisingly well in that role. Several early prototypes of Fossil were written in a scripting language (TCL). We normally find that TCL programs are shorter than the equivalent C code by a factor of 10 or more. But in the case of Fossil, the use of TCL was actually making the code longer and more difficult to understand. And so in the final design, we switched from TCL to C in order to make the code easier to implement and debug.
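
To make the "SQL does the heavy lifting, the host language is just glue" split concrete, here is a small sketch. It is not Fossil's code or schema: the `link` table, its columns, and the `ancestors` function are hypothetical, and Python's `sqlite3` module stands in for the C glue purely for brevity. The graph walk happens entirely inside one recursive SQL query; the surrounding code only passes a parameter in and unpacks rows.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a made-up parent/child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (parent INTEGER, child INTEGER)")
    # A small history with a fork at 2 and a merge at 5.
    conn.executemany(
        "INSERT INTO link VALUES (?, ?)",
        [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)],
    )
    return conn


def ancestors(conn, checkin_id):
    """Return every ancestor of checkin_id, found by one recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestor(id) AS (
            SELECT parent FROM link WHERE child = ?
            UNION
            SELECT link.parent
              FROM link JOIN ancestor ON link.child = ancestor.id
        )
        SELECT id FROM ancestor ORDER BY id
        """,
        (checkin_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(ancestors(build_demo_db(), 5))  # prints [1, 2, 3, 4]
```

Written without SQL, the same walk needs an explicit work queue and a visited set in the host language, which may be part of why a scripting language bought less than usual here.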

I’m sceptical: I earned my living as a C/C++ programmer for almost 15 years, but believe these days that other languages give better bang for the buck in almost all cases. On the other hand, Richard has shipped much more high-quality software than I have. I wish I had time to dig into this deeper… *sigh*

Uncategorized