Archive for April, 2008

A Rare Triple

April 12th, 2008

I can’t remember the last time I read three books in a row that I really liked:

Of course, this means I have a stack of technical books waiting for me; I hope at least one is as good.

Books

Feature List

April 11th, 2008

I skipped an important step in my previous post: I wasn’t explicit about the features something had to have to qualify (in my mind, if no one else’s) as a “software project portal”. Here are my current thoughts, sorted in order of importance.

  1. Identity management and access control (i.e., accounts, privileges, and all that jazz).
    • Bonus marks if privileges are organized into roles for easier administration.
    • More bonus marks if administrators can define new roles.
    • Still more if it integrates with OpenID or the like.
  2. Issue tracking (a.k.a. “ticketing”), or some other kind of to-do list.
    • …with a query interface that lets users filter and sort.
      • Bonus marks if filter-and-sort settings can be saved and shared, so that users can set up things like “all the high-priority items assigned to me that are due within a week”.
    • More bonus marks if fields in tickets can be customized without rewriting any code.
    • Still more bonus marks if this is combined with a calendar so that users can group tickets by release or due date.
      • DrProject’s “milestones” have been inherited from Trac, and are just as lame.
  3. Version control repository browser (browser only because there’s no safe and easy way to do a commit through a browser).
  4. Mailing list management (even though Trac doesn’t have this, and I still consider it a portal).
  5. A wiki.
    • Marks taken off if the wiki’s grammar doesn’t have shortcuts for linking to tickets, code revisions, mail messages, and the like.
  6. Cross-component search.
  7. A plugin or extension system.
  8. Some kind of over-the-web API (REST, SOAP, or RPC doesn’t matter, just so long as it allows remote scripting, automation, and integration).
  9. An integrated view of the project’s history.
    • This should be available as an RSS feed as well as in the browser.
    • Users should be able to get more immediate notification of selected events (e.g., email when tickets they care about are created or changed).
    • There should also be a separate, more detailed event log for auditing and administrative purposes.
    • Plus charts and other analytic tools.
      • We’re going to try again this summer to build something like this for DrProject.
  10. Project blogs.
  11. Continuous integration.
    • Most systems rely on external tools like CruiseControl, in which case this becomes “integrate with continuous integration”.
  12. Test case management.
    • Actually, most don’t bother to do this separately (Rally being a notable exception). I’m still trying to figure out why nobody has integrated with FitNesse yet…
  13. Requirements management.
    • A lot of systems, including DrProject, just use tickets for this, but Mingle and other agile-oriented portals support user story cards or the like.
    • As an aside, I think it’s pretty telling that nothing smaller than ClearCase offers, or even integrates with, a “classic” requirements management tool like DOORS.
  14. Time tracking.
    • This comes near the bottom of my list because I believe most people input random numbers.
  15. Integration with IRC, instant messaging, VoIP, and other communication tools.
    • Anyone?
  16. Support for internationalization and localization.
  17. Forums or some other kind of bulletin board system.
    • I personally think this is redundant given mailing lists and a wiki, but a lot of portals offer them, and we regularly get requests to add them to DrProject.
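
To make the saved filter-and-sort idea under item 2 concrete, here is a minimal Python sketch of the example query “all the high-priority items assigned to me that are due within a week”. The `Ticket` fields, the `SavedQuery` class, and the `my_urgent_tickets` helper are all hypothetical names invented for illustration; they are not DrProject’s or Trac’s schema or API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Ticket:
    # Hypothetical fields; a real tracker's schema will differ.
    id: int
    owner: str
    priority: str  # e.g. "high", "normal", "low"
    due: date


@dataclass
class SavedQuery:
    """A named filter plus sort key that could be stored and shared."""
    name: str
    keep: Callable[[Ticket], bool]
    sort_key: Callable[[Ticket], object]

    def run(self, tickets: List[Ticket]) -> List[Ticket]:
        # Filter first, then sort what is left.
        return sorted((t for t in tickets if self.keep(t)), key=self.sort_key)


def my_urgent_tickets(user: str, today: date) -> SavedQuery:
    """High-priority tickets owned by `user` and due in the next seven days."""
    deadline = today + timedelta(days=7)
    return SavedQuery(
        name="my urgent tickets",
        # Assumption: overdue tickets are excluded; drop `today <=` to include them.
        keep=lambda t: (t.owner == user
                        and t.priority == "high"
                        and today <= t.due <= deadline),
        sort_key=lambda t: t.due,
    )
```

A real portal would probably store the criteria as data (field, operator, value) rather than as Python functions, so that queries can be saved in a database and shared between users; the functions here just keep the sketch short.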

One more consideration is that a portal should be installable, rather than a hosted service: not every project is open source, and many universities aren’t allowed to store student information outside their own jurisdiction.

DrProject

Alternatives to DrProject

April 11th, 2008

We’re hoping to release a new version of DrProject next week, and persuade some Trac users to upgrade. (Multiple projects! Mailing lists! Role-based access control! Scripting interface!) This is therefore a good time to take a fresh look at what other systems offer:

SourceForge: not the first web-based software project portal, but certainly the best known and (probably) the most widely used; not free, and too big for most student projects and startups (though there are lots of cases of both using it).

Google Code: much smaller, but growing fast; only available as a hosted service (which rules it out for course projects in many jurisdictions, and for companies that want to keep their software behind their firewall).

Trac: probably the most popular entry-level open source system; this is what we forked DrProject from, and what we’re hoping to supersede.

Mingle: a relatively new offering from ThoughtWorks specifically aimed at agile projects (and lovers of sticky notes everywhere). Very attractive, but not open.

Rally, VersionOne, ScrumWorks, TargetProcess, and Acunote: same story as Mingle.

OpenProj: an open source alternative to Microsoft Project, available both on the desktop and as a service.

XPlanner, ExtremePlanner, ProjectCards, XPStoryStudio, PlanningPoker, and Plan B: all target agile processes, but lack some or all of the features of an all-purpose portal.

Perforce: my favorite version control system, which also has simple task management, but not the rest of the features a team needs in a portal.

ClearCase: a configuration management tool rather than a portal; definitely not something to inflict on a small team (or a large one, for that matter).

Jazz: “an IBM Rational project to build a scalable, extensible team collaboration platform for integrating work across the phases of the development lifecycle.” Slightly smaller than Greenland, and not yet finished; definitely not for student teams.

So, what have I missed?

DrProject

It Went Well

April 10th, 2008

Yesterday’s consulting course showcase went well: lots of visitors, lots of noise, lots of fun.  David Wolever was there to take pictures…

[Photos: 9398large.jpg, 9415large.jpg, 9416large.jpg, 9426large.jpg, 9449large.jpg, 9486large.jpg, 9521large.jpg, 9549large.jpg, 9514large.jpg]

Teaching

Three Studies (Maybe Four)

April 10th, 2008

We’re in the thick of picking students and projects for Google Summer of Code, which has inspired some less-random-than-usual thoughts. Here are two studies I’d like to do (or see done):

  1. What has happened to previous students? How many are still involved in open source? How many have gone on to {start a company, grad school, prison}? What do they think they learned from the program? How much of the software they wrote is still in use? Etc.
  2. Every one of the 175 organizations blessed by Google this year is using the same web application for collecting and voting on projects. From what I can tell, they’re all using it in different ways: +4 means something very different to the Python Software Foundation than it does to Eclipse or SWIG. They’re also using a bewildering variety of other channels for communication: wikis, IRC, Skype chat sessions, mailing lists (the most popular), and so on. Why? Is this another reflection of Jorge Aranda’s finding that every small development group evolves a different process, but all those processes “work” in some sense, or is it—actually, I don’t have any competing hypotheses right now, but I’m sure there are some.

And while we’re on the subject of studies, I just read Hochstein et al.’s paper “Experiments to Understand HPC Time to Development” (CT Watch Quarterly, 2(4A), November 2006). They watched a bunch of grad students at different universities develop some simple parallel applications using a variety of tools, and measured productivity as (relative speedup)/(relative effort), where relative speedup is (reference execution time)/(parallel execution time), and relative effort is (parallel effort)/(reference effort). The speedup measure is unproblematic, but as far as I can tell, they don’t explain where their “reference effort” measure comes from. I suspect it’s the effort required to build a serial solution to the problem, and that “parallel effort” is then the additional time required to parallelize; I’ve mailed the authors to ask, but haven’t heard back yet.
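
The productivity measure described above is simple enough to write down directly. This is a small Python sketch of the definition as quoted in this post, not code from the paper; the function and parameter names are my own, and the numbers in the usage note are made up purely to show the arithmetic.

```python
def relative_speedup(reference_time: float, parallel_time: float) -> float:
    """(reference execution time) / (parallel execution time)."""
    return reference_time / parallel_time


def relative_effort(parallel_effort: float, reference_effort: float) -> float:
    """(parallel effort) / (reference effort)."""
    return parallel_effort / reference_effort


def productivity(reference_time: float, parallel_time: float,
                 parallel_effort: float, reference_effort: float) -> float:
    """(relative speedup) / (relative effort), as defined in the text above."""
    return (relative_speedup(reference_time, parallel_time)
            / relative_effort(parallel_effort, reference_effort))
```

For instance, with invented figures, a program that runs in 25 seconds instead of 100 (speedup 4) but took 10 hours of effort against a 5-hour reference (relative effort 2) gets `productivity(100.0, 25.0, 10.0, 5.0) == 2.0`. Whatever “reference effort” turns out to mean, it only enters through that last argument.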

I wasn’t surprised when I realized that the authors hadn’t done the other half of the study, i.e., they hadn’t benchmarked the productivity of a QDE (quantitative development environment) like MATLAB; many people talk and think as if scientific computing and high-performance computing were the same thing. At first glance, it doesn’t seem like it would be hard to do: you could use the performance of the MATLAB or NumPy code over the performance of a functionally equivalent C or Fortran program for the numerator. You have to be careful about the denominator, though: if my guess is right, then if things were done in real-world order, you’d be comparing:

(time to write parallel code after writing serial code) / (time to write serial code from scratch)

vs.

(time to write MATLAB from scratch) / (time to write serial code having written MATLAB)

Even with that, I strongly suspect that MATLAB (or any other full-featured QDE) would come out well ahead of any parallel programming environment currently in existence on problems of this size. Yes, you need big iron to simulate global climate change over the course of centuries, but that’s not what most scientists do, and the needs of that minority shouldn’t dominate the needs of the desktop majority.

I’d also be interested in re-doing this study using MATLAB parallelized with Interactive Supercomputing’s tools. I have no idea what the performance would be, but the parallelization effort would be so low that I suspect it would once again leave today’s mainstream HPC tools in the dust.

And now let’s double back for a moment. I used the phrase “desktop majority” a couple of paragraphs ago, but is that really the case? What do most computational scientists use? What if we include scientists who don’t think of themselves as computationalists, but find themselves doing a lot of programming anyway, just because they have to? If you plotted rank vs. frequency, would you get a power law distribution, i.e., does Zipf’s Law hold in scientific computing? Last term, I calculated a Gini coefficient for each team in my undergraduate software engineering class using lines of code instead of income as a raw metric; what’s the Gini coefficient for the distribution of computing cycles used by scientists (i.e., how evenly or unevenly is computing power distributed)? And how should the answers to these questions shape research directions, the development of new tools, and what we teach in courses like Software Carpentry?
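
For readers who haven’t met the Gini coefficient, here is a minimal Python sketch of one standard way to compute it for a list of non-negative values, such as lines of code per team member or computing cycles per scientist. It is not necessarily the exact calculation used for the class; the function name `gini` is just a label for this example.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0 means everyone has the same amount; (n - 1) / n is the maximum,
    reached when one member holds everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # With x_1 <= ... <= x_n:  G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

A four-person team in which everyone writes the same amount of code scores 0, while one in which a single person writes everything scores 0.75, the maximum for a group of four.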

Research, Software Carpentry

Cross-Platform PowerShell

April 10th, 2008

I knew it would happen sooner rather than later: there is now an open source reimplementation of Microsoft PowerShell that will run on all the “other” platforms. If you haven’t played with PowerShell, it’s the coolest thing to happen to coding in a long time; there’s also the Hotwire hypershell (written and scripted in Python), and I’m sure more will come along.

Extensible Programming

Global Intelligence

April 9th, 2008

I read George Dyson’s Darwin Among the Machines way back in 1998. Its subtitle is The Evolution of Global Intelligence; in it, Dyson tries (I think—I was never quite sure) to give both a history of the idea that somehow we are evolving toward some collective species-wide mind, and a glimpse of what that mind might look like.  Nothing in it quite lives up to this quote from the introduction:

In the game of life and evolution there are three players at the table: human beings, nature, and machines. I am firmly on the side of nature. But nature, I suspect, is on the side of the machines…

but it’s still a thought-provoking read.

I mention this because of this story over at Nature: the UN High Commission for Refugees is creating an overlay for Google Maps so that people can see in near-real time what’s going on in places like Darfur. If self-awareness is one of the hallmarks of intelligence, things like this might count as steps toward that global mind Dyson was writing about. We may have less personal privacy than ever before, but it’s also harder and harder for us to bury our camps, gulags, and torture chambers in our collective subconscious and pretend we don’t know about them. Progress, of a kind…

Uncategorized

Morning Routine

April 9th, 2008
  1. Log in to my desktop Windows box.
  2. Start a script that does “svn up” in all the repositories I care about (currently 40+, as I’m watching a lot of student projects).
  3. Start up Thunderbird (it takes a while to load).
  4. Log in to Google Calendar to see what my day looks like.
  5. Do a quick pass through email (typically 50-60 non-spam messages overnight): everything is skimmed and either deleted or moved to my “action” folder.
  6. Check the output of the “svn up” script to see if there’s anything I should worry about.
  7. Log in to this blog to approve or delete comments.
  8. Check Google Reader to see what the rest of the world is doing.
  9. Log in to Facebook to check on my Scrabulous games.  (It’s a drug…)
  10. Go through my “action” folder from the top; if possible, deal with one holdover issue before tackling any of today’s.  (This morning, for example, I booked a flight to Quebec City.)
  11. Make up a sticky-note-sized list of things I need to do today (some taken from the “action” folder, some not).

If I’m in at 8:15, this is done by 9:00, and I’m ready to face my day.  What’s your routine?
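
For anyone who wants to copy step 2, here is a rough Python sketch of an update-everything script. It is not the script described above: it assumes all the working copies sit directly under one parent directory and that the `svn` command-line client is on the PATH. The function names and the `run` parameter (which exists only so the loop can be exercised without Subversion) are inventions for this example.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterator


def find_working_copies(root: Path) -> Iterator[Path]:
    """Yield immediate subdirectories of `root` that contain a .svn directory."""
    for child in sorted(root.iterdir()):
        if (child / ".svn").is_dir():
            yield child


def update_all(root: Path, run: Callable = subprocess.run) -> Dict[str, str]:
    """Run `svn up` in each working copy; return a report keyed by directory name."""
    reports = {}
    for working_copy in find_working_copies(root):
        proc = run(["svn", "up"], cwd=working_copy,
                   capture_output=True, text=True)
        # A non-zero exit status usually means a conflict, a lock, or a network problem.
        status = "ok" if proc.returncode == 0 else "FAILED"
        reports[working_copy.name] = f"[{status}]\n{proc.stdout}{proc.stderr}"
    return reports
```

Looping over `update_all(some_directory).items()` and printing each report gives the output checked in step 6; anything marked `FAILED` is worth a closer look.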

Uncategorized

Our Own Little DemoCamplet

April 8th, 2008

Tomorrow (Wednesday, April 9, 2008), from 1-3 pm, the students in my consulting course will be giving their end-of-term demos upstairs at Molly Bloom’s on College Street. This brochure describes what they’ve been doing—there’s quite a range, and I hope everyone will find some reason (other than the beer) to attend.  (You can also check out their end-of-term videos, which are linked from the course web page.)

As we wrap up the term, I’d like to thank the following people for giving the students their projects and their time:

AdMap: Kristan Uccello
ClearCanvas: John Adziovsky and Norman Young
CT Surgery: Mike Daly
Condor: Peter St. Onge
Feature Diagrams: Michael Feathers
Firebreak Placement: David Martell
Go Go Kayaks: Elsa Marziali
GPU Fluid Flow: Scott Brigs
Realistic Sand: Andrew Clinton and Jeff Lait
Jabber: David Janes
Jazinga VoIP: Shidan Gouran
Modeling Budworms: Josie Hughes
OLPC Touchpad: Mike Fletcher
One Week Out: Katrin Lepik and Oshoma Momoh
SlashID: Zeev Lieber
Slidy: Dave Raggett
SpaceFX: Phil Hassey
Spatial Cognition: Jing Feng and Ian Spence
Thunderbird: David Ascher
weMap: Marco Campana, Aimee Holmes, Alfonso Lorenzana, Bonnie Mah, Dave Montague, and Jane Zhang
Wii Molecular Viz: Ryan Lilien
WikiStats: Michael Terry
Automating Lab Workflow: Jessica Wong

I’d also like to thank David Wolever for tech support, and Glen the bartender for always being happy to see us. It’s been a good term; I’m proud of what we’ve accomplished.

Teaching

Summer of Code Applications Are In

April 8th, 2008

According to Leslie Hawthorn, Google received over 7000 applications for this year’s Summer of Code. I’m pleased to report that 19 of them came from 14 U of T students, and that 7 students from elsewhere in the world put in proposals for projects we’re mentoring or otherwise involved in. We’ll find out April 21…

Uncategorized