Archive for April, 2005

You and Your Research

April 29th, 2005

Richard Hamming was one of the early greats of information science.
After working on the Manhattan Project at Los Alamos, he spent thirty
years at Bell Labs; he received the ACM Turing Award in 1968, and in
1987, the IEEE named its Hamming Medal after him.

In 1986, he gave a lecture called “You and Your Research”,
in which he talked about what makes great
researchers great. A few passages show signs of old-fashioned views,
and others are frankly egotistical, but for the most part, it’s a very
thought-provoking look at what you can (and have to) do if you really
want to make an impact.

Among points like “Great researchers are comfortable with
ambiguity” and “Great researchers know how to sell an idea” was one
that particularly struck me:


If you do not work on an important problem, it’s unlikely you’ll do
important work… Great scientists have thought through, in a careful
way, a number of important problems in their field, and they keep an
eye on wondering how to attack them. Let me warn you, ‘important
problem’ must be phrased carefully. The three outstanding problems in
physics, in a certain sense, were never worked on while I was at Bell
Labs. By important I mean guaranteed a Nobel Prize and any sum of
money you want to mention. We didn’t work on (1) time travel, (2)
teleportation, and (3) antigravity. They are not important problems
because we do not have an attack. It’s not the consequences that
makes a problem important, it is that you have a reasonable attack…
When I say that most scientists don’t work on important problems, I
mean it in that sense. The average scientist, so far as I can make
out, spends almost all of his time working on problems which he
believes will not be important and he also doesn’t believe that they
will lead to important problems.

and later:


Many great scientists know many important problems. They have
something between 10 and 20 important problems for which they are
looking for an attack. And when they see a new idea come up, one
hears them say, “Well that bears on this problem.” They drop all the
other things and get after it.

In 1995, after completing a Ph.D., doing post-docs in several
countries, and writing a book on parallel programming, I decided that
I wasn’t cut out to be a researcher. I was pretty sure that I could
jump through the hoops required to get a tenured position somewhere or
other, but the idea left me cold. As far as I could tell, the whole
point of being a professor was to think Big Thoughts. Since I didn’t
seem to have any, I felt I should go and do something else.

Ten years and several micro-careers later, I think I’ve finally
figured out a way to find big ideas (and important problems):

  1. Look at how people (especially people in their teens and twenties)
    are actually using computers.
  2. Draw up a list of things that software developers find
    frustrating, time-consuming, or error-prone.
  3. See if anything from the first list can be used to solve problems
    in the second.

For example, a growing number of students are using SubEthaEdit to take
notes collaboratively during lectures. Questions:

  1. How do editing patterns compare with those of multi-author
    wikis? Classroom notes are taken in real time, by people who are
    more likely to have direct contact; will we see the same patterns
    of collaboration and competition that researchers have found at
    Wikipedia and elsewhere?
  2. Are notes taken this way more useful to students? To
    instructors (as feedback on what students are actually getting out
    of lectures)? Either way, how can we enhance or customize
    collaborative editors to improve the student experience?
  3. What other tasks can collaborative note-taking be applied to?
    Steve Easterbrook suggested using it in requirements analysis
    sessions, so that customers could see the analyst’s impression of
    what they’d said evolving on a shared screen; how well would that
    work in practice?

Here’s another: a lot of empirical research in social network theory
analyzes intra-group email traffic to discover who
actually shares information with, or makes commitments to, whom. Now,
almost everyone has some kind of spam filter set up on their mailbox.
Suppose you were to compare the filter settings of group members:
would you find role-related patterns, e.g., that people in QA are
reading and ignoring the same kinds of things? Could you
automatically uncover common interests that group members might not be
aware of? I don’t think it would help much on small projects, but
what about the Windows development group (several hundred people) or
IBM’s DB2 group (ditto)?
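
As a rough sketch of what "comparing filter settings" might look like, assume each person's filter can be boiled down to a set of terms it discards (a big simplification; real filters are usually statistical). The names, the sample data, and the 0.5 threshold below are all made up for illustration:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(filters, threshold=0.5):
    """Return (name1, name2, score) for each pair whose filter sets overlap enough."""
    pairs = []
    for (n1, f1), (n2, f2) in combinations(sorted(filters.items()), 2):
        score = jaccard(f1, f2)
        if score >= threshold:
            pairs.append((n1, n2, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical data: the terms each person's filter throws away.
filters = {
    "qa_alice": {"build-log", "nightly", "marketing"},
    "qa_bob": {"build-log", "nightly", "hr-news"},
    "dev_carol": {"marketing", "hr-news", "sales"},
}

for name1, name2, score in similar_pairs(filters):
    print(f"{name1} and {name2} ignore similar mail (similarity {score:.2f})")
```

Whether pairs that score highly actually share a role is exactly the open question; the sketch only shows that the comparison itself is cheap once the settings are in hand.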

Closer to home, modern IDEs, like Eclipse, include refactoring
tools to help programmers rearrange and clean up their code.
Recently, researchers at the University of Colorado have
taken to recording refactorings of one body of code, and replaying
them against other code. If I modify a library’s API, for
example, you can take what I did, and apply it to your application, to
bring your app into line with my new API. So, suppose you have two
pieces of code “A” and “B”; can you use heuristic search to turn “A”
into “B” using only well-defined refactorings? If so, I can think of
several applications:

  1. When a student hands in an assignment, run the tool in order to
    provide marking assistance: “Class XYZ ought to be split, and method
    M made abstract, in order to conform with the instructor’s
    solution.”
  2. When looking at two snapshots from a version control repository,
    see if you can reverse engineer a sequence of refactorings to
    account for the changes, in the spirit of Parnas and Clements’
    famous paper “A Rational Design Process: How and Why to Fake It”.
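
To make the "turn A into B" question concrete, here is a toy sketch, not anything the Colorado group built. It assumes a program can be reduced to a set of (class, method) placements and that the only refactoring available is "move method"; both are drastic simplifications, and every name in it is hypothetical. The search is plain A*, using the number of placements still out of position as the heuristic:

```python
import heapq
from itertools import count


def neighbors(state, class_names):
    """Yield (description, new_state) for every single 'move method' refactoring."""
    for cls, method in sorted(state):
        for target in sorted(class_names):
            if target != cls and (target, method) not in state:
                new_state = (state - {(cls, method)}) | {(target, method)}
                yield f"move {method} from {cls} to {target}", frozenset(new_state)


def find_refactorings(start, goal):
    """A* search for a shortest list of moves turning start into goal (None if impossible)."""
    start, goal = frozenset(start), frozenset(goal)
    class_names = {cls for cls, _ in start | goal}

    def heuristic(state):
        # One move fixes at most one placement, so this never overestimates.
        return len(goal - state)

    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(start), next(tie), start, [])]
    best_cost = {start: 0}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        cost = len(path) + 1
        for step, nxt in neighbors(state, class_names):
            if cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = cost
                heapq.heappush(
                    frontier, (cost + heuristic(nxt), next(tie), nxt, path + [step])
                )
    return None


# Hypothetical example: a student's one big class vs. the instructor's split.
student = {("Shape", "area"), ("Shape", "draw"), ("Shape", "save")}
instructor = {("Shape", "area"), ("Canvas", "draw"), ("Store", "save")}

for step in find_refactorings(student, instructor):
    print(step)
```

A real tool would need a much richer set of refactorings (rename, extract, pull up, and so on) and a smarter heuristic, which is where the research problem lies.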

I’m also interested in the fact that a growing number of software
development teams use some kind of web portal to manage their
projects. SourceForge is the most
famous of these, but there are many others. Each one combines a
version control browser with mailing lists, bug tracking, blogs,
release management, and other collaboration tools. So far, this stuff
isn’t part of the standard undergraduate curriculum, but Karen Reid and I are hoping
to change that by modifying Trac to provide the
features that we need to run courses. Once we do that, we’ll have a
way to collect data on how students actually do group assignments. We
were surprised in the summer of 2004 that we couldn’t find
any correlation between the way students used CVS
repositories in a second-year course, and the grades they were given.
So:

  1. What’s the correspondence between student use of the web portal,
    and how students actually program?
  2. What are the differences between the way students use
    collaborative tools, and the way professional programmers
    (particularly those working on open source projects) use them?
    Should we try to close that gap? If so, how?

Last but not least are three accelerating trends:

  • giving programmers more ways to express abstraction in
    programs;
  • building applications as extensible frameworks; and
  • using XML-based markup, rather than arbitrary text formats, to
    store data.

I believe the logical endpoint of this convergence is extensible
programming systems, in which “programs” are mixed-media
representations of application code, meta-code for tools such as
compilers and debuggers, and meta-data such as class diagrams and
pictures of the dev team. Pretty much everyone else is already
there: just take a look at what goes into Word documents, CAD diagrams,
or the web site of your favorite band. Sooner or later, programmers
are going to join the future too, which opens up a host of research
problems.

If you’re interested in pursuing any of these, or already are, I’d
enjoy hearing from you.

Uncategorized

Time Travel

April 26th, 2005

Alper Ozdamar, a former 49X student, forwarded this link to an article by Hans Moravec on time travel and computing. Fun to read, if you don’t mind a little brain-ache ;-) .

Uncategorized

Data Crunching

April 25th, 2005

At the risk of turning this blog into adware, my book on Data Crunching is now available from Amazon, or directly from the Pragmatic Programmers. Personally, I think it’s beautiful:

data-crunching-cover-large.jpg

Writing

I’m So Glad We Had This Time Together…

April 25th, 2005

We had our end of term dinner this past Saturday. Good food and good company—makes it all worthwhile.

Uncategorized

Book Sales as Tech Trend Indicator

April 24th, 2005

An interesting piece by Tim O’Reilly on trends in book sales, and how they reflect (or forecast?) trends in technology. Interesting that Python book sales are now 2/3 of Perl sales, though both are down in absolute numbers.

Uncategorized

We’re Not Just Shaping the Future…

April 17th, 2005

…we’re rewriting the past. I don’t suppose there’s any chance they’ll stumble over Season Two of Firefly somewhere in there? No? Pity…

Uncategorized

New Favorite Web Site

April 15th, 2005

Perceived Usefulness

April 14th, 2005

Sam Ruby has posted a slide set from a recent talk on open source. It includes this:

Larry’s Psychological Conjecture:

For normal people, the perceived usefulness of a computer language is inversely proportional to the amount of theory the language forces you to learn.

Is this why Perl, Python, and Ruby have succeeded where Scheme, Haskell, and ML did not?

Uncategorized

Mapping Human History

April 13th, 2005

Carl Zimmer’s excellent blog The Loom has a report about Spencer Wells’ project to map historical human migrations using genetic sampling. (It also has a link to this interactive map of our species’ history.) Zimmer says, “You can buy a DNA kit, and when you send it back to the Genographic Project, you’ll get a report on ‘your genetic journey’ and the information will get added to Wells’s database.” Gotta love that: “you can buy a DNA kit” has apparently moved from being science fiction to being as unremarkable as picking up a liter of milk. Wonder what would happen if we organized students at the University of Toronto to take part?

Uncategorized

Agile Commenting

April 3rd, 2005

The new hype in software development is Agile Programming Languages. These languages try to cut development time by ignoring traditional principles such as static typing. Unfortunately, despite these advances, certain instructors still insist on the tradition of commenting code.

Enter Agile Commenting in the form of the Commentator. What better way to make that last minute code cutoff?

Uncategorized