Monthly Archives: August 2004

Issue Tracking Systems

One of the students who’s going to be working on Hippo this fall sent a message to the group asking for everyone’s MSN IDs, so that the team could set up group chats. That got me thinking about how many different ways there are to communicate electronically these days, and about how to use them to manage a small, part-time software development team.

First, some definitions. Targeted communication is aimed at a particular receiver, or set of receivers, while untargeted just puts content in a well-known place for anyone to read. Email and instant messaging are generally targeted, while newsgroups and wiki pages are untargeted; an always-on IRC channel is somewhere in between. Archived communication “sticks around”, i.e. can be re-read or searched at a later date, while transient communication evaporates. As with the previous category, the dividing line is not a clear one: newsgroups feel archived, but most servers throw away old articles after a time. Similarly, instant messaging feels transient, but many IM clients now allow you to record conversations. Finally, content that is pushed reaches the recipients without the recipients having to do anything; email is once again the classic example, but weblogs are becoming popular as well. In the pull model, readers have to go out of their way to get messages, e.g. go to a particular web site to read new messages in a forum.

Now, small software development teams want to be able to:

  1. ask teammates for help, or request feedback on ideas;
  2. report progress, or share things they’ve discovered;
  3. schedule meetings;
  4. hold meetings (rather than getting together face-to-face); and
  5. socialize,
  6. all without being flooded.

#1 wants to be targeted to team members, archived, and pushed. (Requests for help don’t actually need to be archived, but answers do.) #2 is similar, though it’s OK for progress reports to be pulled rather than pushed, so long as everyone pulls them eventually.

#3 can be transient, but should be pushed to everyone you want in the meeting. #4 is just a bad idea—the Winter 2003 students found that IM is a very inefficient way to hold meetings. (Check with them if you don’t believe me.) On the other hand, IM is great for #5 (although as Joel Spolsky says, running an IM client at all times is guaranteed to lower your productivity). As for #6, that comes down to the project admins putting spam protection in place, and project members playing nice.

So, with that lengthy preamble out of the way, how should a team communicate?

  • Use the email lists for questions and short answers, especially when they are time-critical (e.g. someone needs to know how to do XYZ right now).
  • Add content to the project web site for progress reports, longer answers, discoveries, FAQs, how-to’s, and so on, but post a note (including a URL) to the project mailing list to let people know when new content is available.
  • By all means use IM for socializing, but if you spent ten minutes on IM explaining XYZ to someone, please, copy the conversation, paste it into a page in the project’s web folder, and let everyone know it’s there.

Hm… I’ve left something out here. In fact, I’ve left out the most important point of all:

USE YOUR PROJECT’S ISSUE TRACKING SYSTEM.

(Yes, I actually do think it deserves capital letters.) An issue tracking system is a project’s shared to-do list; it’s used to record bugs, features that need implementing, action items (such as “prepare a note for students on how to communicate”), and so on. What’s more, it records them in a structured way, and that’s what makes issue tracking systems so useful. Email, wikis, blogs, and so on are the “goto” statements of electronic communication: since you can put/send data just about anywhere, there’s no inherent rhyme or reason to traffic and content. An issue tracking system, on the other hand, imposes rules on format, state transitions, and so on, just as “for” loops and “if/else” statements impose order on goto’s.
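To make “rules on state transitions” concrete, here is a minimal sketch of the idea. The state names and the allowed moves between them are invented for illustration; every real tracker defines its own.

```python
from dataclasses import dataclass, field

# Hypothetical states and transitions, chosen for illustration only.
ALLOWED = {
    "new": {"assigned", "rejected"},
    "assigned": {"resolved"},
    "resolved": {"closed", "assigned"},  # "assigned" here means the issue is reopened
    "rejected": set(),
    "closed": set(),
}


@dataclass
class Issue:
    summary: str
    state: str = "new"
    history: list = field(default_factory=list)

    def move_to(self, new_state: str) -> None:
        """Change state, refusing any transition the rules do not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state
```

Unlike an email thread, an issue built this way can never drift from “resolved” back to “new” by accident, and its history records how it got to wherever it is.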

Experience (mine, anyway) shows that imposing order at the outset is more efficient, and more helpful, than trying to reverse engineer it after the fact by building ultra-smart search engines. Experience also shows that if teams start using an issue tracking system early in the project lifecycle, when deadline pressure is low, they’re more likely to use it well by the time deadline pressure intensifies—i.e., by the time they really need it.

The Joel Test

So, how good is your software team? Joel Spolsky (who runs a company called Fog Creek, and writes a weblog that everyone in the software industry either reads, or ought to) has a 12-question test to help you measure how well your team is doing. How do the student teams hosted on Pyre do?

1. Do you use source control? Yes. Absolutely. For everything.
2. Can you make a build in one step? The Helium team was pretty
close; the only thing that stood in the way was setting up the
database. Other teams vary—getting this right will probably be
worth a mark or two this term.
3. Do you make daily builds? The Helium team did a
build every twenty minutes, and had a glowing ball on their home
page to show its status.
4. Do you have a bug database? Yes. The Memview team got into
using theirs to coordinate their work early on; the Helium team just got
into the swing of doing this in mid-August.
5. Do you fix bugs before writing new code? No. I’m not convinced this is always a good
thing (though I agree that it usually is). Sometimes, when you’re
“in flow”, finishing the current piece of the project is more
important than line-editing what you did this morning. I do
believe that leaving bugs more than a couple of days is dangerous.
6. Do you have an up-to-date schedule? We did for the last lap of Helium, which helped
us cut corners as the end-of-summer deadline rushed toward us.
The Memview team
stopped reporting weekly progress once they got into the crunch.
7. Do you have a spec? Low marks here—we did for Helium, but it fell
out of date while work was under way. It’s being updated now so
that the incoming students will know what they’re supposed to
build. The Memview team wrote a
good A&E (analysis & estimation) document at the start of
term, which gave them a roadmap for their work.
8. Do programmers have quiet working conditions? No. The Helium students are in
an open plan office (without a window); the Memview team worked
on their own machines at home most of the time, so I’m guessing
their environment included a lot of heavy metal music. (I dunno,
they just look like those kind of guys…)
9. Do you have the best tools money can buy? Half marks: Eclipse is a pretty good IDE,
and the machines were good enough to run it and the application
they were developing at the same time. Our server, Pyre, is a
little underpowered, but other than Helium’s regular
build, it’s only being used as a CVS/SVN host.
10. Do you have testers? No.
11. Do new candidates write code during their interview? No. I’ll definitely be asking for this in
future, since students’ grades don’t correlate particularly well
with their ability to Get Things Done.
12. Do you do hallway usability testing? No. Should

So, that leaves Helium with 6/12, and
Memview with 5/12.
Let’s see how much better we can do this fall…

A Summer’s Worth of Links

Well, here they are: this summer’s favorite links.

http://websavvy-access.org/resources/top_ten.php Accessibility Guidelines web accessibility guidelines
http://www.gotdotnet.com/team/brada/APIUsability.pdf APIUsability.pdf usability studies; APIs
http://www.idealliance.org/papers/dx_xml03/papers/06-02-01/06-02-01.pdf Circles, Triangles, Rectangles extensible programming; technical articles; programming languages
http://weblogs.asp.net/oldnewthing/archive/2004/04/22/118161.aspx Cleaner, more elegant, and wrong programming examples; exceptions
http://www.devexpress.com/?section=/products/net/coderush CodeRush IDE; development tools; Visual Studio
http://jakarta.apache.org/commons/launcher/ Commons Launcher Java; open source; Tomcat
http://cruisecontrol.sourceforge.net/ CruiseControl Java; open source; tools; build
http://docsynch.sourceforge.net/ DocSynch open source; collaboration; text editor
http://www.jluster.org/log/d/social/science/2004/01/25/dovester_was_old_social_networks Dovester Social networks; history; anecdote
http://quintanasoft.com/dumbster/ Dumbster Java; open source; testing; email
http://www.mems-exchange.org/software/durus/ Durus Python; persistence
http://sourceforge.net/projects/easylog/ EasyLog Python; open source; tools; logging
http://emma.sourceforge.net/ EMMA Java; open source; tools; coverage
http://homepages.inf.ed.ac.uk/wadler/steele-oopsla98.pdf Growing a Language Steele’s paper; programming language design
http://www.joelonsoftware.com/articles/fog0000000073.html Guerrilla Guide to Interviewing essays; interviewing
http://www.aci.com.pl/mwichary/guidebook GUIdebook GUI; history of computing
http://mindprod.com/unmain.html How to Write Unmaintainable Code program design; tutorial
http://www.tldp.org/HOWTO/Encourage-Women-Linux-HOWTO/index.html HOWTO Encourage Women in Linux advocacy; gender issues; open source
http://tidy.sourceforge.net/ HTML Tidy Open Source; HTML Tidy
http://www.interactivetools.com/products/htmlarea/ htmlArea HTML editor; open source
http://udell.roninhouse.com/GroupwareReport.html Internet Groupware for Scientific Collaboration Udell’s report for LANL
http://developer.netscape.com/docs/manuals/security/pkin/ Intro to Public Key Cryptography Netscape; tutorial; public-key cryptography
http://www.xml.com/pub/a/2004/07/21/oxml.html Introducing o:XML XML; extensible programming languages
http://www.izforge.com/izpack/ IzPack installer; open source; Java; tools
http://jamvm.sourceforge.net/ JamVM — A compact Java Virtual Machine Java; open source; JVM
http://www.javaspecialists.co.za/archive/Issue089.html Java Exception Handling Java; exception handling; tutorial
http://metrics.sourceforge.net/ Java Metrics Java; open source; tools; code metrics
http://www.jave.de/ JavE Java; open source; ASCII art; editor
http://burtleburtle.net/bob/math/jenny.html Jenny open source; testing tools; test generator
http://www.jgraph.com/ JGraph Java; tools; open source; graph drawing
http://jmechanic.sourceforge.net/ jMechanic Java; open source; tools; profiling
http://jsvn.alternatecomputing.com/ JSVN Java; Subversion; open source; tools
http://karrigell.sourceforge.net/ Karrigell little Python web server
http://www.lorem-ipsum.info/generator3 Lorem Ipsum tools; internationalization; testing
http://mozart-dev.sourceforge.net/ Mozart open source; extensible programming systems; Mozart
http://napkinlaf.sourceforge.net/ Napkin Look open source; Java; user interfaces; humor
http://www.nedbatchelder.com/text/index.html Ned Batchelder essays
http://nifty.stanford.edu/ Nifty Assignments education; programming assignments
http://www.sunlabs.com/techrep/1994/abstract-29.html Note on Distributed Computing papers; distributed computing
http://www.oldversion.com/ OldVersion.com archive; old versions of software
http://www.manageability.org/blog/stuff/open-source-automated-test-tools-written-in-java Open Source Java Testing Tools Java; open source; catalog
http://pamie.sourceforge.net/ PAMIE Python; open source; tools; Internet Explorer; QA
http://csce.unl.edu/%7Ewitty/sp2004/csce496/ Performance Analysis of OO Systems course; performance analysis; programming
http://pmd.sourceforge.net/ PMD Java; tools; open source
http://www.giuseppetanzilli.it/mod_auth_pgsql2/ PostgreSQL Apache Auth Apache; authentication; PostgreSQL
http://www.prevayler.org/wiki.jsp Prevayler Java; open source; persistence
http://www.tmtm.com/nothing/archives/000497.html Programming Performance empirical studies; programmer performance
http://www.prothon.org/ Prothon (prototype-based Python) programming languages; Python; experimental
http://pyb.sourceforge.net/ Pyb Python; open source; tools; build
http://pychecker.sourceforge.net/ PyChecker Python; open source; source code analysis
http://www-itg.lbl.gov/gtg/projects/pyGridWare/ PyGridWare Python; open source; Grid
http://sourceforge.net/projects/pymonitor/ PyMonitor Python; open source; tools; performance monitoring
http://www.atug.com/andypatterns/pynsource.htm PyNSource open source; Python; tools; UML
http://www-106.ibm.com/developerworks/library/os-ecant/index.html?ca=drs-tp2604 Python/Eclipse Python; Eclipse
http://www.rallydevelopment.com/ Rally project management tools
http://www.securesw.com/security_tools_download.htm RATS open source; tools; security
http://www.vdesmedt.com/%7Evds2212/rsync.html RSync in Python rsync; Python; tools; open source
http://www.catholic.org/isidore/ St. Isidore humor
http://web.mit.edu/ghudson/thoughts/diagnosing Subversion critique of subversion from early 2003
http://web.mit.edu/ghudson/thoughts/undiagnosing Subversion critique Response to critique of Subversion from early 2003
http://xplusplus.sourceforge.net/ SuperX++ open source; extensible programming systems; SuperX++
http://dorffweb.com/?page=taptutorial Tapestry Tutorial Tapestry; open source; tutorial
http://www.edgewall.com/products/trac/ Trac open source; tools; wiki; version control; SVN; issue tracking
http://qse.ifs.tuwien.ac.at/~auer/umlet/ UMLet Java; tools; open source; UML
http://www.joelonsoftware.com/articles/Unicode.html Unicode explanation Joel Spolsky; Unicode; background; tutorial
http://eyegene.ophthy.med.umich.edu/unicode/ Unicode Primer Unicode; tutorial
http://www.versionone.net/ VersionOne agile development; project management; tools
http://whisper.cx/ Whisper open source; blogging; content management

Filters, Performance, and Priorities

Where has the summer gone? We’ve written a lot of software (and documentation), but there’s still a lot to do before Helium (er, sorry, Hippo—we’re renaming it, for reasons that are too silly to go into) is ready to deploy.

Eleven undergraduates from the Department of Computer Science at the University of Toronto will be carrying on with the project this fall. Between now and September 9, we have to decide what we want them to work on. In order to do that, we have to resolve some technical issues that we’ve been deferring.

For example, Hippo includes mail management, so that projects can have mailing lists. We’d like to make use of whatever SMTP server our host is running, but that means we have to find a way to get messages into Hippo. We also have to put some kind of spam filtering in place.

However, we also want to keep Hippo unilingual if at all possible. It would be fairly easy for our existing programming team to write a Python script (or even a shell script) to feed Hippo mail, but if we do that, I’m afraid we’ll be falling into the same trap that SourceForge did. Last time I checked, SourceForge depended on five (5) different languages (PHP, Perl, Python, Bash, and C), plus more than a dozen major third-party packages and many smaller libraries. Getting all this stuff installed is hard; keeping it all in synch is harder; and finding programmers who know XYZ when it breaks is a continuing headache. (“Sorry, dude, if this bit was written in Perl, I could help you…”)

So why not write the necessary glue in Java? Performance. Write a little shell script that runs a simple “Hello, world” script in Python a thousand times, and see how long it takes to run. Now do the same in Java. Make sure you have other applications running on the box, so that you’re getting realistic numbers, rather than never-to-be-exceeded everything-in-memory peaks. On Pyre (the Linux box that’s hosting Helium’s development), the difference is about 4:1. It was less (about 3:2) on my souped-up development box at work, but only when I wasn’t running anything else. As soon as the VMs were fighting with other applications for space, Java lost: badly.
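For anyone who wants to repeat the measurement, here is a rough sketch of the same experiment driven from Python rather than a shell script. The function name is mine, and the Java command mentioned afterwards assumes a compiled Hello class in the current directory.

```python
import subprocess
import sys
import time


def time_startup(command, runs=1000):
    """Return total wall-clock seconds needed to launch `command` `runs` times.

    Each run starts a brand-new process, so the total is dominated by
    interpreter or VM startup rather than by the work the program does.
    """
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

Calling time_startup([sys.executable, "-c", "print('Hello, world')"]) gives the Python figure, and time_startup(["java", "Hello"]) gives the Java one. Run both while the machine is busy with other work, for the reason given above.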

We have several ways to deal with this. One is to bite the bullet and write the smallest possible shell/Python script to move mail from the SMTP server to (for example) a scratch directory, then have a long-running Java daemon do everything from there. (I’m pretty sure that Java will match Python’s performance on the actual processing, though I don’t yet have any data to back that up.) Another would be to throw up our hands and say, “OK, Java’s been fun, but this is the N’th time (where N is approximately 3) that we’ve found a case where Python would have been easier, so maybe we should start over.” Needless to say, I’m less than excited about that…
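The first option might look something like the sketch below. It is only an illustration under assumptions of my own: that the mail server can pipe each incoming message to a script, and that the daemon only picks up files ending in “.msg”. Neither is a decision we have actually made.

```python
import os
import tempfile


def spool_message(raw_message: bytes, spool_dir: str) -> str:
    """Write one message into spool_dir and return the final path.

    The message is written under a temporary name and then renamed, so a
    daemon scanning for "*.msg" files never sees a half-written message.
    """
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw_message)
    final_path = tmp_path[: -len(".tmp")] + ".msg"
    os.replace(tmp_path, final_path)  # atomic when both names are on one filesystem
    return final_path
```

The script the mail server invokes would then be a single call: spool_message(sys.stdin.buffer.read(), scratch_dir), where scratch_dir is whatever directory the Java daemon watches.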

I’m sure there are other possibilities; I’m also sure that we’re not the first development team to bang our heads against this particular wall. As always, pointers and advice are welcome…

Configuration files and dynamic languages

A few months ago, Carlos Perez blogged about someone else’s claim to have built a mildly complex web app using Ruby in just two months. Perez argued that Ruby-based development was so much faster than Java-based development would have been because:

  1. Java web app frameworks typically use lots of configuration files.
  2. These are effectively dynamically-typed (i.e. checking is done at runtime, rather than during a compilation phase).
  3. If you’re going to use dynamic typing, you’re better off going all the way, rather than trying to weld dynamically-typed configuration onto a statically-typed language.

Perez offers no evidence to back up his argument, but it’s an interesting one nonetheless. Hibernate, Tapestry, Tomcat itself, and many other current-generation Java tools require XML configuration files that are so complex that users are effectively doing bilingual programming, without having a debugger for one of the two languages.

In contrast, dynamic languages like Ruby and Python allow users to type in complex data structures directly, and make it easy to include calculated values. This is (in my opinion) the main reason why SCons (a build system written in Python) is easier to use than Ant: one language, with a debugger, and you don’t have to jump through hoops to say “for each” or “only do this if A and B are true, but C isn’t”. Even Ant’s inventor agrees: a few months ago, James Duncan Davidson wrote:


If I knew then what I know now, I would have tried using a real scripting language, such as JavaScript via the Rhino component or Python via JPython, with bindings to Java objects which implemented the functionality expressed in todays tasks. Then, there would be a first class way to express logic and we wouldn’t be stuck with XML as a format that is too bulky for the way that people really want to use the tool.
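To show what I mean by typing in data structures directly, here is a small made-up configuration written as ordinary Python. The setting names are hypothetical and are not taken from SCons or any other real tool.

```python
# Hypothetical build configuration written as plain Python data.
DEBUG = True
MODULES = ["core", "mail", "web"]

config = {
    # "for each": one entry per module, computed rather than typed out.
    "source_dirs": [f"src/{name}" for name in MODULES],
    # Calculated values: derived from another setting.
    "output_dir": "build/debug" if DEBUG else "build/release",
    "compiler_flags": ["-g"] if DEBUG else ["-O"],
}


def should_run(a: bool, b: bool, c: bool) -> bool:
    """'Only do this if A and B are true, but C isn't.'"""
    return a and b and not c
```

Everything here can be stepped through in the same debugger as the rest of the program, which is exactly what an XML configuration file does not give you.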

Now that entire books are being written to decry the complexity of Java application frameworks, and urge us to return to simpler code, it’ll be interesting to see whether more programmers switch to dynamic languages, or whether the continuing backwardness of those languages’ application frameworks drives programmers back to Java and .Net.

Real-time Scheduling

In this blog posting from May 2004, Johanna Rothman talks about how easy it is to build a schedule for a small project using just yellow sticky notes. Back in June, I commented on Roy Osherove’s whiteboard-based project management, and asked for links to software tools that were as easy to collaborate with. None were forthcoming, but thanks to Michelle Levesque’s recent experience at VanPy, I now know where to look.

Michelle saw people in several conference sessions using SubEthaEdit to take notes collaboratively. For those who haven’t seen it, SEE allows any number of people to edit a single document simultaneously. It sounds like a recipe for chaos, but if authors are willing to follow a few obvious social rules, it can be tremendously productive. Michelle and Karen Reid are already thinking about having students use it during lectures to take a single, shared set of notes (and about how to prevent the one bad apple in every class from spoiling it).

I’m now wondering whether something like SEE could make electronic schedule construction as easy as whiteboards and yellow sticky notes. I know that wikis can be used this way, but wikis feel like email and turn-based games, while SEE has the zing of instant messaging and real-time games [1]. Schedule negotiation feels like it needs rapid, interactive give-and-take, at least for small projects (a dozen programmers, a dozen months).

So, let me ask my question again: who’s building rapid-fire interactive scheduling and tracking tools using collaborative editing? Anyone? Anyone at all?

[1] Like Homeworld, the best real-time strategy game ever built—socks, but the sequels were disappointing…

Poor Cousins

Four weeks left until the summer’s work on Hippo (formerly Helium)
winds down, and we’re starting to run into the 10% of cases that make
up 90% of the grief. For example, consider the problem of keeping
track of the relationships between users and projects. In SourceForge and other systems, this is a
simple pairwise relationship: projects are not related to one another,
and users do not belong to groups (other than the groups implicitly
formed by their membership in particular projects).

That’s not good enough for Hippo. Students are naturally grouped
by the courses they belong to; instructors must be able to do
groupwise operations (such as making all students in course C members
of project P) in a single step, or Hippo’s administrative overhead
will be prohibitive. Similarly, instructors must be able to manage
batches of projects at once, so that they can do things like delete
all projects associated with Exercise 3 of Course C with a single
command (after backing them up, of course).

To handle this, we’ve organized projects as a tree, and users/user
groups as a graph. A single root “super project” represents Hippo as
a whole; every other project must have a parent. Similarly, users can
belong to groups, which can also contain other groups (though cycles
are not permitted).
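The “no cycles” rule for groups can be enforced when a group is added to another group. What follows is a sketch in Python with names of my own choosing, not Hippo’s actual code (which is Java).

```python
class GroupGraph:
    """Groups that may contain other groups; cycles are rejected."""

    def __init__(self):
        # group name -> set of directly contained group names
        self.subgroups = {}

    def add_group(self, name):
        self.subgroups.setdefault(name, set())

    def _contains(self, outer, inner):
        """True if `inner` is `outer` itself or is reachable from it."""
        stack, seen = [outer], set()
        while stack:
            current = stack.pop()
            if current == inner:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.subgroups.get(current, ()))
        return False

    def add_subgroup(self, parent, child):
        """Put `child` inside `parent`, unless that would create a cycle."""
        self.add_group(parent)
        self.add_group(child)
        if self._contains(child, parent):
            raise ValueError(f"adding {child} to {parent} would create a cycle")
        self.subgroups[parent].add(child)
```

Checking at insertion time means every later traversal can assume the graph is acyclic. The project tree needs no such check, since each project has exactly one parent fixed at creation.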

Now comes the tricky part. A user U’s relationship to a project P
is described by one of a small set of roles, such as “observer”,
“developer”, or “admin”. (A fourth role, “unaffiliated”, isn’t
explicitly represented, but is used when U has no other relationship
with P.) Relationships are inherited: if U has no relationship to P,
but does have a relationship R with P’s parent Q, U also has
relationship R with P. This means that if an instructor is an admin
of the project “/csc207”, then she is also automatically an admin of
“/csc207/exercise01”, “/csc207/exercise01/studentFred”, and so on.

Relationships are also inherited via group membership. If U has no
relationship with P, but U is a member of a group G, and G has a
relationship R with P, then U has relationship R with P as well. For
example, if a student is a member of the group “csc207Students”, and
“csc207Students” is an observer of “/csc207”, then the student is an
observer of “/csc207”.

Seems pretty simple—at least, it seemed pretty simple to
us. However, inheritance means that a user U can have several
different candidate relationships with a project P. For example, U
can be an explicit observer of P, but also be a member of a group G,
which is a developer for P’s grandparent. Which relationship should U
have with P? We decided two and a half months ago to use the
“strongest” relationship: we find all possible relationships via
transitive closure, and if any of them allow U to perform a requested
operation on P, the operation is permitted.
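In code, the “strongest relationship” rule comes out roughly as follows. This is a Python sketch rather than Hippo’s Java, the sample data is invented, and the ordering observer < developer < admin is my assumption about what “strongest” means.

```python
# Assumed ordering of roles from weakest to strongest.
ROLE_STRENGTH = {"unaffiliated": 0, "observer": 1, "developer": 2, "admin": 3}

# Hypothetical sample data.
PARENT = {  # project -> parent project (the root has no entry)
    "/csc207": "/",
    "/csc207/exercise01": "/csc207",
}
MEMBER_OF = {"fred": {"csc207Students"}}  # user or group -> groups it belongs to
ROLES = {  # (user or group, project) -> explicitly assigned role
    ("fred", "/csc207/exercise01"): "observer",
    ("csc207Students", "/csc207"): "developer",
}


def principals(user):
    """The user plus every group reachable through group membership."""
    found, stack = set(), [user]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(MEMBER_OF.get(current, ()))
    return found


def effective_role(user, project):
    """Strongest role found on the project or on any of its ancestors."""
    candidates = ["unaffiliated"]
    who = principals(user)
    while project is not None:
        candidates += [ROLES[(p, project)] for p in who if (p, project) in ROLES]
        project = PARENT.get(project)
    return max(candidates, key=ROLE_STRENGTH.__getitem__)
```

With this data, fred is an explicit observer of “/csc207/exercise01” but inherits “developer” through his group and the parent project, so the lookup reports him as a developer.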

The problem is, that leads to Bug #36:

Users are allowed to have multiple memberships to a single project,
both through inheritance and through user groups. When a user wants
to change something (ie mail settings), the getMemberships(user,
project) method has to decide which of these memberships to return.
Currently, it returns the user’s strongest membership, based on role.
The problem with this is that if the user’s strongest membership is an
implicit one, or if it is through a user group, the user is unable to
change the settings (unless an explicit membership is created for the
user and project).

Let’s go over that again. Hippo represents relationships between
users and projects using instances of the class Membership. If
membership is implied, rather than direct, then when we look up U’s
membership in P, we sometimes get back an object representing the
relationship between a group that U belongs to (directly or
indirectly) and a project that is a parent of P (directly or
indirectly). That’s fine if we just want to find out whether or not
some operation is permitted, but if we want to change U’s
relationship with P, what do we do? We can’t change that relationship
object, since that would potentially affect other users’ relationships
with other projects.
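A stripped-down illustration of why the shared object can’t simply be edited, again in Python with invented data and field names rather than Hippo’s real Membership class:

```python
from dataclasses import dataclass


@dataclass
class Membership:
    holder: str   # a user or a group
    project: str
    role: str
    receives_mail: bool = True  # hypothetical stand-in for "mail settings"


# Hypothetical data: two students covered by one group-level membership.
GROUPS = {"fred": ["csc207Students"], "wilma": ["csc207Students"]}
MEMBERSHIPS = [Membership("csc207Students", "/csc207", "observer")]


def find_membership(user, project):
    """Return the user's own Membership if there is one, else a group's."""
    for holder in [user] + GROUPS.get(user, []):
        for membership in MEMBERSHIPS:
            if membership.holder == holder and membership.project == project:
                return membership
    return None


def is_shared(membership, user):
    """True if editing this object would affect more than just `user`."""
    return membership.holder != user
```

Here find_membership("fred", "/csc207") hands back the group’s object, so setting receives_mail to False on it would silently turn off mail for wilma as well.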

OK, so we don’t change that object; we add a new one representing
the more specific relationship. That solution fails to address Bug #37:

Another problem with implicit memberships: users cannot be deleted
from projects (or have their roles changed etc.) when their membership
is via a user group, unless they are removed from the group. Removing
a user from a user group falls under user-user authorization, which
has been deferred.

And on, and on, and on. Representing all relationships
explicitly (no groups, no project tree) would make this problem go
away, but we believe it would make administration much more onerous.
Adopting Unix-style permissions (rwxr-xr--, anyone?) is tempting, but
we worked through that three months ago, and it doesn’t address all of
our use cases either.

Which brings us to this posting’s title. HP’s marketing literature
describes the product I work on in terms of authentication,
authorization, and access control. Authentication means figuring out
who you are; authorization means figuring out what you’re allowed to
do; and access control means enforcing those rules. There’s a ton of
literature about authentication, and a fair bit in the operating
systems world about access control. By comparison, authorization is a
very poor cousin indeed: you can find detailed descriptions of the
schemes used in particular systems, like Unix permissions and Access
Control Lists (ACLs), but there doesn’t seem to be any “theory” behind
it all.

Authorization seems to be one of those things that is overlooked by mainstream computing, for no good reason. Think about it: parsing and code generation are part of the standard curriculum, but implementing byte code interpreters and debuggers is not; process scheduling is, but linking is not; and so on. I don’t know why some fields are “poor cousins”, while others are so heavily overfarmed that they suffer from the intellectual equivalent of salinization, but perhaps there is scope here both for innovative research and for new languages or programming models to make some headway.