Archive

Archive for July, 2004

The right tool for the job

July 26th, 2004

It is easier to bang in a nail with the back of a screwdriver than with a handleless hammer (believe me, I’ve tried and it wasn’t pretty). Even though a hammer is the right tool for that job, sometimes you have to switch tools if the one that you would like to use isn’t complete.

When we began Helium a few months ago, I’d hoped that we would be able to use Python for most of the project. Python is my favourite programming language, and I felt that it would help the project with productivity (by reducing the amount of time spent struggling with syntax) and longevity (since code that is easier to read is also easier to maintain). Since then, I’ve discovered several other more important reasons why Python was the right tool for the job. So why, then, is Helium written in Java?

The answer to this question can be found on the PythonInfo Wiki under WebProgramming. In order to list all of the web frameworks in Python, you have to scroll through several screenfuls of names. This, in and of itself, is enough to make me cringe. I can’t help but feel that this goes against part of Python’s spirit. In fact, one of the postulates of the Python philosophy states that “there should be one-- and preferably only one --obvious way to do [something]”. It turns out that there isn’t just one obvious way to build a web application in Python: there are several pages’ worth of options to choose from.

If the problem only ran as deep as having to make a choice between them, I wouldn’t mind so much; I can get a coin (or, in this case, a 70-sided die) to make that decision for me. The real problem with this kind of rampant fragmentation is that you have no guarantee that the piece you choose will still be an active project one, two, or five years from now. Furthermore, every piece will be far less mature than it would be if there were very few options to choose from, since the developer effort is spread thin across so many projects.

Beyond the choice of web framework, it was no clearer which Python Object/Relational mapping tool Helium should use. Every developer you ask has a different opinion about the best way to do this in Python. Again, the problem is not their disagreement itself but what it implies: there is no single obvious way to do O/R mapping in Python.

For these reasons, we chose to write Helium in Java. For the first few weeks our language choice made little difference. But as Helium grew in size and complexity, we kept finding ourselves stumbling over the language, rather than having the language make our lives easier. There are currently three independent parts of Helium which had to be written in a wrapper scripting language (we, of course, chose Python) because we couldn’t get Java to do what we needed it to. We’re also interfacing heavily with Subversion, and the Python-SVN bindings are far more mature than the Java bindings. Over and over, I was reminded that Python was the right tool for this job. Unfortunately, it was the hammer without a handle.

On my next web application project, I would like to use Python. But I’m not going to feel comfortable doing so until there’s a clear “winner” or two in the web framework field. Just a few minutes ago, Jon from the Memview team asked me, “What sort of web architecture do you like to use with python?”, and I was forced to shrug helplessly. I didn’t have a good answer. His response captured my feelings exactly: “At least with php you dont have to choose”.

Uncategorized

Preparing for the Next Round

July 23rd, 2004

The team that’s going to be working on Helium this fall had its first meeting last night. If all goes well, eleven of the department’s best undergraduate students will build on all the hard work that Michelle, Laurie, Jason, Eric, and Wilfred have put in this summer, and deliver something that we can use in undergraduate classes.

The next step is to determine exactly what they will work on. The current list (in priority order) is:

  1. Real Subversion integration. We believe we can use applets to give users a complete in-browser interface to Subversion (rather than a read-only interface, which is what SourceForge and GForge offer). In order to do this, we must first bring Subversion’s Java bindings up to date, which is an entire project in itself…
  2. A scripting interface. We can’t require instructors to create hundreds of user accounts, and hundreds of projects, by hand. Instead, we’re hoping to provide a Jython library that instructors and administrators can use to work directly with Helium’s data model. (The alternative would be to expose the database tables, but that would bypass all the constraints that the model layer enforces.)
  3. Testing infrastructure. Helium’s lower layers are exercised by a unit test suite; in fact, there’s currently more testing code than application code. The next step is to adapt HttpUnit, HtmlUnit, or something similar to do end-to-end testing on the whole application.
  4. A progress monitoring framework. Helium’s home page displays a status light (green if all tests are passing, red if there are failures) and a graph showing the growth of the source and test code over time. Displays like these will help students and instructors see how their projects are doing.
  5. Searching. Google is every programmer’s friend; the Lucene project puts that search capability into everyone’s hands. We hope that if forums and site content are searchable, fewer students will repeat questions umpteen times.
  6. Issue tracking. Version control and issue tracking are two of the things that distinguish professional programmers from amateurs. It’s easy to show students why they want the former (if nothing else, version control helps them keep their home computers in synch with their accounts on the university’s machines), but motivating issue tracking is harder. Initially, we’re going to add it to Helium for our own use (we’d like to be self-hosting by September). Once it’s in place, we hope we’ll be able to find ways to work it into courses.
  7. Blogging, wikis, project and personal home pages, NNTP integration… There’s a lot more we’d like, but they’ll probably have to wait for the Winter 2005 term.

It’s a lot, but with two or three students per item, we should be able to pull it off.


Dependencies

July 21st, 2004

The list of Helium dependencies is growing longer by the day. What I’ve found interesting, however, has been our decisions about when we’re going to incorporate an external package into Helium and when we’re going to build our own. It’s a delicate balance.

Right now, Helium uses JavaMail to parse mail headers, and ViewCVS to browse the SVN repository from the web. However, Jason Montojo wrote us our own set of Java SVN bindings, and we’re using our own mailing list manager rather than something already available, like Mailman.

The more dependencies Helium has, the harder it’s going to be to set up and maintain. (What happens when a huge exploit is discovered in ViewCVS and Helium has to update its ViewCVS version, but so much has been changed that they’re no longer easily compatible?) And yet by using someone else’s code, we save time and are able to use the lessons others have learned, rather than starting from scratch on everything.

It will be interesting to see which of our decisions were the right ones, and how much they will affect Helium in the end.


Up and to the Right

July 21st, 2004

I spent twenty minutes this morning throwing together a couple of Python scripts to measure Helium’s progress over time. The first script checks Helium out of CVS for each day since the project started, and counts the number of lines of source and test code. The second script takes that data and produces this simple graph.
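The counting half of a script like the first one can be sketched as below. This is a minimal illustration, not the original script: the function names, the `.java` extension filter, and the rule for telling test code from source code (a `test` directory or a `Test.java` suffix) are all assumptions. The per-day checkout and the graphing step are left out.

```python
import os


def count_lines(path):
    """Return the number of lines in one file."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def is_test_file(relative_path):
    """Assumed convention: tests live under a 'test' directory
    or are named SomethingTest.java."""
    parts = relative_path.replace(os.sep, "/").split("/")
    return "test" in parts[:-1] or parts[-1].endswith("Test.java")


def measure(root, extension=".java"):
    """Walk a checked-out tree and total source and test lines separately."""
    totals = {"source": 0, "test": 0}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extension):
                continue  # ignore build files, docs, and so on
            path = os.path.join(directory, name)
            relative = os.path.relpath(path, root)
            kind = "test" if is_test_file(relative) else "source"
            totals[kind] += count_lines(path)
    return totals
```

Calling `measure()` once per daily checkout and appending the two totals to a data file would give the second script everything it needs to draw the graph.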

I built this simple tool for three reasons. First, it’s just plain cool ;-) . Second, the five students working on the project all finish at the end of August. I know how what you didn’t accomplish can loom large in your mind toward the end of a project, so I hoped this graph would remind them of how much they’ve accomplished, and how quickly.

Third, these little scripts are my way of finding out whether a larger tool of this kind would be worthwhile. One half would periodically gather statistics about a project and store them in a database; the other half would extract that data and create images for display. The whole thing could then be integrated into Helium itself, so that students and professors could gauge projects’ progress over time.

SourceForge already has something like this, of course. However, its progress meters aren’t extensible: you can’t create new metrics and plug ’em in. I was sure someone would have written an open source framework for this, but Google hasn’t turned anything up. I therefore have a couple of students from the study group building a prototype. They’re using Checkstyle to collect information about Java source code; eventually, we’ll want to extend it so that people can use other tools, or languages, to gather data.


A Sense of Adventure

July 16th, 2004

According to Larry Wall, the inventor of Perl, three characteristics distinguish good programmers: laziness, impatience, and hubris. Of these, I think hubris is the most important, although I prefer to think of it as a sense of adventure. The best programmers I know are all comfortable with:

  1. googling various combinations of likely-sounding terms until they find some software that might do what they want;
  2. downloading, installing, and trying out that software; and
  3. throwing it all away and starting over again if it doesn’t seem to be doing what they want.

With every passing year, more and more of what’s in the average programmer’s toolbox is free or open source software. Knowing how to go through the three steps listed above efficiently is therefore increasingly important. My question is, can you teach someone to do this, and if so, how? Can you tell an undergraduate class, “For 10% of your course mark, find a tool to profile Java applications, run it on the example code you’ve been given, and hand in the profile”? How do you mark something like that? How do you prevent cheating? Most importantly, how do you prepare students to tackle it?


Microsoft Wins Because They Deserve To

July 10th, 2004

Now that I have your attention (as I’m sure I do with a title like that ;-) … Five items came across my screen this morning that made me think, yet again, about why Microsoft dominates the desktop.

Item one: the .NET Book Club, which is “…an organization to promote reading and discussion amongst professional developers. Along with this, the group provides book reviews for interesting readings and suggested reading materials.” Nine years after Java debuted, there are plenty of sites that review Java-related books (such as the books section at JavaRanch), but I don’t know of any self-improvement book clubs.

Item two: Mike Gunderloy’s Coder to Developer, which I think is an excellent book. I’ve just subscribed to his site, Daily Grind, where he posts several notes a day about new tools for .NET developers. It’s got me thinking: how come no one has written something like Coder to Developer based on Java? There are plenty of books about modern Java tools (most of which also advocate Extreme Programming), but none that measure up to C2D.

Item three: Don Box’s blog entry titled “Travels in Java Land”, in which he says:

I was also surprised at the continued enthusiasm for AOP [aspect-oriented programming] over in Java land. I remember being infatuated by it maybe 5 years ago, but obviously the Java folks have found lasting love.

Aspect-oriented programming takes object-oriented programming a step further by allowing developers to add functionality (“aspects”) to methods orthogonally to the inheritance tree. It’s certainly useful for things like persistence and logging, which cut across class hierarchies, but only in the hands of the kinds of ubergeeks who really do find things like closures intuitive. For the other 90% of programmers, who are still wrestling with inheritance, it’s just one more place where they’ll have to rely on voodoo and superstition. It may be great fun for academics, but it’s just a distraction for Java at a time when .NET is threatening to take over the world.
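To make “orthogonally to the inheritance tree” concrete, here is a rough analogue in Python. It is only a sketch: a decorator is not full aspect-oriented programming (there are no pointcuts and nothing is woven in from outside the class), and the `logged` helper and the two example classes are invented for illustration. It does show one logging concern attached to methods of two classes that share no common base class.

```python
import functools

call_log = []  # stand-in for a real logger, so the effect is easy to inspect


def logged(func):
    """Wrap func so every call is recorded without touching func's body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(func.__qualname__)
        return func(*args, **kwargs)
    return wrapper


class Account:
    """Hypothetical class with no relationship to Document."""

    def __init__(self, balance):
        self.balance = balance

    @logged
    def deposit(self, amount):
        self.balance += amount
        return self.balance


class Document:
    """Hypothetical class with no relationship to Account."""

    def __init__(self):
        self.text = ""

    @logged
    def append(self, more):
        self.text += more
        return self.text
```

Neither method body mentions logging, yet both calls end up in `call_log`. The closure inside `logged` is exactly the sort of construct the paragraph above says many programmers find unintuitive.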

Item four: Eclipse. We use it at Hewlett-Packard, and my students use it at the University of Toronto, and I think very highly of it. But start it up with a stopwatch in your hand; on a 2.4 GHz Pentium with 1 GByte of RAM, it takes about eleven seconds. Now start up Visual Studio .NET on the same machine: under 5 seconds. Try refactoring, or navigating through the class browser—you’ll see the same performance differences. On a typical undergraduate machine (256 MByte of RAM, 1.2 GHz processor), Eclipse is unusable as soon as you have one other large application running (like Tomcat).

A lot of people (myself included) have told the folks at IBM/OTI that if undergrads can’t use Eclipse on their personal machines, they (IBM) are leaving an opening for Microsoft. This morning, I got mail from a friend from [name of university deleted because I haven't had a chance to ask him if I'm allowed to say], telling me that they’re switching from Java to C# in part because VS.NET is just a compelling tool.

Item five: Bryan Cantrill and others have been discussing the growing gap between academia and industry. I won’t try to recapitulate their discussion here; I’ll just point out that fewer than a quarter of the academics I know use a version control system, and fewer than a tenth use an IDE.

Put these together, and what I get is that Java is still, in its heart, academically oriented. Hundreds of thousands of developers may be building real-world applications on top of it (I’m one), but: aspect-oriented programming? IDEs that only run on top-of-the-line machines? Compare that with a book club for professionals who want to keep their skills up to date, or the maturity of .NET tools compared to their Java counterparts (despite the six-year lead Java had). Microsoft is winning because their #2 priority (after making money) is to make developers productive. They’re winning because they deserve to win.


Smart Views vs Model Facades

July 4th, 2004

The Helium team hit a milestone last week: they demoed working pages, backed by a persistent store. It may sound pretty tame, but when you consider that none of us had seen Tapestry or Hibernate eight weeks ago, it’s not bad.

We now have to face a difficult design decision. The current pages show everything to everyone, but the final system has to filter content according to user identity. For example, professors have to be able to see everything, but students must not be able to view each other’s projects. Jason Montojo has designed and implemented an authorization module to decide whether user X can see thing Y. The question is, who should invoke that module, when, and where?

Since Tapestry is a Model-View-Controller framework, there are three places authorization decisions could be made: the model, the view, or the controller. We ruled out the controller right away: as in Struts and other frameworks [1], Tapestry’s controller simply despatches from one application servlet to another.

In email, Howard Lewis Ship (Tapestry’s inventor) recommended putting authorization decisions in our views: for each element X, the view would call the auth module to determine whether or not to display it.

In contrast, Irving Reid suggested putting a facade on top of our model classes. The facade class (or classes) would filter actual model content according to user identity, so that each view only saw the things it was supposed to display.

We’ve decided to go with the model facade, primarily because we think we’re going to have lots of fine-grained authorization decisions to make, rather than a few coarse-grained ones. Smart views are probably appropriate for portals, in which large chunks of content are included or excluded as wholes, but it doesn’t feel like the right solution when the “content” is as small as the individual links in a tree display.
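The facade idea can be sketched in a few lines. This is an illustration only, written in Python rather than Helium’s Java: the `Project`, `Authorizer`, and `ProjectFacade` classes, the `can_view` method, and the professor/student rule are all assumptions standing in for the real model classes and authorization module.

```python
class Project:
    """Hypothetical model class."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner


class Authorizer:
    """Hypothetical stand-in for the authorization module:
    decides whether user X can see thing Y."""

    def can_view(self, user, project):
        return user["role"] == "professor" or project.owner == user["name"]


class ProjectFacade:
    """Sits between the views and the model, filtering by user identity."""

    def __init__(self, projects, authorizer, user):
        self._projects = projects
        self._authorizer = authorizer
        self._user = user

    def visible_projects(self):
        # The view only ever receives what this user is allowed to display.
        return [
            project
            for project in self._projects
            if self._authorizer.can_view(self._user, project)
        ]
```

With this shape, a view that renders a tree of links simply iterates over `visible_projects()` and never calls the authorization module itself. A smart view would instead hold the unfiltered list and ask `can_view` once per link.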

The most interesting thing about this decision (for me, anyway) is that the issue isn’t mentioned in any of the books I’ve read about web application frameworks. It appears to be yet another one of those “orphan” topics that everyone has to deal with, but no one bothers to document. I’d be very interested in pointers to MVC frameworks that have built-in support for authorization decisions, or to books or papers that analyze the problem.

[1] I was a little depressed to discover that both Tapestry and Struts are Apache projects. I appreciate open source’s “let a hundred flowers bloom” philosophy, but the outcome is often too many projects that fall just short of the critical mass needed to become category killers. For example, if all the effort that has gone into Python’s many web frameworks had instead been put into one, we’d probably be building Helium in Python instead of Java.


Command-Line Power Tools

July 1st, 2004

Harald Koch just pointed me at XMLStarlet, a command-line toolset for manipulating XML. This isn’t the first beast of its ilk — Sean McGrath built similar tools several years ago in Python, for example — but it seems to be more mature than others.

Clicking through the documentation, I’m struck yet again by the disconnect between programming’s two approaches to handling “odd jobs”. The first is the Unix command-line filter model; the second, scripting. The second is more powerful, primarily because it gives programmers access to richer data structures, and a wider set of control constructs. (I talk about this more in this article on extensible programming systems.)

Why then has the command-line model proved so durable? According to Irving Reid, the main reason is that you get more bang for your keystroke: an experienced Unix geek can do wonderful things in a five-filter pipe. Once you add a few simple control constructs (like while read var, which I only discovered last fall), you have a lot of power at your fingertips.

Which brings me back to the problem of supporting batch operations in Helium. One of the biggest differences between it and existing systems like SourceForge and GForge is that Helium administrators have to be able to operate on dozens or hundreds of users and projects at once. Every project in SourceForge is a separate entity; so is every user. In Helium, on the other hand, projects are grouped by course, users are registered in courses, and so on. If someone has a class list for CSC207, they must be able to create a new project for each student in the course, and make the student a developer in that project, without clicking through a web interface for several hours.

The three options we have considered were discussed in an earlier post: web services, a library of bindings, or using a Java-friendly scripting language like Jython. A fourth possibility, though, is to provide a set of command-line tools that talk directly to Helium’s database. The downside is that we lose the consistency checks that our data model classes implement; the upside is that administrators can then do small things quickly by combining our tools with others they already know.
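For comparison, here is roughly what the scripted route might look like for the class-list case. Everything in it is hypothetical: the `Course` class is an in-memory stand-in rather than Helium’s data model, and the project naming scheme and one-user-per-line class list format are assumptions. The point is only that a script going through model classes keeps their consistency checks, which tools that write straight to the database would bypass.

```python
import csv


class Course:
    """In-memory stand-in for a model class that enforces its own rules."""

    def __init__(self, code):
        self.code = code
        self.projects = {}  # project name -> set of developer user names

    def create_project(self, name):
        if name in self.projects:
            raise ValueError("project already exists: " + name)
        self.projects[name] = set()

    def add_developer(self, project_name, user_name):
        self.projects[project_name].add(user_name)


def create_student_projects(course, class_list):
    """Create one project per student and make the student a developer in it.

    class_list is any iterable of CSV lines whose first column is a user name.
    """
    created = []
    for row in csv.reader(class_list):
        if not row or not row[0].strip():
            continue  # skip blank lines
        user = row[0].strip()
        project_name = f"{course.code.lower()}-{user}"
        course.create_project(project_name)
        course.add_developer(project_name, user)
        created.append(project_name)
    return created
```

Passing an open class-list file as `class_list` handles a whole course in one call, and running it twice for the same course fails loudly on the duplicate project instead of silently corrupting data.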

I’d be interested in hearing from people who have done things like this with other systems.
