Archive

Archive for October, 2006

DemoCamp 10: Congratulations

October 24th, 2006

DemoCamp 10 was held last night, and three of the five presentations were from U of T. Sana Tapal (now at Jonah Group) and Andrey Petrov led off with the Online Marking tool; Jonathan Lung (who was part of the student team that presented at DemoCamp 5) showed us all how productive PHP procrastination can be; and Sacha Chua tried to convince us that Emacs isn’t actually bad for you. The other two demos were Quotiki.com, a social networking/quotes site, and Broken Tomb, which advertises itself as the world’s first commercial Smalltalk host. The Quotiki demo didn’t show any new technology, but the presenters were entertaining, and it was fun to read the stuff that flashed by on the screen; the Smalltalk demo ran into a lot of technical and other difficulties.

I was really pleased to see such a good turnout from the university, and even more pleased that our students were just as polished as the pros. DemoCamp 11 is on Monday, November 20; along with news aggregators and wikis, we’re going to see SSL certificate provisioning and Andrew Reynolds talking about Selenium. Hope to see you there…

Later: Thuan Ta has posted a video of the OLM demo.  Grainy, but the audio is pretty good ;-)

Uncategorized

DrProject Internals: Setting the Stage

October 23rd, 2006

Over the past 18 months, students here at the University of Toronto have modified and enhanced an open source system called Trac to create DrProject, a classroom-friendly software project management portal that addresses the unique needs of undergraduate programming teams. With Version 1.2 of DrProject just a few days away, I thought it would be a good time to describe its current architecture, and how it got to be the way it is.

Let’s start with a simple wiki that consists of nothing more than a bunch of pages that can be edited over the web. Each page is stored as a file on disk, and is written using a simplified syntax in which (for example) ''this'' becomes an emphasized this and ``that`` becomes a code-style that (Figure 1). When the user wants to view a page, the web server runs a CGI program that translates the simplified wiki markup into HTML; when the user wants to edit a page, the CGI program puts the file’s contents into an editable text box, writing it back to the file when the user presses the submit button.

Figure 1
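Here’s a minimal Python sketch of the translation step. It assumes just the two rules mentioned above: doubled single quotes for emphasis and doubled backquotes for code. The function name `render` and the choice of `<em>` and `<code>` tags are my own illustration, not how any particular wiki does it. A real CGI program would read the page file and print the result after a `Content-Type` header.

```python
import html
import re

# Hypothetical markup rules based on the two examples in the text:
#   ''text''  ->  <em>text</em>
#   ``text``  ->  <code>text</code>
CODE = re.compile(r"``(.+?)``")
EMPHASIS = re.compile(r"''(.+?)''")


def render(wiki_text: str) -> str:
    """Translate simplified wiki markup into an HTML fragment."""
    # Escape &, < and > so page authors can't inject raw HTML. quote=False
    # leaves apostrophes alone, which matters because '' is part of the markup.
    escaped = html.escape(wiki_text, quote=False)
    escaped = CODE.sub(r"<code>\1</code>", escaped)
    return EMPHASIS.sub(r"<em>\1</em>", escaped)


if __name__ == "__main__":
    print(render("''this'' is emphasized and ``that`` is code"))
```

The order matters: escaping first and then substituting means the only tags in the output are the ones the wiki itself generated.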

Now suppose we want to keep track of who edited each page, when, and why—in short, we want to keep some meta-data about each page. One option would be to store a header in each file of the form:

Author: Grace Hopper
Timestamp: 2014-07-03 15:43:06
Comment: Updated boot page for IDIAC to describe new quantum array.

= The IDIAC 900 =

The IDIAC 900 uses a 1.4 petabit quantum array for...

This would be easy to implement: we just teach our CGI program to insert the header when the page is edited, and strip it off before formatting the page for output, and we’re done.
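Here’s roughly what that looks like as a Python sketch. The field names come from the example header above. The function names are my own choices for illustration, as are two further assumptions: a blank line ends the header, and every field fits on one line.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def add_header(body: str, author: str, comment: str,
               when: Optional[datetime] = None) -> str:
    """Prefix a page body with Author/Timestamp/Comment lines."""
    if when is None:
        when = datetime.now()
    header = [
        f"Author: {author}",
        f"Timestamp: {when:%Y-%m-%d %H:%M:%S}",
        f"Comment: {comment}",
    ]
    # A single blank line separates the header from the page itself.
    return "\n".join(header) + "\n\n" + body


def split_header(text: str) -> Tuple[Dict[str, str], str]:
    """Strip the header off again, returning (metadata, body)."""
    head, _, body = text.partition("\n\n")
    meta = {}
    for line in head.splitlines():
        key, _, value = line.partition(": ")
        meta[key] = value
    return meta, body
```

The weak spot is a comment that contains a blank line. If that happens, `split_header` cuts the header in the wrong place, which is a small preview of the separator problem described next.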

But what if we also want to record old versions of pages, so that users can undo changes, or view a page’s history? We could put everything in one file, and separate successive versions with some kind of textual marker, but then we’d have to worry about users putting that marker in their pages. For example, if the separator was a line of 70 dashes, the CGI would think that any page containing such a line was actually two separate pages. Another option would be to name successive files PageName.1, PageName.2, and so on, and keep a link called PageName pointing at the most recent version.
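Here’s a sketch of that file-per-version scheme. It assumes a POSIX system where the "link" is a symbolic link; Windows restricts who can create them. `save_version` and `latest_version` are hypothetical names.

```python
import os
import re
import tempfile


def latest_version(directory: str, page: str) -> int:
    """Return the highest N among files named PAGE.N, or 0 if there are none."""
    pattern = re.compile(re.escape(page) + r"\.(\d+)")
    versions = [int(m.group(1))
                for m in map(pattern.fullmatch, os.listdir(directory)) if m]
    return max(versions, default=0)


def save_version(directory: str, page: str, text: str) -> int:
    """Write PAGE.N+1 and re-point the PAGE symlink at it."""
    version = latest_version(directory, page) + 1
    filename = f"{page}.{version}"
    with open(os.path.join(directory, filename), "w", encoding="utf-8") as f:
        f.write(text)
    # Build the new link under a temporary name, then rename it over the old
    # one; on POSIX the rename is atomic, so readers never see a missing link.
    tmp_link = os.path.join(directory, f".{page}.tmp")
    os.symlink(filename, tmp_link)
    os.replace(tmp_link, os.path.join(directory, page))
    return version


if __name__ == "__main__":
    wiki_dir = tempfile.mkdtemp()
    save_version(wiki_dir, "FrontPage", "first draft")
    save_version(wiki_dir, "FrontPage", "second draft")
    with open(os.path.join(wiki_dir, "FrontPage"), encoding="utf-8") as f:
        print(f.read())  # the link now resolves to FrontPage.2
```

Even this small example has to scan the whole directory to find the latest version number. It also has a race: two simultaneous saves could pick the same N and overwrite each other. Those are exactly the chores a database takes off your hands.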

Most wikis don’t actually do this, though. Instead, they store pages in a database (Figure 2). At its simplest, the database contains one table with five columns: the page’s name, a version number, a timestamp, the author’s name, and the text of the page. When the user asks for a page, the CGI program finds the record with the page’s name and the maximum (integer) version number. To create a new version of a page, it simply adds a new record with the appropriate information.

Figure 2
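Here’s a minimal version of that one-table design using Python’s built-in `sqlite3` module. The table name `Wiki` and the `PageName` column match the query quoted later in this post. The other column names, and the choice to let SQLite fill in the timestamp, are assumptions.

```python
import sqlite3
from typing import Optional, Tuple

# One table, five columns: name, version, timestamp, author, text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Wiki (
    PageName TEXT    NOT NULL,
    Version  INTEGER NOT NULL,
    Modified TEXT    NOT NULL,
    Author   TEXT    NOT NULL,
    Body     TEXT    NOT NULL,
    PRIMARY KEY (PageName, Version)
)
"""


def get_page(conn: sqlite3.Connection, name: str) -> Optional[Tuple[int, str, str]]:
    """Return (version, author, body) for the newest version, or None."""
    return conn.execute(
        "SELECT Version, Author, Body FROM Wiki "
        "WHERE PageName = ? ORDER BY Version DESC LIMIT 1",
        (name,),
    ).fetchone()


def save_page(conn: sqlite3.Connection, name: str, author: str, body: str) -> int:
    """Add a new version of a page and return its version number."""
    with conn:  # commits on success, rolls back if anything raises
        (current,) = conn.execute(
            "SELECT COALESCE(MAX(Version), 0) FROM Wiki WHERE PageName = ?",
            (name,),
        ).fetchone()
        conn.execute(
            "INSERT INTO Wiki VALUES (?, ?, datetime('now'), ?, ?)",
            (name, current + 1, author, body),
        )
    return current + 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_page(conn, "FrontPage", "Grace Hopper", "first draft")
    save_page(conn, "FrontPage", "Grace Hopper", "second draft")
    print(get_page(conn, "FrontPage"))  # (2, 'Grace Hopper', 'second draft')
```

Making `(PageName, Version)` the primary key changes what happens when two people save the same page at the same moment. The second insert fails with `sqlite3.IntegrityError` instead of quietly producing two copies of the same version.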

Storing pages in a database turns out to be simpler and faster than storing them as files. If you want to find all the pages authored by Grace Hopper, for example, a database can do it with a single query; if your pages are stored as files, your CGI program will have to open them one by one to read their headers. Similarly, if you want to get the names of all the pages the wiki currently contains, so that you can tell which CamelCaseWords to format as links, the database can give them to you in one operation; if you use the filesystem, you’ll have to open the directory, read its contents, throw away everything that contains a version number, and so on.
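Both of those operations really are single queries. The sketch below runs them against a cut-down three-column version of the table. It also includes a link formatter that uses the set of page names. The CamelCase regular expression, the `/wiki/` URL layout, and the trailing "?" for not-yet-written pages are illustrative choices, not any specific wiki’s rules.

```python
import re
import sqlite3
from typing import List, Set

# Two or more capitalized runs of lowercase letters, e.g. FrontPage (assumed rule).
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")

# Cut-down version of the Wiki table, just enough for this example.
SCHEMA = "CREATE TABLE Wiki (PageName TEXT, Version INTEGER, Author TEXT)"


def pages_by_author(conn: sqlite3.Connection, author: str) -> List[str]:
    """Every page a given author has edited, in one query."""
    rows = conn.execute(
        "SELECT DISTINCT PageName FROM Wiki WHERE Author = ? ORDER BY PageName",
        (author,),
    )
    return [name for (name,) in rows]


def all_page_names(conn: sqlite3.Connection) -> Set[str]:
    """Names of every page in the wiki, also in one query."""
    return {name for (name,) in conn.execute("SELECT DISTINCT PageName FROM Wiki")}


def link_camel_case(html_text: str, existing: Set[str]) -> str:
    """Turn CamelCaseWords into links, flagging pages that don't exist yet."""
    def replace(match):
        word = match.group(0)
        if word in existing:
            return f'<a href="/wiki/{word}">{word}</a>'
        return f'{word}<a class="missing" href="/wiki/{word}?edit">?</a>'
    return CAMEL_CASE.sub(replace, html_text)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO Wiki VALUES (?, ?, ?)", [
        ("FrontPage", 1, "Grace Hopper"),
        ("FrontPage", 2, "Grace Hopper"),
        ("BootPage", 1, "Alan Turing"),
    ])
    print(pages_by_author(conn, "Grace Hopper"))  # ['FrontPage']
    print(link_camel_case("See FrontPage and NewIdea", all_page_names(conn)))
```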

What kind of database should the wiki use? Our choices are a database that is managed by its own long-lived server, or an embedded database, which is really just a library of functions that manipulate a chunk of disk reserved to store data (Figure 3). MySQL and PostgreSQL are well-known examples of the former; SQLite is perhaps the most widely used instance of the latter.

Figure 3

The right answer is not to choose. Instead, we should design our CGI program so that it can use either, since the best solution for a particular deployment will depend on a lot of other factors, and may in fact change over time. Luckily, every modern language comes with tools that help insulate programs from the idiosyncrasies of different databases. If we use Java’s JDBC, for example, the only bits of our code that are specific to a particular database are the bits that load that database’s driver, and establish a connection to the database. Everything after that is written in abstract terms: fetch this value, create that record, and so on.
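Python’s rough equivalent of JDBC is the DB-API 2.0 specification (PEP 249). The sketch below keeps everything database-specific inside one `connect` function. The URL format it accepts is an invented convention, and the PostgreSQL branch assumes the third-party `psycopg2` driver is installed. Even here the drivers disagree about how to write a query parameter: `?` for `sqlite3`, `%s` for `psycopg2`.

```python
import sqlite3
from typing import Any, Optional, Tuple

# How each DB-API paramstyle writes a positional placeholder. Only the styles
# used by the two drivers below are listed; psycopg2 reports "pyformat" but
# also accepts plain %s for positional parameters.
PLACEHOLDERS = {"qmark": "?", "format": "%s", "pyformat": "%s"}


def connect(url: str) -> Tuple[Any, str]:
    """Open a connection from a URL. This is the only database-specific code."""
    scheme, _, rest = url.partition("://")
    if scheme == "sqlite":
        return sqlite3.connect(rest), sqlite3.paramstyle
    if scheme == "postgresql":
        import psycopg2  # third-party driver, assumed installed for this branch
        return psycopg2.connect(url), psycopg2.paramstyle
    raise ValueError(f"unsupported database: {scheme!r}")


def latest_body(conn: Any, paramstyle: str, page: str) -> Optional[str]:
    """Fetch the newest version of a page using only generic DB-API calls."""
    p = PLACEHOLDERS[paramstyle]
    cur = conn.cursor()
    # Only the placeholder token is formatted into the SQL; the page name is
    # still passed separately, so the driver takes care of quoting it.
    cur.execute(
        f"SELECT Body FROM Wiki WHERE PageName = {p} "
        "ORDER BY Version DESC LIMIT 1",
        (page,),
    )
    row = cur.fetchone()
    return row[0] if row else None
```

Usage is `conn, style = connect("sqlite://wiki.db")` followed by `latest_body(conn, style, "FrontPage")`. Even with the placeholder handled, `LIMIT` isn’t universal SQL; Microsoft SQL Server, for example, spells it differently.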

Life’s not actually that simple, though. SQL is supposed to be a standard, but every relational database implements the standard differently. SQL commands that work with one database won’t work with another, or will behave differently. Industrial-strength systems therefore have to create (or use) another layer of insulation on top of that provided by SQL and their database connection libraries. These tools are called object-relational mappers, or ORMs, and we’ll look at them a few posts from now. Before then, though, we need to think about how we’re going to make our system secure: that will be the topic of the next article in this series.


Alan Grosskurth and others have asked, “Why not use a version control repository as a backing store instead of a database?” It’s a good question, and I wondered about this issue myself when I first looked into how wikis are implemented. Good version control systems, like Subversion and Perforce, can keep track of meta-data, as well as files’ histories. Why not keep each page in a repository? It would certainly help solve the problem of several people trying to update a page at once.

I think there are two reasons why most wikis haven’t gone that route. The first is performance: in order to format a wiki page, the CGI needs to know the names of every other page in the system (so that it can tell which CamelCaseWords to turn into links, and which to mark as “not yet written”). If you’ve already established a connection to a database, SELECT PageName FROM Wiki is probably cheaper than calling opendir, reading a list of directory contents, and filtering out things that aren’t pages. I don’t actually have any data on this, though; if anyone has pointers to any, please let me know.
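I don’t have numbers, but here’s one way someone could collect them: a rough Python harness. It assumes pages are stored as `PageName.N` files in a single directory for the filesystem case, and as rows in the `Wiki` table for the database case. There are two caveats. An in-memory SQLite database flatters the database side compared with a separate server reached over a socket. Repeated directory listings will mostly hit the operating system’s cache. So treat whatever it prints as a starting point, not an answer.

```python
import os
import re
import shutil
import sqlite3
import tempfile
import timeit
from typing import Set, Tuple

# Files named like TestPage12.3 are page versions; anything else is ignored.
VERSIONED_FILE = re.compile(r"(\w+)\.(\d+)")


def names_from_db(conn: sqlite3.Connection) -> Set[str]:
    """The database route: a single query."""
    return {name for (name,) in conn.execute("SELECT DISTINCT PageName FROM Wiki")}


def names_from_dir(directory: str) -> Set[str]:
    """The filesystem route: list the directory and filter out non-pages."""
    names = set()
    for entry in os.listdir(directory):
        match = VERSIONED_FILE.fullmatch(entry)
        if match:
            names.add(match.group(1))
    return names


def build_fixture(n_pages: int, n_versions: int) -> Tuple[sqlite3.Connection, str]:
    """Create the same hypothetical wiki twice: as files and as table rows."""
    directory = tempfile.mkdtemp()
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Wiki (PageName TEXT, Version INTEGER)")
    for i in range(n_pages):
        name = f"TestPage{i}"
        for version in range(1, n_versions + 1):
            open(os.path.join(directory, f"{name}.{version}"), "w").close()
            conn.execute("INSERT INTO Wiki VALUES (?, ?)", (name, version))
    conn.commit()
    return conn, directory


if __name__ == "__main__":
    conn, directory = build_fixture(n_pages=1000, n_versions=5)
    assert names_from_db(conn) == names_from_dir(directory)
    for label, func in [("database", lambda: names_from_db(conn)),
                        ("directory", lambda: names_from_dir(directory))]:
        seconds = timeit.timeit(func, number=100)
        print(f"{label:>9}: {seconds / 100 * 1000:.3f} ms per call")
    shutil.rmtree(directory)
```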

The second, and more important, reason that most wikis use a database is inertia. The four horsemen of the web applicationalypse are browser, server, CGI, and database. If you’re used to building e-commerce and social networking sites in PHP, ASP.NET, or Ruby on Rails, storing text in a database comes naturally; using a version control system feels like adding complexity. Of course, as soon as you decide you need to reconcile conflicts between concurrent edits, the cost of re-implementing what version control does best quickly outweighs anything you saved by not integrating SVN or P4 into your system…

DrProject, Teaching

And I Thought *I* Worried a Lot…

October 22nd, 2006

Anders Sandberg’s “Warning Signs of Tomorrow” is funny. And frightening.

Uncategorized

Michelle Levesque on “Getting Hired at Google”

October 20th, 2006

This from a former 49X student who’s now working at Google:

Google hired me because of my extracurricular activities, not my grades.  Nothing impresses interviewers more than a student who takes an active role in the computer science community, so get involved in open-source projects, build your own website, do PEY, or find a part-time programming job.

The resources are available here to accomplish nearly anything that you can imagine.  But no one is going to just hand these possibilities to you; it’s going to be up to you to discover what you want to know, do, and learn.  It doesn’t matter if you don’t know what you want to do with your life, because even if you do, your plans will probably change.

Do something outside of your classes — something that you love.  Draw, paint, act in a play, take up fencing, join an activist group, learn how to fix cars, teach kids basic math, organize a camping trip, write a short story, start a blog, bake cookies for your class every month.  It doesn’t have to involve computers, and today you might not be able to dream of how it could one day further your career.  But everything that you do helps to make you into a more interesting person and people really do take notice of these things.  In the end it will be these extra activities, and not your grades, that people will remember about you.

Teaching

The Baby Just Kicked!

October 20th, 2006

I am thrilled — just thrilled — to report that the baby just kicked Sadie in the squishy bits. There have been nudgings before, but this was the first unequivocal instance of “Goooooooaaaaaaaalllllllllll!!!!!!!” Today is the last day of week 18; we’re due to ship on March 23.

First Ultrasound

Family

Award Winners

October 20th, 2006

Several CSC49X students (past and present) are to be honored at the department’s award ceremony next week: congratulations to Olga Vesselova, Maria Khomenko, Petcharat Viriyakattiyaporn, Jonathan Lung, and all the other students in our department who have done so well this past year.

Uncategorized

Why Software Projects Are Always In Crisis

October 18th, 2006

Once you’ve gone mountain biking, it’s hard to get excited about riding a kid’s tricycle. If you spend a lot of your time putting out fires, it’s hard to get motivated to do things that aren’t urgent. Take today, for example: I owe three dozen people feedback on their outlines of chapters for a book, and have 35 student biographies to edit, but since neither needs to be done right now, I’m going to write a blog entry, tidy up my desk, and wait for something on my calendar to start smoldering. Adrenaline is a very dangerous drug…

Uncategorized

The Last of September’s Reading

October 17th, 2006

In the years leading up to the First World War, French military doctrine held that the élan of their troops—their superior fighting spirit—was guaranteed to win the day. Never mind the machine guns; what mattered most was courage.

We all know what happened next.

I was reminded of this history lesson a few weeks ago when two books landed in my inbox within a few hours of one another. The first, from the Chicago-based web development firm 37Signals, is called Getting Real. Here’s a quote:

Getting Real is about skipping all the stuff that represents real (charts, graphs, boxes, arrows, schematics, wireframes, etc.) and actually building the real thing.

Here’s another:

There’s nothing more toxic to productivity than a meeting… They usually convey an abysmally small amount of information per minute…often contain at least one moron that inevitably gets his turn to waste everyone’s time with nonsense…[and] frequently have agendas so vague nobody is really sure what they are about.

This is what Steve Yegge, in his blog post “Good Agile, Bad Agile”, calls “snake oil”. Do schematics and wireframes actually get in the way of building the real thing? That depends on how complicated your “real thing” is, and (crucially) how willing your customer is to pay you to rewrite (sorry, refactor) your code a jillion times. And meetings are no more likely to be toxic than code is to be unreadable: in both cases, what makes the difference is discipline and professionalism.

My real objection to Getting Real isn’t that it’s an infomercial (again borrowing Yegge’s terminology). My real objection is that the authors don’t back up their claims with evidence. Anecdotes, yes, but if anecdotes were proof, then eating a raw onion before each playoff game would be enough to guarantee your team the trophy. A few people are actually doing rigorous, empirical studies of how effective agile practices are; unfortunately, this book’s approach is to shout, “Over the top!”

The contrast with Steve McConnell’s latest book, Software Estimation: Demystifying the Black Art, couldn’t be clearer. McConnell is one of the most reasonable people in the industry today, and has a wikipedic knowledge of the literature on development practices. His contention in this book is that estimation isn’t as difficult as people think, as long as it’s approached like any other engineering problem. Did you keep track of how long it took to build the last system of this kind? If so, those numbers can make your next estimate more accurate. Are you sure that everyone on the team is including the same factors when they give their estimates? If not, the fanciest spreadsheet in the world is going to churn out garbage.

The best part of this book for me is the way McConnell weighs each piece of evidence. Take, for example, his discussion of the COCOMO II estimation formula, which includes almost two dozen different factors calculated from mountains of empirical data collected over many years. As McConnell points out, many of the factors require human judgment, which means that COCOMO II’s output is too easily skewed to be of real practical use. However, it’s a great way to see what relative effect changes in estimates will have, since it takes into account the nonlinear relationships between those factors. McConnell caps this discussion off with a pair of diagrams showing the diseconomies of scale introduced by various factors on projects of different sizes. This “appeal to evidence” is what the snake-oil advocates of UML, agility, and other fads don’t do, but it is what our profession needs most.
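For readers who haven’t seen it, here is the core of COCOMO II as I understand the published COCOMO II.2000 post-architecture model. Effort in person-months = A × Size^E × EM1 × EM2 × … × EM17, where E = B + 0.01 × (SF1 + … + SF5) and Size is measured in thousands of source lines (KSLOC). Those 17 effort multipliers and 5 scale factors are the “almost two dozen” factors. The Python sketch below uses the calibration constants A = 2.94 and B = 0.91, plus the “nominal” scale-factor ratings as I recall them from the model definition. Treat those numbers as assumptions to check, not authoritative values.

```python
import math
from typing import Iterable, Mapping, Optional

# COCOMO II.2000 calibration constants (assumed; verify against the model manual).
A = 2.94
B = 0.91

# Assumed "nominal" ratings for the five scale factors.
NOMINAL_SCALE_FACTORS = {
    "PREC": 3.72,  # precedentedness
    "FLEX": 3.04,  # development flexibility
    "RESL": 4.24,  # architecture/risk resolution
    "TEAM": 3.29,  # team cohesion
    "PMAT": 4.68,  # process maturity
}


def effort_person_months(ksloc: float,
                         scale_factors: Optional[Mapping[str, float]] = None,
                         effort_multipliers: Iterable[float] = ()) -> float:
    """Estimate effort; omitted effort multipliers count as 1.0 (nominal)."""
    if scale_factors is None:
        scale_factors = NOMINAL_SCALE_FACTORS
    # Scale factors act through the exponent, so their effect grows with size.
    exponent = B + 0.01 * sum(scale_factors.values())
    # Effort multipliers simply scale the result; math.prod(()) is 1.
    return A * ksloc ** exponent * math.prod(effort_multipliers)


if __name__ == "__main__":
    for size in (10, 100, 1000):
        pm = effort_person_months(size)
        print(f"{size:>5} KSLOC: {pm:8.1f} person-months, {pm / size:.2f} per KSLOC")
```

With these nominal ratings the exponent comes out slightly above 1, so effort per KSLOC rises as projects grow. That is the diseconomy of scale in McConnell’s diagrams. Small disagreements about scale-factor ratings matter more on big projects than on small ones.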

This month’s other two books didn’t inspire such strong emotion, though they are both well worth reading. Derby and Larsen’s Agile Retrospectives explains how to figure out what’s going right or wrong in a project, while that project is going on. That last phrase is what makes it special: to paraphrase the authors, incremental process improvement is just as effective as incremental software development.

Much of the book describes particular activities you can use to set the stage, gather data, decide on actions, and so on, which come complete with time estimates and a list of the supplies you’ll need (hint: whiteboard pens and sticky notes). The authors also talk about how to keep such meetings on track, in order to avoid the productivity drain that the authors of Getting Real were so worried about.

Finally, there’s the No Fluff Just Stuff 2006 Anthology, a collection of articles on a wide range of topics from participants in the eponymous developers’ conference. Neal Ford writes about domain-specific languages (DSLs); Stuart Halloway explains the use of aspect-oriented programming (AOP) in the Spring framework; Ian Roughley preaches the gospel of code coverage (amen), and Eitan Suez gives a programmer’s perspective on CSS. As far as I can tell, what ties these articles together is their authors’ passion for their subjects, and the high quality of their writing. I expect every reader will decide for herself which pieces to skim over, and which to dive into, but I’m pretty sure that everyone who picks it up will look forward to the 2007 edition.


37Signals: Getting Real. https://gettingreal.37signals.com (viewed October 17, 2006).

Esther Derby and Diana Larsen: Agile Retrospectives. Pragmatic Bookshelf, 2006, 0977616649, 170 pages.

Neal Ford: No Fluff Just Stuff, 2006. Pragmatic Bookshelf, 2006, 0977616665, 240 pages.

Steve McConnell: Software Estimation: Demystifying the Black Art. Microsoft Press, 2006, 0735605351, 308 pages.

Books

CSER, Privacy, Agility, and Games

October 16th, 2006

I spent Sunday at a workshop organized by the Consortium for Software Engineering Research (CSER). The theme was “empirical software engineering”, a subfield that has emerged since the late 1980s whose practitioners focus on studying and evaluating software development methods and tools in systematic, rigorous ways. I went there to try to persuade educators at other Canadian universities to start using DrProject to manage undergraduate programming teams. A few people seemed interested, but what really got the room talking was the ethical issues surrounding the collection and publication of data on how students actually use tools like DrProject.

At one end, Ontario’s new privacy law says that information can only be used for the purposes for which it was originally collected. If interpreted strictly, this would disallow studies like the one we did two years ago, in which we tried to find patterns in students’ use of CVS that correlated with their grades on assignments, since we did not obtain explicit permission from those students to publish our analysis of their data. At the other end, there are dozens of papers at the SIGCSE conference every year in which educators present data on students’ grades. I can’t speak for all of them, but I’m pretty sure that most haven’t asked for their students’ permission.

Janice Singer (National Research Council), Peggy Storey (University of Victoria), and Steve Easterbrook (University of Toronto) are writing a book chapter on the ethics of doing software engineering studies. I’m looking forward to it, and will blog when it appears.

The other big topic yesterday was the mechanics of actually doing empirical studies. Coincidentally, two articles landed on my screen this morning: one from DanC singing the praises of agile development in the gaming industry, and an update on Steve Yegge’s piece on good agile vs. bad agile (which I covered a couple of weeks ago). There are lots of strong opinions in both, but no actual data; what I took away from yesterday’s workshop is that it is possible to study these issues, instead of just arguing about them, and that our profession would be a lot better off if we did that more often.

Research

Barry Warsaw on debugging Python’s memory usage

October 13th, 2006

Barry has been part of the Python development team for yonks; this article is the first of two (or more) that describe how Python is supposed to manage memory, and how to find out what’s actually going on. Good debugging tips for systems geeks…

Python