Years ago, I read a book called Powers of Ten, which showed the structure of the physical universe from the subatomic level up to, well, everything, really, in 10X jumps. This map from National Geographic uses the same trick to go from our Solar System to the universe as a whole. I’m now wondering: has anyone ever produced anything similar for computer hardware, going from transistors to VLSI to chips on a board and on up to the Internet as a whole? It’d be cool artwork to hang on a wall…
Monthly Archives: May 2005
Navigating Source
One of the minor items on my to-do list is to replace the default Trac graphics on the Argon web site with ones that include some reference to the University of Toronto and our hippo mascot. Having spent half an hour on Friday talking to a colleague about how we teach students how to write code, but not how to read it, I thought I’d describe how I figured out what files I’d need to change, and where to find them.
Step 1: what am I looking for? If you take a look at the Argon site, you’ll see two logos: a large banner at the top of the page, and a smaller “Trac Powered” logo at the bottom. “View Source” shows me that the top one is produced by the following HTML:
    <div id="header">
      <a id="logo" href="http://trac.edgewall.com/"><img src="/trac-static/trac_banner.png" width="236" height="73" alt="Trac" /></a>
      <hr />
    </div>
The footer is similar.
Step 2: That <div id="header"> tag looks like a good landmark. I find it like this:
    $ cd ~/argon/trunk/trac
    $ fgrep -e 'id="header"' templates/*.cs
    templates/header.cs:
Notice that I’m relying here on what I’ve already learned about Trac: I know that the Python code generates pages based on ClearSilver templates, which live in the templates directory, and have a .cs extension. If I didn’t know that, I’d have run this command instead:
    $ cd ~/argon/trunk/trac
    $ fgrep -e 'id="header"' `find . -type f -print | fgrep -v -e '.svn'`
(I pipe the results of find through fgrep to filter out everything in the .svn directories that Subversion uses to manage metadata.)
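The same search can be sketched in Python for readers who would rather not build the shell pipeline. This is only an illustration: the function name `find_in_tree` and its `skip_dirs` parameter are made up for this example, and it skips directories named `.svn` rather than any path that merely contains that string, which is slightly stricter than the `fgrep -v` filter above.

```python
import os

def find_in_tree(root, needle, skip_dirs=(".svn",)):
    """Yield (path, line_number, line) for each line under root that contains needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable files are skipped rather than aborting the search.
                continue
```

Calling `find_in_tree(".", 'id="header"')` from the top of the Trac checkout and printing each result gives roughly what the second `fgrep` command reports, with line numbers added.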
Step 3: take a closer look at the contents of header.cs:
    <div id="header">
      <a id="logo" href="<?cs var:header_logo.link ?>"><img src="<?cs var:header_logo.src ?>" width="<?cs var:header_logo.width ?>" height="<?cs var:header_logo.height ?>" alt="<?cs var:header_logo.alt ?>" /></a>
      <hr />
    </div>
The notation var:header_logo.link tells ClearSilver to look in the HDF (Hierarchical Data Format) structure that the Python CGI builds up for it, find the header_logo key, and use the value associated with its link subkey. Do another search:
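To make the key/subkey idea concrete, here is a rough model of that lookup using a nested Python dict. It is not ClearSilver's API: `hdf_lookup` is a name invented for this sketch, and returning an empty string for a missing key is an assumption made to keep the example short.

```python
def hdf_lookup(hdf, dotted_name, default=""):
    """Resolve a dotted name such as 'header_logo.link' one key at a time."""
    node = hdf
    for key in dotted_name.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # assumed behaviour for missing keys
        node = node[key]
    return node

# A hand-built stand-in for the structure the Python CGI hands to the template.
hdf = {
    "header_logo": {
        "link": "http://trac.edgewall.com/",
        "src": "/trac-static/trac_banner.png",
        "alt": "Trac",
        "width": "236",
        "height": "73",
    }
}

link = hdf_lookup(hdf, "header_logo.link")
```

The point is only that `header_logo.link` names a path through a tree, so the next question is who fills in that tree.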
    $ fgrep header_logo `find . -name '*.py' -print`
    ./trac/db_default.py: ('header_logo', 'link', 'http://trac.edgewall.com/'),
    ./trac/db_default.py: ('header_logo', 'src', 'trac_banner.png'),
    ./trac/db_default.py: ('header_logo', 'alt', 'Trac'),
    ./trac/db_default.py: ('header_logo', 'width', '236'),
    ./trac/db_default.py: ('header_logo', 'height', '73'),
    ./trac/web/chrome.py: logo_src = self.config.get('header_logo', 'src')
    ./trac/web/chrome.py: req.hdf['header_logo'] = {
    ./trac/web/chrome.py: 'link': self.config.get('header_logo', 'link'),
    ./trac/web/chrome.py: 'alt': escape(self.config.get('header_logo', 'alt')),
    ./trac/web/chrome.py: 'width': self.config.get('header_logo', 'width'),
    ./trac/web/chrome.py: 'height': self.config.get('header_logo', 'height')
The first set of hits, from db_default.py, are from the code that builds up a tuple of tuples called default_config:
    default_config = (('trac', 'htdocs_location', '/trac/'),
                      ...
                      ('header_logo', 'link', 'http://trac.edgewall.com/'),
                      ('header_logo', 'src', 'trac_banner.png'),
                      ('header_logo', 'alt', 'Trac'),
                      ('header_logo', 'width', '236'),
                      ('header_logo', 'height', '73'),
                      ...
                     )
Make a note of that, then look in chrome.py at the second set of hits, which come from the following block of code:
    logo_src = self.config.get('header_logo', 'src')
    logo_src_abs = logo_src.startswith('http://') or logo_src.startswith('https://')
    if not logo_src[0] == '/' and not logo_src_abs:
        logo_src = htdocs_location + logo_src
    req.hdf['header_logo'] = {
        'link': self.config.get('header_logo', 'link'),
        'alt': escape(self.config.get('header_logo', 'alt')),
        'src': logo_src,
        'src_abs': logo_src_abs,
        'width': self.config.get('header_logo', 'width'),
        'height': self.config.get('header_logo', 'height')
    }
OK, chrome.py is looking in the configuration object that Trac creates each time it services a request, pulling out the values that describe the header logo, and sticking them into the HDF for ClearSilver to use. That leaves two questions: how do values get from db_default.default_config into the configuration object, and where do files like trac_banner.png actually live on disk?
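The path handling in that block is easy to misread, so here it is pulled out into a standalone function. This is a sketch rather than Trac's code: `resolve_logo_src` is a made-up name, and the `/trac-static/` prefix passed in below is inferred from the HTML in Step 1, not read from our configuration. One small difference: using `startswith("/")` means an empty string does not raise, whereas indexing `logo_src[0]` would.

```python
def resolve_logo_src(logo_src, htdocs_location):
    """Return (src, is_absolute_url), prefixing bare file names with htdocs_location."""
    is_absolute_url = logo_src.startswith(("http://", "https://"))
    if not logo_src.startswith("/") and not is_absolute_url:
        # A bare file name is taken to live in the static files location.
        logo_src = htdocs_location + logo_src
    return logo_src, is_absolute_url

bare = resolve_logo_src("trac_banner.png", "/trac-static/")
rooted = resolve_logo_src("/images/hippo.png", "/trac-static/")
remote = resolve_logo_src("http://example.org/hippo.png", "/trac-static/")
```

Only the first case is rewritten; paths that start with a slash and full URLs pass through unchanged. The `hippo.png` names are placeholders.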
Step 4: grepping for default_config gets two hits in env.py:
    def insert_default_data(self):
        ...
        for section,name,value in db_default.default_config:
            self.config.set(section, name, value)
        self.config.save()
and:
    def load_config(self):
        self.config = Configuration(os.path.join(self.path, 'conf', 'trac.ini'))
        for section,name,value in db_default.default_config:
            self.config.setdefault(section, name, value)
Ah ha! The configuration object initializes itself by reading from trac.ini. All we have to do now is find that…
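The difference between the two hits matters: `set` overwrites, while `setdefault` only fills in what is missing, so anything already in trac.ini wins over `default_config`. A toy model of that behaviour, assuming nothing about Trac's real `Configuration` class (`TinyConfig` and the `hippo_banner.png` value are invented for this sketch):

```python
class TinyConfig:
    """A stand-in for a config object keyed by (section, name)."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get(self, section, name, default=None):
        return self._values.get((section, name), default)

    def set(self, section, name, value):
        # Always overwrites, as when default data is first inserted.
        self._values[(section, name)] = value

    def setdefault(self, section, name, value):
        # Leaves an existing value alone, as when an existing config is loaded.
        self._values.setdefault((section, name), value)

default_config = (
    ("header_logo", "src", "trac_banner.png"),
    ("header_logo", "alt", "Trac"),
)

# Pretend trac.ini overrides the image but says nothing about the alt text.
config = TinyConfig({("header_logo", "src"): "hippo_banner.png"})
for section, name, value in default_config:
    config.setdefault(section, name, value)
```

After the loop, `src` is still the overridden value and `alt` has picked up its default, which is why editing trac.ini should be enough to swap the logo.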
Step 5: Our Trac is run by Apache 2 on Debian Linux, which keeps configuration information under the /etc/apache2/conf.d directory. Sure enough, there’s a trac file in conf.d, which contains the following lines:
    Alias /trac-static/ "/usr/share/trac/htdocs/"
    <Directory "/usr/share/trac/htdocs/">
        Options Indexes MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
    <Directory "/usr/share/trac/cgi-bin">
        AllowOverride None
        Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
        AddHandler cgi-script .cgi
        Order allow,deny
        Allow from all
    </Directory>
    <LocationMatch "/trac/[[:alnum:]]+/login">
        AuthPAM_Enabled on
        AuthName "Pyre Trac"
        AuthType Basic
        require valid-user
    </LocationMatch>
Remember looking at the HTML source in Step 1? The image path specified there was /trac-static/trac_banner.png, so only the first stanza matters right now. It specifies that /trac-static is translated into /usr/share/trac/htdocs, and sure enough, that directory contains the PNG images for the two logos. It also contains a trac.ico file, which produces the little pawprint logo next to Trac URLs in the browser and in Favorites links.
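In case the Alias translation is unfamiliar, this is roughly what it does to a URL path. The sketch is deliberately simplified and is not how Apache is implemented; `ALIASES` and `translate` are names chosen for the example, and real servers also normalize paths and apply access rules.

```python
import posixpath

# URL prefix -> directory on disk, copied from the Alias line in the conf.d file.
ALIASES = {"/trac-static/": "/usr/share/trac/htdocs/"}

def translate(url_path, aliases=ALIASES):
    """Map a URL path to a filesystem path using prefix aliases, or return None."""
    for prefix, directory in aliases.items():
        if url_path.startswith(prefix):
            # Swap the URL prefix for the directory and keep the rest of the path.
            return posixpath.join(directory, url_path[len(prefix):])
    return None

banner_file = translate("/trac-static/trac_banner.png")
```

Paths that match no alias return `None` here; in the real server they would fall through to the other rules in the configuration.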
And that’s that—the artwork still needs to be done, but at least I know where to put files for testing.
Now, you’ll probably never need to find Trac logo files on your computer, but every time you start work with a new code base, you’ll need to find your way around. If there isn’t a comprehensive, up-to-date map of the code (and there never is), the first thing to do is to find a landmark, like the div element I picked in Step 1. Work backward from that: who touches it? Where does that code get its inputs? Keep notes—I had half a page of them by the time I was done—so that when you hit a roadblock, you can back up and try another path. Anything you already know about the application’s architecture will help steer you in the right direction, so make sure you understand its processing cycle. And above all, don’t panic, and don’t be afraid to ask for help.
MySQL, LiveJournal, and Real-World Web Sites
This talk from the last MySQL conference describes what LiveJournal (a blogging portal) has had to do over the past few years to meet its users’ growing demands. It’s an insightful look at what you have to do in practice to make a workable web.
Schedule Games
Jeff Atwood has created an index to Johanna Rothman’s postings on Schedule Games. Those of you who’ve been building software for a while will recognize at least a few of these. Those of you who haven’t: one day, you will.
Recommended Reading
I have updated my recommended reading list; it now includes descriptions (many of them culled from my reviews for Dr. Dobb’s Journal), and cover images that link directly to Amazon. Enjoy…
Later: I’ve added five more books to the recommended reading list:
- Feathers: Working Effectively with Legacy Code
- Levine: Linkers and Loaders
- Margolis and Fisher: Unlocking the Clubhouse
- Skoudis: Malware
- Spinellis: Code Reading
May 15: I’ve added some fiction and general non-fiction entries.
Dr Requirements
We’re about to kick off another summer of work on Hippo, our baby-SourceForge for student use. I’m pretty excited: five good students will be working on it full-time, on five brand new machines (thanks to a donation from the Jonah Group), starting from a freshly-refactored version of Trac that includes Christopher Lenz’s new component architecture.
The biggest challenge we face this summer may lie in trying to add some sort of requirements management tool to Trac. The reason is the sheer size of the gulf between what textbooks teach, and what developers actually do. For forty years (if not longer), academic researchers have been saying, “Formalize! Formalize!” For just as long, students have nodded, jumped through whatever hoops they had to in order to get a good grade, and then gone out into industry and done something else. In the twenty-three years I’ve been programming for a living, I’ve only twice seen anyone write the kind of specs I see in software engineering textbooks; both times, it was to satisfy a contractual requirement, and both times, the spec gathered dust once the real work started.
There are several possible explanations for this situation. The first is to say that formalization is the right solution—most programmers may be too stupid/too lazy/too conservative to see it, but it will inevitably triumph in the end. Having heard advocates of Scheme and proletarian revolution make the same claims, I’m disinclined to believe any of them any longer.
The second explanation is to say that we’re just not teaching it the right way: we don’t introduce it early enough, we don’t explain it well enough, it isn’t integral to other courses (in the way that calculus is in math and physics), and so on. I don’t buy this either, and the reason I don’t is that the people who teach at colleges and universities are, by and large, a pretty smart bunch. If they believed that formalizing requirements would help students with assignments, they’d do it, but how often do you see instructors write use cases, or draw sequence diagrams, as part of assignment handouts in courses on compilers, graphics, or operating systems?
What I think we’re going to try to build this summer (as always, we reserve the right to change our minds) is the requirements equivalent of DrJava. If you haven’t seen it, DrJava is an entry-level IDE for Java programmers; it isn’t nearly as powerful as Eclipse, but it isn’t nearly as intimidating, either. As a result, students are more likely to realize that yes, it does help them. Once that happens, they’re more likely to believe that investing time in something more complex will pay off.
Over the next four months, we’re going to try to figure out what “DrRequirements” would look like, i.e., what should go into (and be left out of) a tool that instructors can use to specify requirements for programming assignments, and/or students can use to specify what their solutions actually do. In order to qualify as a solution, it must be possible for both parties to learn the tool from a single hour-long lecture (which is all we give tools like Subversion and JUnit). More importantly, both sides must see real benefit to themselves from using it: as our experience with bug tracking systems shows, anything that appears “write-only” from a student point of view will only be used under duress.
One possibility (pulled out of thin air) would be something that made bidirectional connections between the assignment spec and students’ test cases, i.e., something that allowed students to say, “Tests A, B, and C are proof that part 3.1 of the assignment has been done,” and to then check which parts of the assignment spec hadn’t been satisfied. We’ve kicked a few other ideas around too, but are still looking for one that has that “Ah ha!” feeling to it. If you have any ideas or pointers, I’m easy to reach.
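To show what I mean by a bidirectional connection, here is a back-of-the-envelope sketch. Nothing like this exists in Trac or Hippo yet; the `Traceability` class, its method names, and the section numbers and test names are all invented for illustration.

```python
from collections import defaultdict

class Traceability:
    """Two-way links between assignment spec sections and test names."""

    def __init__(self, spec_sections):
        self.spec_sections = list(spec_sections)
        self.tests_for = defaultdict(set)     # section -> tests claiming to prove it
        self.sections_for = defaultdict(set)  # test -> sections it claims to prove

    def link(self, section, test_name):
        if section not in self.spec_sections:
            raise KeyError(f"unknown spec section: {section}")
        self.tests_for[section].add(test_name)
        self.sections_for[test_name].add(section)

    def uncovered(self):
        # Sections that no test has been linked to, in spec order.
        return [s for s in self.spec_sections if not self.tests_for.get(s)]

trace = Traceability(["3.1", "3.2", "3.3"])
for test_name in ("test_a", "test_b", "test_c"):
    trace.link("3.1", test_name)
trace.link("3.3", "test_c")
missing = trace.uncovered()
```

Here `missing` names the one section with no tests attached, and `trace.sections_for["test_c"]` answers the reverse question of which parts of the spec a given test is meant to cover.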
Crash This Party… Tomorrow
MIT is holding a party for time travelers. Their logic is that you only need one, since everyone who qualifies to attend, can.
Misdirection and Javascript
When I was twelve, I spent $3.95 on a book that promised to teach
me how to do magic tricks that would astound my friends. I didn’t
make it past the second chapter (“No way—I have to
practice!?”), but I still remember the way the word
“misdirection” was set in bold face every time it appeared. The key
to making a trick work, the book said, was to get the audience to
focus their attention on something else. That way, by the time they
realized the trick was happening, the hard part would already be
over.
A similar effect seems to have played a key role in the success of
some of today’s biggest software technologies. Unix, DOS, Perl, the
web—they all just kind of grew while the grownups were worrying
about something else, until one day, everyone turned around and said,
“Hey, this is huge!”
So, as a follow-on to last week’s post about you and
your research, here’s another idea that I think has at least a
fighting chance of going through that same cycle of stealthy growth
followed by overnight success. I think there’s at least an even money
chance that Perl, Python, and Ruby will all turn out to have been
also-rans, and that the dynamic language that eventually succeeds in
going mainstream will be (wait for it) Javascript. Here’s why:
- It has a clean, C-like syntax, and a very conventional imperative programming model, so there are no immediate obstacles to adoption.
- It offers everything that has made dynamic languages popular, including free typing, first-class everything, garbage collection, and a rich set of built-in tools.
- Thanks to IE, Firefox, and Safari, it’s available everywhere; thanks to XMLHttpRequest, it can now deliver everything that Java applets were supposed to back in the 1990s. Google Maps is the most famous example of the AJAX (Asynchronous Javascript And XML) architecture that XMLHttpRequest permits, but many others are starting to appear.
- Most importantly, anyone who wants to build a professional-looking web site these days has to learn it, which means that hundreds of thousands of programmers are using it every day. (I’m willing to bet that more people are writing Javascript at this instant than are writing Perl, Python, Ruby, and Tcl put together.)
Of course, there’s a ton of things missing: I wouldn’t use
Javascript for command-line data
crunching [1], for example, since it
lacks the thousand and one libraries for LDAP, database connectivity,
process control, and what have you that are the real key to those
other languages’ power. It also lacks IDE support (although projects
like jseditor are
already addressing this). Set those against its ubiquity, though, and
they seem like small change.
Javascript has one other thing going for it, at least in my eyes:
it may be the first widely-used language to include direct syntactic
support for XML, via E4X.
Whether you like XML or not, it’s as much a part of the modern web as
HTTP. Ubergeeks might scoff and say, “You can do all that with
libraries,” but my guess is that any language that treats XML as a
first-class native data type is going to look awfully attractive to
the other 95% of programmers.
(For more information about Javascript, see Mozilla’s Javascript page, which
has links to open source implementations in both Java and C.)
[1] Shameless plug.