I just finished reorganizing my database of former project students, and now it looks like I have to re-jig the schema once again. The reason is that South Indians use a Unix-like path for personal names: village/father’s-name/name. Thus, the “C.” in “C. V. Raman” identifies a place, “V.” identifies his father, and “Raman” is his personal name.
Eurocentric library indices aren’t set up to handle this, which is why this particular physicist’s work is filed under what he (and millions of others) would consider to be his “first” name. Getting this kind of thing right is increasingly important in a flat world (as well as simply being polite). So, anyone have a sample database schema out there that will handle non-Euro names properly that I can use as a model?
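To make the question concrete, here is a minimal sketch of one possible shape for such a record, not a tested schema. It stores a name as an ordered list of parts, each tagged with a role, and lets each record say which part to file it under. The class names, the role vocabulary ("place", "patronym", "personal", "family"), and the `index_role` field are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NamePart:
    text: str  # as the person writes it, e.g. "C." or "Raman"
    role: str  # hypothetical vocabulary: "place", "patronym", "personal", "family"


@dataclass
class PersonName:
    parts: List[NamePart]       # kept in the order the person writes them
    index_role: str = "family"  # which part to file/sort under (per record)

    def display(self) -> str:
        """Full name exactly as written, parts joined by spaces."""
        return " ".join(part.text for part in self.parts)

    def index_key(self) -> str:
        """Lower-cased text of the first part with the indexing role;
        falls back to the full display name if no part has that role."""
        for part in self.parts:
            if part.role == self.index_role:
                return part.text.lower()
        return self.display().lower()


raman = PersonName(
    [NamePart("C.", "place"), NamePart("V.", "patronym"), NamePart("Raman", "personal")],
    index_role="personal",
)
print(raman.display(), "->", raman.index_key())
```

In a relational database this would presumably become a `name_part` table keyed by person and position, but that mapping is a guess rather than something I have tried.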
More papers, most from the journal Computer Science Education (which has a lot of nebulous meta-meta content, but a few gems):
- Gal-Ezer, Vilner, and Zur: “Teaching Algorithm Efficiency at CS1 Level”. CSE, 14(), 2004. Describes a way to introduce big-O notation early: on each assignment, students are asked (a) what does this code do, (b) what’s its big-O efficiency, and (c) write a function which performs the same task, but is more big-O efficient. The examples are simple, but the idea is intriguing. Though the authors don’t draw attention to it, I particularly like the idea of giving the students working (if inefficient) code that they can use as an oracle when testing their (more efficient) solutions.
- Ginat: “On Novice Loop Boundaries and Range Conceptions”. CSE, 14(3), 2004. Looks at the mistakes beginning programmers make with loops, and why. This ought to be the first step toward fixing both the way we teach iteration, and the iteration constructs we put in our languages; it’s a shame so few language designers do it.
- Wolfe: “Why the Rhetoric of CS Programming Assignments Matters”. CSE, 14(2), 2004. The prize of the bunch, this study examined how changes in the wording of assignments can affect student interest and satisfaction. The short answer: the more “relevant” the background to the assignment seems, the more engaged students will be.
- Ala-Mutka: “A Survey of Automated Assessment Approaches for Programming Assignments”. CSE, 15(2), 2005. Describes tools and approaches for auto-marking, from static style checkers to dynamic run-compare-and-profile engines. Lots of good ideas; even better is the fact that the thing Igor Foox is going to build this fall doesn’t seem to have been invented elsewhere.
- Pike, Dorward, Griesemer, and Quinlan: “Interpreting the Data: Parallel Analysis with Sawzall”. Scientific Programming Journal, 13(4) (PDF). Describes a domain specific language (DSL) built by Google to do record-by-record data crunching. I understand the motivation; I just wonder when people will admit that they’re reinventing Haskell piece by piece?
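The oracle idea mentioned under the Gal-Ezer, Vilner, and Zur paper can be sketched roughly as follows. The duplicate-detection task and all function names are my own illustration, not taken from the paper: a slow but obviously correct version serves as the reference, and the faster version is checked against it on random inputs.

```python
import random


def has_duplicate_slow(values):
    """O(n^2) reference version: compare every pair of elements."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicate_fast(values):
    """Expected O(n) version: remember the values already seen in a set."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False


def check_against_oracle(candidate, oracle, trials=200, seed=0):
    """Run both functions on small random lists.
    Returns the first input on which they disagree, or None if they never do."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(0, 20) for _ in range(rng.randint(0, 10))]
        if candidate(data) != oracle(data):
            return data
    return None


print(check_against_oracle(has_duplicate_fast, has_duplicate_slow))
```

Agreement on random inputs is evidence rather than proof, but it catches most off-by-one mistakes quickly.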
The idea has been around for years: buy CPU cycles in bulk, just as you buy watts, and let someone else worry about how it all happens. Amazon’s EC2 (reviewed here by Jon Udell, and here by TechCrunch) brings it one step closer to reality. How long before someone starts referring to this as “Web 3.0”?
Two articles that help put squabbles over Python web programming frameworks (and just about everything else) into perspective:
- Haroon Siddiqui’s “The Muslim Malaise”, from last weekend’s Star. The first couple of paragraphs almost made me throw it away; I’m glad I stuck with it to find out what a sensible, liberal, devout Muslim thinks is wrong with the world.
- Bruce Schneier’s “What the Terrorists Want”, which was originally published by Wired.com. As he points out, the aim of terrorism is not to kill people—it’s to inspire terror. By that measure, Western governments are doing exactly what the terrorists want.
The folks at Enthought (sponsors of SciPy) were kind enough to set up a Trac so that I could manage development of the Software Carpentry course. Unfortunately, spammers have figured out how to bomb Trac: over a dozen tickets relating to gay porn, online casinos, and the like have been filed, and there are literally dozens of comments (undeletable) along the same lines on the useful tickets. I could have prevented this by not giving anonymous users the ability to file tickets, but requiring people to register in order to give feedback on the course notes would greatly reduce the amount of feedback I got.
I don’t have an answer to this, but we’re going to have to come up with one for DrProject. We’re also going to have to come up with a better way to manage user accounts. Right now, DrP requires people to have accounts on the underlying Unix system. That makes sense for classroom use, but not for “open” projects: I’ve had to request guest accounts so that people outside the university can be on the DrP development mailing list, for example, and that doesn’t scale to dozens of contributors. I do not want to add user account management, password checking, and the like to DrP: it’s a lot of work to do properly, a security hole when done improperly, and synchronizing it all with Subversion would be just one more thing that could go wrong. If you have ideas, I’m easy to find…
Another good article from Jon Udell, this one on debugging. As far as I can tell, the topic is wide open as a research area: if you want to do graduate work in systems, coming up with better ways to track down and repair problems would be a high-yield topic to focus on.
I realize that not everyone is an evolution geek, but this is just wonderful: social behavior in spiders has been observed for the very first time.
In response to comments and emails over the last few days saying, “I don’t know why you are so obsessed with having just one Python web framework — different people have different needs, competition spurs everyone to do better, and anyway, the technical issues aren’t settled enough yet to pick a winner,” I’d like to say, “Bah.” My argument comes down to this:
- Number of books on Rails: 12 (in print or in the works, and those are just the ones I know of). Number of books on TurboGears, Django, Pylons, and all the other Python web frameworks put together: 0. (I’m not counting John’s book on network programming, or the Twisted book.)
- Number of Rails Pub Nights and other gatherings (including the Rails Conference): over 20, based on a quick google and some guesses. Number of attendees (i.e., potential collaborators, employers, or employees): hundreds. Equivalent numbers for Python’s fragmented frameworks: far fewer.
This is not about Python vs. Ruby: it’s about our obligation as developers to give maximum value to our customers. As long as Pythoneers’ efforts are divided between [pick a random number] different frameworks, none of them will be as mature or reliable as Rails, which means that developers using Python will be taking longer to accomplish less. Competition hasn’t led to any of “our” frameworks surpassing Rails to date; there’s no reason to believe that will change, so picking one and making it competitive is, in my opinion, the only defensible course of action.
Now, back to marking…
I’ve been rationalizing my database of CSC49X project participants, and want to update the map that appears on this site’s home page. I’m going to have to do this every three or four months, so I’d like a simple Python script that takes a bunch of latitudes and longitudes, and gives me back a map with those locations plotted (bonus marks if I can also specify the colors of the markers, to differentiate people by year or role). I know Google Maps and other services will let me do this, but I don’t have a couple of hours to spend trawling through the results of a search for “google maps python” or the like. If you have something that’ll solve my problem, I’d be grateful for a ping.
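For what it’s worth, one route that needs nothing beyond the standard library is to write the points out as KML, which tools such as Google Earth can display. The sketch below is an assumption about what would be good enough, not a finished solution: `build_kml` and its tuple layout are invented for illustration, and it produces a file to load into a viewer rather than a rendered map image. Note that KML colors are written as `aabbggrr` hex, and coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Qualify an element name with the KML namespace."""
    return "{%s}%s" % (KML_NS, name)


def build_kml(points):
    """points: list of (label, latitude, longitude, color) tuples, where
    color is a KML 'aabbggrr' hex string such as 'ff0000ff' (opaque red).
    Returns the KML document as a string."""
    ET.register_namespace("", KML_NS)  # serialize with a default xmlns
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))

    # One shared style per distinct marker color.
    for color in sorted({color for _, _, _, color in points}):
        style = ET.SubElement(doc, _tag("Style"), id="c" + color)
        icon_style = ET.SubElement(style, _tag("IconStyle"))
        ET.SubElement(icon_style, _tag("color")).text = color

    # One placemark per location, pointing at its color's style.
    for label, lat, lon, color in points:
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = label
        ET.SubElement(placemark, _tag("styleUrl")).text = "#c" + color
        point = ET.SubElement(placemark, _tag("Point"))
        # KML wants longitude first, then latitude.
        ET.SubElement(point, _tag("coordinates")).text = f"{lon},{lat}"

    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    students = [
        ("2005 student", 43.65, -79.38, "ff0000ff"),  # red
        ("2006 student", 12.97, 77.59, "ffff0000"),   # blue
    ]
    with open("students.kml", "w", encoding="utf-8") as out:
        out.write(build_kml(students))
```

Coloring by year or role then comes down to choosing the color string for each tuple before calling `build_kml`.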
And this gives me an excuse to make yet another suggestion to the good folks at Google. I can already use “define: patacriticism” to search for word definitions. How come I can’t use “python: map API” to search for the words “map” and “API” in the documentation for Python code? I’m told by reliable sources that it would be easy to implement; it’d be a very useful thing to have on the intranet search box of any software development shop, and most importantly, it’d give open source developers an incentive to start documenting their code a little better. Right now, the payoff for doing so is fuzzy and remote; if people could google APIs that exactly, well, I don’t think it could hurt…
The Department of Computer Science at the University of Toronto will be hosting an industry showcase from 4-6 p.m. on Tuesday, September 5, to give local software companies doing leading-edge work a chance to show off what they’re doing to incoming graduate students, build bridges, make connections, etc. Groups that have already confirmed attendance include:
And of course, we’ll be heading to a pub afterward…