Yes, We *Can* Design Languages for Human Beings

Lambda the Ultimate recently had a nice summary of a paper titled “Is Transactional Programming Actually Easier?” In it, Rossbach, Hofmann, and Witchel report a study in which 147 undergrads in an operating systems course solved problems using both traditional concurrency control mechanisms and newfangled memory transactions. The result? Students reported that transactions were harder to use, but they actually made fewer errors in their synchronization code when using them.

At the risk of sounding like a curmudgeon (yes, David, I’m looking at you), why the hell don’t we do more of this? Why don’t we apply usability testing techniques to programming language features? It’s easy to do in small cases, and as Microsoft’s Steven Clarke discusses in his chapter in Making Software, when done systematically, it can make programmers’ lives a lot better.

3 thoughts on “Yes, We *Can* Design Languages for Human Beings”

  1. Zak

    I think experimental validation of programming language features is a dangerous idea. Empirical studies are done with small groups on small projects over short time periods. Relying on these studies could potentially bias languages towards the sort of features that work well for these small projects, but not so well on larger ones. For example, I would expect static type systems to compare unfavourably on smaller systems, but to be a net win for large systems. To pick a less controversial example, if you make the project small enough, any sort of testing will almost certainly result in a net loss of productivity.

    In short: there’s no reason to believe that results on small case studies generalize to large software projects when it comes to programming language features, and empirical studies could do more harm than good by biasing researchers to work on features that work well on small examples.

  2. Greg Wilson (post author)

    @Zak Not all empirical studies are small-N/short-term. Several of the ones reported in “Making Software” (which should be on the stands by Christmas) have run for over a decade and have involved several hundred programmers. I’d also hope (I was going to type “expect”, but I’m too old and cynical for that) that researchers would look at scaling as well, i.e., check whether what works in the small still works in the large. Psychologists do it; why can’t we?

  3. Zak

    It may be perfectly possible to do large, well-conducted empirical studies. But is that the rule, or the exception? Conducting a decade-long study is just not a viable option for researchers seeking tenured positions. Nor is it really a feasible way of weeding out new PL features: in 10 years, will the same features still be relevant? If we start to demand empirical validation of PL features, I would be shocked if these massive studies just started coming out of the woodwork. The much more likely scenario, I believe, is small studies over short time periods, with all the problems that causes.

    Psychology is a field where we know for sure that some things don’t generalize from small sample sizes to large ones. One example is manic-depressive disorder, which is routinely misdiagnosed as depression: the way a person acts during a one-hour session doesn’t necessarily reflect how they act all the time. So if we’re taking lessons from psychology, I’d like us to take this one: some problems fundamentally cannot be studied in the small and generalized to real life.
