Author Archives: Greg Wilson

Problems with Pandoc

People have been asking me to write the Software Carpentry instructor’s guide in Markdown instead of HTML, mostly so that it will be easier for other people to review and contribute. I was initially against the idea because standard Markdown lacks so many features that I’d basically be writing HTML with back quotes instead of <code> tags, but it turns out that Pandoc’s variation on Markdown provides a lot of what I want—a lot, but not all. After converting the section on databases, I’ve come up against the following:

  1. Pandoc won’t number figures and insert those numbers in references. I can do this by inserting \label{…}’s and \ref{…}’s if my target format is LaTeX, but I want HTML.
  2. There’s no way to attach a CSS class to a table. I can do this to a heading by writing:
    ### Heading Title {.some-style}

    but the curly-brace syntax doesn’t work with tables. I want to do this so that I can display the output of SQL queries as HTML tables, but style them in a particular way.

  3. I can’t put styling information inside a pre-formatted code block. I frequently want to show a snippet of code, then show it again with some minor changes highlighted. Using HTML, I do this as:
    the original <span class="highlight">and the changes</span>

    but if I do that inside an indented (pre-formatted) block, it’s rendered literally.

  4. Most of the major sections in each chapter open with instructors’ notes. As pure HTML, this looks like:
    <section>
      <h2>Section Heading</h2>
    
      <div class="guide">
        <h3>For Instructors</h3>
        <p>A hundred lines of guidance go here...</p>
      </div>
    
      <p>...and several hundred lines of lesson go here.</p>
    </section>

    Now, if I use the --section-divs flag, Pandoc will guess where sections begin and end based on uses of h1, h2, etc., and wrap them in divs (which is good). However, if I give it this:

    ## Section Heading
    
    ### For Instructors
    
    A hundred lines of guidance go here...
    
    ...and several hundred lines of lesson go here.

    it (quite naturally) guesses wrong about where the divs should go. For now, I’m putting the instructors’ material in a quotation block, but I’d rather do this properly if I can.
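Until Pandoc handles these cases itself, one possible workaround for the first three problems is to post-process the HTML it produces. The Python sketch below is a hypothetical illustration, not a Pandoc feature. The id="fig-NAME" figure ids, the empty-link cross-references, the {+ ... +} highlight markers, and all of the function names are conventions invented for this example. The patterns also assume that the generated HTML contains bare <table> tags and <pre> blocks. Regular expressions are tolerable for HTML you generated yourself, but a sturdier tool would use a real HTML parser.

```python
import re

def number_figures(html):
    """Number figures in document order and fill in empty cross-reference links.

    Hypothetical conventions: a figure is any element with id="fig-NAME",
    and a reference is an empty link <a href="#fig-NAME"></a>.
    """
    numbers = {}
    # First pass: give each figure id a number, in the order the ids appear.
    for match in re.finditer(r'\sid="(fig-[\w-]+)"', html):
        numbers.setdefault(match.group(1), len(numbers) + 1)

    # Second pass: replace each empty reference link with "Figure N".
    def fill(match):
        target = match.group(1)
        if target not in numbers:
            return match.group(0)  # leave references to unknown figures alone
        return f'<a href="#{target}">Figure {numbers[target]}</a>'

    return re.sub(r'<a href="#(fig-[\w-]+)"></a>', fill, html)

def style_tables(html, css_class):
    """Give every bare <table> tag a CSS class (e.g. for SQL query output)."""
    return html.replace("<table>", f'<table class="{css_class}">')

def highlight_in_pre(html):
    """Turn hypothetical {+ ... +} markers inside <pre> blocks into highlight spans."""
    def mark(block):
        return re.sub(r"\{\+(.*?)\+\}",
                      r'<span class="highlight">\1</span>',
                      block.group(0), flags=re.S)
    # Only rewrite text inside <pre>...</pre>, so ordinary prose is untouched.
    return re.sub(r"<pre\b.*?</pre>", mark, html, flags=re.S)

if __name__ == "__main__":
    page = ('<p>See <a href="#fig-schema"></a>.</p>'
            '<img id="fig-schema" src="schema.png">'
            '<table><tr><td>Dyer</td></tr></table>'
            '<pre><code>select * from {+Person+};</code></pre>')
    print(highlight_in_pre(style_tables(number_figures(page), "sql-output")))
```

Running the script prints the page with the reference filled in as "Figure 1", the table tagged with class="sql-output", and Person wrapped in a highlight span.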

I can get around all of these problems by writing raw HTML instead of Markdown, but the result isn’t any more readable to me than pure HTML. I’d welcome other suggestions, or offers of help.

Heroes

All that is necessary for evil to triumph is for good men to do nothing.
— Edmund Burke

These men chose to do something. If anyone deserves to be called heroes, they do.

Edward Snowden Bradley Manning

Is There Only Room for One Utopia?

The title of Samuel Moyn’s The Last Utopia: Human Rights in History is misleading. It isn’t really a history of human rights; instead, it’s an outline of how human rights rose to prominence as a defining political issue in the years after World War II. But it doesn’t really do that either: all too often, Moyn alludes to people and events rather than describing or explaining them, so the argument he’s trying to make is sketched rather than drawn.

And he definitely is making an argument. In a nutshell, he believes that human rights have become important because other utopias have failed. The great socialist experiments of the 20th Century either imploded or (in the Chinese case) transformed themselves into free-market capitalism, whose promises of opportunity and prosperity are now a hollow joke. Moyn thinks that freedom of speech, freedom of worship, and basic human dignity aren’t better ideas than those—they’re just the only ones left standing.

Which makes me wonder: is this why most techies shrug off concerns about surveillance and censorship? Is it because their utopia—their Singularity—doesn’t leave room for others?

Merging is the Real Revolution

Many people at Mozilla think that Javascript and HTML5 are the future of the web. Respectfully, I think they’re both red herrings: I think what makes Mozilla and other successful open source projects work is older, less exciting, and still only kind of works. It’s called “merge”, and if we really want to help people collaborate on a global scale, we ought to put a lot more effort into making it easy to use.

What’s merge? It’s what you do after a “diff”. What’s diff? It’s something that shows you the differences between two files in a human-readable way. More specifically, suppose that you and I are both working on a program. We’re sitting in front of different machines, trying to fix different bugs or add different features, and it just so happens that we both need to change graphics.java. After we’ve both made our changes, the world looks like this:

Simultaneous Editing

At this point, we need to combine our changes. We could scroll through two copies of the file side by side, copying edits from one to the other, but we’d almost certainly miss something or make a mistake. What we should do is use a program like diff to highlight the changes for us. Or better still, we should use a tool like merge to show your version of the file on the left, mine on the right, and the merge in between:

When we’re done merging, what we have is the best of both worlds—the best of your ideas combined with the best of mine. The biological term for this is recombination, and it’s at least as important to evolution as its more famous cousin, mutation, because it lets good genes (or ideas) cooperate.
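To make diff and merge a little more concrete, here is a minimal Python sketch. difflib.unified_diff from the standard library produces the kind of human-readable diff described above. merge3 is a hypothetical toy three-way merge that compares each line of your version and mine against the original. It assumes that neither of us inserted or deleted any lines. Real merge tools such as diff3 cannot assume that: they first align the three files, then apply essentially the same rule to each aligned chunk.

```python
import difflib

def show_diff(original, changed):
    """Return a unified diff of two lists of lines (without trailing newlines)."""
    return list(difflib.unified_diff(
        original, changed, fromfile="original", tofile="changed", lineterm=""))

def merge3(base, yours, mine):
    """Toy three-way merge of three lists of lines.

    Assumes (hypothetically) that both edits only changed lines in place,
    so the three versions line up index by index.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not len(base) == len(yours) == len(mine):
        raise ValueError("this toy merge only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, y, m) in enumerate(zip(base, yours, mine)):
        if y == m:          # we agree (or neither of us touched the line)
            merged.append(y)
        elif y == b:        # only I changed it: keep my version
            merged.append(m)
        elif m == b:        # only you changed it: keep your version
            merged.append(y)
        else:               # we both changed it differently: flag a conflict
            merged.append(f"<<<<<<< {y} ======= {m} >>>>>>>")
            conflicts.append(i)
    return merged, conflicts

if __name__ == "__main__":
    base = ["def draw():", "    color = 'red'", "    size = 10"]
    yours = ["def draw():", "    color = 'blue'", "    size = 10"]
    mine = ["def draw():", "    color = 'red'", "    size = 12"]
    print("\n".join(show_diff(base, yours)))
    merged, conflicts = merge3(base, yours, mine)
    print("\n".join(merged))
    print("conflicts:", conflicts)
```

With base = ["a", "b", "c"], yours = ["A", "b", "c"] and mine = ["a", "b", "C"], merge3 returns (["A", "b", "C"], []). Both changes survive because each line was touched by only one of us. This is recombination in miniature.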

Diff and merge make open source possible. They let dozens, hundreds, or thousands of people remix their work—not just take what others have done and build on it, but give back their own changes and ideas to be stirred back into the original for further remixing:

Recombining Ideas

When remixing is hard, open collaboration doesn’t take root [1]. Education is a prime example: at some point in their career, every teacher has picked up someone else’s PowerPoint slides and used them as a starting point for their own lecture on the subject, but hardly anyone ever gives their changes back to the author of the slides they started from. It’s easy to say that’s because remixing isn’t part of educational culture, but there’s a reason it isn’t: PowerPoint decks can’t be diffed and merged [2]. If it takes me an hour to scroll through my slides, comparing them one by one with yours and copying changes back by hand, I’m not going to use what you send me, so you’re not going to send it in the first place [3]. Going back to our biological metaphor, people who can’t merge are stuck in a universe that has mutation but not recombination, and that’s a really inefficient way to improve fitness.

I’m thinking about all of this now because of the IPython Notebook and Mozilla Thimble. They’re both really exciting tools, but neither makes collaboration easy [4]. If I want to merge your changes to a project into my copy, I can’t view them side by side in the browser and pick the pieces I want from each. Instead, I have to merge two JSON files if I’m using the Notebook and—well, I’m not sure what I’d do with Thimble. I could view the differences in the text of the HTML and CSS, but anyone who can do that can build web pages without Thimble in the first place.

More to the point, people shouldn’t have to drop down a cognitive level or two in order to collaborate this way. Lots of graphic design tools can highlight and merge the differences between two photographs; DiffEngineX does it for Excel spreadsheets (though you need a pretty wide screen to use it effectively), and so on. There’s no technical reason we can’t diff and merge all our files; it’s just that programmers mostly work with text, so they haven’t built merging tools for other formats. (And increasingly, I believe they work with text because it’s what they can diff and merge in version control…)

We’re smarter when we work together. It’s more fun, too, so I think tools ought to make collaboration as easy as adding a caption to a picture of a cat:

Captioned Cat

We were collaborating on a global scale before HTML5 and Javascript came along, and I’m confident that we’ll still be doing so ten years from now when they’re both regarded as legacy technologies. If we want kids to hack web pages the way we hack code, we need to make merging as easy as reading email or uploading files to Dropbox. And if we want their teachers to remix each other’s lessons, we need to show a little humility and make our methods work with their files. If we do that for them, they will learn to work the way we do and raise up a generation that thinks open collaboration is normal.

And that, my friends, would be a revolution.


  1. The exception is systems like Wikipedia that keep just one copy of the document, which everyone edits simultaneously. But that model, which Google Docs and Etherpad also use, clearly doesn’t work for programming, slide decks, or other situations in which people want to try different things at the same time.
  2. PowerPoint “merging” tools like these two just concatenate multiple presentations into one, or generate a specialized deck from a template by filling in blanks with names and dates (rather like spam generators).
  3. At this point programmers often say, “Then write your slides in Markdown or LaTeX or HTML5 or some other text-based format so that merging is easy,” but that’s like saying, “If you take all the pictures out of your book, it’ll compress much better.” PowerPoint, LibreOffice, Keynote, and other WYSIWYG presentation tools have survived and thrived because they make it easy for people to mix graphics and text however they want, just as they would on a whiteboard. As this blog post shows, it’s a lot harder to do this with text-based tools: I had to switch from my editor to a drawing package to create the diagrams included above, then upload them, and if you ask your browser to search for “Original Version”, it still won’t find that label in either of the diagrams. Given the choice between whiteboarding (which they take for granted) and merging (which they’ve never done before, and whose value they don’t yet understand), almost everyone will choose the former.
  4. More precisely, neither makes asynchronous collaboration easy. TowTruck lets people share dynamic browser sessions in real time, which is really cool, but as noted in [1], that’s a very different model from forking and merging.

Cameron Neylon Speaking in Toronto on May 1, 2013

Network Enabled Scholarship — Reconfiguring Research for the Web
Dr. Cameron Neylon
Director of Open Access Advocacy, Public Library of Science
4:00 p.m., Wednesday, May 1, 2013
Room 205, Bissell Building, University of Toronto, 140 St George St

The web, like all network technologies before it, from the mobile phone to writing itself, has the potential to enable a qualitative change in our capacity as people, organizations and societies. We are starting to see the first glimmerings of how our research capacity might change with projects like Galaxy Zoo and Polymath, but these remain isolated examples. What will it take to exploit the network capacity that the web brings us to enable a step change in the efficiency and effectiveness of our research?

This seminar will be the first in a series highlighting new opportunities to network knowledge through application of knowledge media design values and methodologies.

A Software Carpentry Boot Camp for Women in Science and Engineering

Software Carpentry is pleased to announce a two-day software skills boot camp for women in science and engineering, to be held in Boston this June. We’re currently trying to raise the $6000 needed to give 120 grad students (and others) a chance to improve their research computing skills while networking with peers; donations would be very welcome.

Why a boot camp specifically aimed at women? Because a large body of research has shown that without initiatives like this, the cycle of low participation today leading to low participation tomorrow will continue unchecked. For example, WiT reports:

In the Bayer Facts of Science Education XIV survey, women and minorities raised a number of barriers in their path to STEM careers, including:

  1. Lack of mentors (50%)
  2. Lack of role models (49%)
  3. Stereotypes adversely affecting women and minorities (39%)
  4. Lack of communication from STEM industry (39%)
  5. Self doubt (35%)
  6. Cost of education (31%)
  7. “Sense of isolation” (29%)
  8. A lack of solid math and science education in poorer schools (24%)

Issues like the lack of role models, lack of mentors, stereotypes, and a sense of isolation are effectively addressed by getting a bunch of women together in one room. We’re not just presenting the Software Carpentry material; we are also creating a community of women who will support each other in tangible and intangible ways. If you would like to learn more, one of the most thorough and most readable pieces of research in this area remains Margolis and Fisher’s Unlocking the Clubhouse, which reports their work in the late 1990s and early 2000s at Carnegie Mellon.

Congratulations to Christian Muise

U of Toronto PhD student Christian Muise created an application that was selected as a winner of Google’s Places API Challenge. The competition brought together 87 developers from 27 countries, challenging them to build apps that address some of the most pressing needs in our communities. The top three applications were declared winners, and will be showcased at Google I/O, the company’s annual developer conference.

Muise’s award-winning app, TTC Pass, is “a website that allows for collaborative editing of the locations for purchasing various transit fares” in Toronto. See a video of the app, or the Google Places API Developer Challenge webpage.