Never Mind the Content, What About the Format?


I'm still gnawing on the problem of how to construct content for 21st Century learning–or, more prosaically, what I should use to build the next version of Software Carpentry. My starting point is the need to serve several different kinds of users [1], whose descriptions I have moved to a separate post on learners and their needs.

I'm sure some of the above is inconsistent or just plain wrong, but here are my takeaways:

  1. Different people want content in different formats. Yeah, OK, we knew that already, but:
  2. Everybody needs first-class content, in the programming sense of the term. In practice, it means that every kind of content can be copied and pasted without losing its meaning. A bunch of colored pixels in an image that look like letters aren't actually letters; if you copy a region of an image and paste it into a text editor, you don't get the text [2]. Similarly, search engines like Giggle can't "see" code evolving line-by-line in a video, so you can't search for that. Together, I think that point #1 and point #2 imply that:
  3. We need model-view separation in learning content. I apologize for the computerese, but I don't know any other way to say it. A model (more fully, data model) is how information is stored, while a view is how people interact with it. Models should be designed to be easy for computers to work with; views should be designed to meet human needs, and the plural there is important: different people want to interact with information in different ways, and even a single person may want to use different ways at different times. Search engines want the information that's in the model, such as the captions on the boxes in a diagram, not some arbitrary view of it (like a bunch of pixels in a PNG). People usually want that as well when they're remixing, since their goals are to combine that information with information from other sources, and/or to present that information in different ways (i.e., views).
  4. We also need first-class metadata. I haven't been able to find a standard format for summarizing and exchanging lesson objectives, learning dependencies, and everything else needed to stitch individual facts together. The closest thing seems to be SCORM, but I'd rather stick a fork in my eye [3]: it's bloated, it mixes data models with meta-models with presentation layers with everything else its authoring committee could think of, and did I mention the fork? I could provide metadata as data, e.g., put a point-form list at the top of a lesson saying, "Here's what you need to know before tackling this," but that mixes model and view: since it's just a convention, computers will have a hard time stitching things together accurately.
  5. Finally, we need social learning. Even the Zuzels of this world learn best in collaboration with other people: peer learners are often better at understanding and clearing up misconceptions than instructors, and having a "running partner" helps people stay focused and motivated. This isn't really a matter of format, though, but of the tooling used to deliver content, so I'll skip over it below.
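
To make the model-view point concrete, here is a minimal sketch in Python. Everything in it (the `Box` and `Diagram` classes, the two view functions, the sample diagram) is invented for illustration and is not an existing format or tool. The model holds the captions as real text, so it can be searched directly, and each view is just one way of presenting it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


@dataclass
class Box:
    """One box in a diagram: the caption is stored as text, not pixels."""
    caption: str
    x: int
    y: int


@dataclass
class Diagram:
    """The model: what the diagram says, independent of how it is drawn."""
    title: str
    boxes: list[Box] = field(default_factory=list)

    def search(self, term: str) -> list[Box]:
        """Case-insensitive search over captions held in the model."""
        return [b for b in self.boxes if term.lower() in b.caption.lower()]


def text_view(diagram: Diagram) -> str:
    """A view for people (or tools) that want a plain outline."""
    lines = [diagram.title]
    lines += [f"- {b.caption}" for b in diagram.boxes]
    return "\n".join(lines)


def svg_view(diagram: Diagram) -> str:
    """A view for people who want a picture; captions stay selectable text."""
    parts = ['<svg xmlns="http://www.w3.org/2000/svg">']
    for b in diagram.boxes:
        parts.append(
            f'<rect x="{b.x}" y="{b.y}" width="120" height="40" '
            'fill="none" stroke="black"/>'
        )
        parts.append(f'<text x="{b.x + 8}" y="{b.y + 24}">{escape(b.caption)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical sample content.
diagram = Diagram(
    "Version control",
    [Box("Working copy", 0, 0), Box("Repository", 200, 0)],
)
print(text_view(diagram))
print(svg_view(diagram))
print([b.caption for b in diagram.search("repo")])
```

Adding a third view (slides, say, or a screen-reader description) would mean writing one more function; the model would not change.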

OK, so how well do today's tools and/or formats do by these measures? The fact that "PowerPoint" is both a tool and a format is one indication that the answer is going to be, "Not well."

So after all of this, what do I actually want?

  1. I want content stored in HTML5 with purely semantic markup, so that it can be searched, copied and pasted, and styled for presentation in a variety of ways [9].
  2. I want an agreed-upon meta and data-* vocabulary for educational metadata, like dependencies, introduction of key terms, questions and answers, and so on. I want a similar vocabulary for commenting and other social interactions that plays nicely with things like the Salmon protocol.
  3. I want an authoring tool (note the singular there) that lets me:
    1. write and draw WYSIWYG instead of typing in tags and IDs;
    2. freely mix drawings and text; and
    3. manage parallel streams (or channels), so that I can keep slide content, presenter's notes, prose, and translations of all three into other languages together.
  4. I want to be able to animate my drawings and text, which is emphatically not the same as "embed video" (though I may want to do that too). Instead of recording the pixels drawn on the screen as I type Python into an editor, I want to record and play back the text that's being created, so that learners can pause the animation, copy the text, and paste it somewhere else. Equally, instead of painting pixels to fool your eyes into believing that a box just moved off the screen, I want to move the damn box; once again, if you pause the animation, you should be able to click on the box, attach a comment to it, paste it into your own drawing, etc.
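
As a rough illustration of recording text rather than pixels, the sketch below stores a typing session as a list of timestamped edits and rebuilds the buffer at any moment. The `Edit` record and the `text_at` function are assumptions made up for this example, not part of any existing tool; a real recorder would also need to track cursor movement, selections, and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    time: float     # seconds since the start of the recording
    position: int   # character offset where the edit applies
    deleted: int    # number of characters removed at that offset
    inserted: str   # text typed at that offset


def text_at(edits, when):
    """Replay every edit made at or before `when` and return the buffer as text."""
    buffer = ""
    for edit in sorted(edits, key=lambda e: e.time):
        if edit.time > when:
            break
        buffer = (
            buffer[:edit.position]
            + edit.inserted
            + buffer[edit.position + edit.deleted:]
        )
    return buffer


# Hypothetical recording of someone typing a line of Python and fixing a typo.
recording = [
    Edit(0.0, 0, 0, "print('helo')"),
    Edit(1.5, 10, 0, "l"),           # insert the missing "l"
    Edit(3.0, 0, 0, "# greet\n"),    # add a comment above the line
]

# "Pausing" at two seconds yields text a learner could copy and paste.
print(text_at(recording, 2.0))
```

Because playback produces a string rather than a frame of video, pausing, copying, and searching come along without extra work.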

Freely mixing drawings and text feels like it ought to be doable today: we could either put the text in blocks inside a canvas element, or layer a transparent canvas over the page and dynamically resize it. Anchoring drawings to the underlying text (e.g., keeping the arrow from a term to the corresponding bit of the diagram in the right place) is "just" Javascript (for some value of "just"). Making it all WYSIWYG is just more Javascript [10].

But animation… Ah, that's a big one. It's an intrinsically hard problem, but canned effects can do a lot to put simple things within reach [11]. The big question is, how far do we push it? If I want to show you how to use a debugger, or how to draw something with a painting program, I can't re-create the whole UI–I'm going to have to record pixels off a screen.

Or am I? I know this is never going to happen–we're not that organized a species–but just imagine what the world would be like if every interface was built using HTML5 and CSS. Any tool at all could export widget descriptions and a semantic trace of what they did (i.e., "the file menu was pulled down" rather than "the cursor moved to pixel (132,172) and the user clicked"), and any other tool could consume it and play it back. The consuming tool might draw the widgets differently, or display the interactions in its own way, but that would be exactly the same as applying a different skin to the original tool [12].
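
To give a feel for what such a semantic trace might look like, here is a small sketch. The JSON layout, the widget names, and the `narrate` function are all hypothetical; no tool exports this format. The consumer below does not draw anything: it turns the trace into step-by-step instructions, which is one more way of "displaying the interactions in its own way."

```python
import json

# A made-up trace: widget descriptions plus what happened to them,
# rather than cursor coordinates and clicks.
trace_json = json.dumps({
    "widgets": {
        "file-menu": {"kind": "menu", "label": "File"},
        "open-item": {"kind": "menu-item", "label": "Open"},
    },
    "events": [
        {"widget": "file-menu", "action": "pulled-down"},
        {"widget": "open-item", "action": "selected"},
    ],
})

# How this particular consumer chooses to phrase each action.
PHRASES = {"pulled-down": "Pull down", "selected": "Choose"}


def narrate(trace_text):
    """Turn a semantic trace into a list of human-readable steps."""
    trace = json.loads(trace_text)
    steps = []
    for event in trace["events"]:
        widget = trace["widgets"][event["widget"]]
        kind = widget["kind"].replace("-", " ")
        steps.append(f"{PHRASES[event['action']]} the {widget['label']} {kind}")
    return steps


for step in narrate(trace_json):
    print(step)
```

A different consumer could read the same trace and animate its own rendering of the widgets instead.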

Returning to this universe for a moment, we can store things as HTML5 right now–I'm already using it for Version 5 of Software Carpentry. I could create a vocabulary for instructional metadata, but I'm not an information architect. WYSIWYG authoring tools for HTML5 abound, though the HTML5 they produce can be idiosyncratic (and doesn't play nicely with version control, but that's fixable). I haven't seen a WYSIWYG tool that supports freehand drawing mixed freely with text, or one that supports parallel content streams, but I think half a dozen people working could deliver something substantial in half a dozen months [13].
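
Purely to show the kind of thing a metadata vocabulary would make possible, the sketch below reads two invented attributes, `data-lesson` and `data-requires`, from `section` elements and works out an order in which the lessons could be taught. The attribute names and the sample page are assumptions for this example, not a proposed standard; the parsing and sorting use Python's standard `html.parser` and `graphlib` modules (the latter needs Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from html.parser import HTMLParser


class LessonMetadata(HTMLParser):
    """Collect the hypothetical data-lesson / data-requires attributes."""

    def __init__(self):
        super().__init__()
        self.requires = {}  # lesson name -> set of lessons it depends on

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "section" and "data-lesson" in attributes:
            needed = (attributes.get("data-requires") or "").split()
            self.requires[attributes["data-lesson"]] = set(needed)


# Hypothetical lesson page: dependencies live in the markup, not in prose.
page = """
<section data-lesson="loops" data-requires="variables"><p>Repeating things.</p></section>
<section data-lesson="variables"><p>Naming things.</p></section>
<section data-lesson="functions" data-requires="variables loops"><p>Reusing things.</p></section>
"""

parser = LessonMetadata()
parser.feed(page)

# Prerequisites come before the lessons that need them.
order = list(TopologicalSorter(parser.requires).static_order())
print(order)
```

Since the dependencies are data rather than a point-form list in the prose, a tool can stitch lessons from different authors together without guessing at conventions.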

As for animation, I think we're stuck with video for now: prototyping an HTML5/SVG/Javascript animation framework for use in a learning tool would be a great research project, but we really do need to build a couple to throw away to find out if it's workable. If you'd like to tackle it, please let us know–I'd be happy to be your alpha tester.


[1] There was a lot of talk in the 1980s and 1990s about different people having different learning styles, inspired in part by Gardner's theory of multiple intelligences. The idea has mostly been discredited, but like many memes, it lives on in popular culture.

[2] Although I bet someone's working on an Emacs mode to do that…

[3] I've actually done this, so I know whereof I speak.

[4] Except that LaTeX and wiki text require slightly less typing than HTML, but if you're using a smart editor, even that advantage goes away.

[5] Please don't quote Tufte's complaints about PowerPoint at me–I don't think it encourages bad presentations any more than the tangled rules of English spelling and grammar encourage bad writing.

[6] In particular, almost all video content makes life harder for the visually impaired: a screencast in which someone talks over themselves typing in an editor or sketching on a tablet is tantalizing but useless to someone who can't see the pixels. I committed this sin when I created Version 4 of Software Carpentry; I'd like to do better in Version 5, and would like to see high-profile online learning sites make some kind of effort as well.

[7] But wait a second: if video isn't effective, why do MIT Open Courseware and the Khan Academy work so well? The short answer is, they mostly don't: if you take out the 15% of people who can learn almost anything, no matter how it's presented, watching videos and doing drill exercises works less well than other options. The longer answer is, watching a good teacher (and Khan is a great teacher) work through a problem, instead of just presenting the answer, moves the content into the "how to" category that video is well suited to.

[8] Research dating back to the early 1990s shows that higher-quality material improves student retention. I don't know whether it improves it enough to justify its higher production costs, though.

[9] HTML5 will also help with version control, since I expect HTML5-aware diff-and-merge tools to start appearing Real Soon Now. Of course, I've been saying that for almost ten years…

[10] These days, you can wave away almost any technical objection with "it's just more Javascript".

[11] In my mind, the animation interface looks more like Scratch than it does like PowerPoint's menus and dialogs. It definitely doesn't require people to type in code, unless they want to create and share an entirely new kind of animation effect.

[12] We could even call that format XUL.

[13] "6×6" is as big a team/timescale as I'm able to contemplate these days.