March 30, 2010: Formats

As I said in last week's announcement, and mentioned again in a later post, one of the main goals of this rewrite is to make it possible for students to do the course when and where they want to. That means recording audio and video, but much of the material will probably still be textual: code samples (obviously), lecture notes (for those who prefer skimming to viewing, or who want to teach the material locally), and exercises will still be words on a virtual page. And even the AV material will (probably) be accompanied by scripts or transcripts, depending on what turns out to work best.

Which brings up a question everyone working with computers eventually faces: what format(s) should material be stored in? For images, audio, and video, the choices are straightforward: SVG for line drawings, PNG for images, MP3 for audio, and MP4, MPEG, or FLV for video (I don't know enough yet to choose). But there's a bewildering variety of options for text, each with its pros and cons. These are the questions I ask of each:

  1. Authoring tools: do authors need to use a specialized editor? If so, is it freely available for the three major platforms (Windows, Linux, and Mac)?
  2. Composition: can authors "just type", or do they need to spend a lot of keystrokes on markup?
  3. Diffing and merging: does the format play nicely with version control systems, i.e., if two or more people edit independently, can their changes easily be merged after the fact?
  4. Formatting: does the format allow fine-grained control over layout? (My personal test here is how easy it is to create tables with irregular arrangements of rows and columns.)
  5. Multiple output formats: can HTML pages, slides, PDFs, and what-not all be produced from a single source?
  6. Referencing: does the format take care of section and figure numbering, cross-references, and bibliographic citations automatically?
  7. WYSIWYG: does the raw content have to be compiled or transformed to produce something viewable, or is what you see what you get?

Here are the options as I see them:

Format            A    C    D    F    M    R    W    Minimum
Microsoft Word   -1   +1   -1   +1   -1   +1   +1   -1
OpenOffice        0   +1   -1   +1   -1   +1   +1   -1
DocBook           0   -1    0    0   +1    0   -1   -1
Other XML         0   -1    0   -1    0   -1   -1   -1
Plain Old HTML    0   -1    0   -1    0   -1   +1   -1
S5 and its kin    0   -1    0   -1    0   -1   +1   -1
Wiki text        +1   +1   +1   -1   +1    0   -1   -1
LaTeX            +1    0    0   +1    0   +1    0    0
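
The last column is just the smallest of the seven scores in each row, so it can be recomputed mechanically. Here is a minimal Python sketch of that, with the total alongside for comparison. The scores are copied from the table above; the names `SCORES`, `summarize`, and `best_by_minimum` are made up for illustration and aren't part of any course tooling.

```python
# Scores copied from the table above, in column order A C D F M R W.
SCORES = {
    "Microsoft Word": [-1, +1, -1, +1, -1, +1, +1],
    "OpenOffice":     [0, +1, -1, +1, -1, +1, +1],
    "DocBook":        [0, -1, 0, 0, +1, 0, -1],
    "Other XML":      [0, -1, 0, -1, 0, -1, -1],
    "Plain Old HTML": [0, -1, 0, -1, 0, -1, +1],
    "S5 and its kin": [0, -1, 0, -1, 0, -1, +1],
    "Wiki text":      [+1, +1, +1, -1, +1, 0, -1],
    "LaTeX":          [+1, 0, 0, +1, 0, +1, 0],
}


def summarize(scores):
    """Return {format: (minimum, total)} for each row of scores."""
    return {name: (min(row), sum(row)) for name, row in scores.items()}


def best_by_minimum(scores):
    """Return the formats whose worst score is the highest."""
    summary = summarize(scores)
    top = max(minimum for minimum, _ in summary.values())
    return [name for name, (minimum, _) in summary.items() if minimum == top]


if __name__ == "__main__":
    for name, (minimum, total) in summarize(SCORES).items():
        print(f"{name:<16} min={minimum:+d}  total={total:+d}")
    print("Best by minimum:", best_by_minimum(SCORES))
```

Ranking by total instead would only need `max` applied to the second element of each pair; on these numbers the two rankings happen to agree on the top entry.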

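I use the minimum in evaluation, rather than the average or total score, because what you notice most when you're working with something is usually what's most annoying about it. Or maybe that's just me... But what do these numbers actually mean? Each column letter is the initial of one of the criteria listed above: A for authoring tools, C for composition, D for diffing and merging, F for formatting, M for multiple output formats, R for referencing, and W for WYSIWYG. Higher is better, so +1 is the best score and -1 the worst.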

So, does that mean LaTeX is the right answer? My scoring says it is. What do you think?


This post originally appeared in the Software Carpentry blog.