Text Is Still King
Text is still king—or at least “evil overlord”. Over on his Fastware blog, Scott Meyers (of Effective C++ fame) has been explaining why he’s writing his next book in LaTeX: it’ll handle cross-referencing, can generate every output format he cares about (PDF, HTML, etc.), and plays nicely with version control. The last is (for me) the most important: I’m in several collaborations right now that are using Microsoft Word or OpenOffice for documents, and since version control systems can’t diff or merge concurrent edits to them, we’re having to play “pass the baton”. I posted a plea to the Google Summer of Code mentors’ list yesterday begging people to propose projects that would teach their tools how to play nicely with version control; we’ll see what comes of it.
Merging algorithms require the ability to compare minimal sets of changes in context, and decide which pieces should make it into the committed form of the document.
To teach a tool to play nicely with version control, it’s almost inescapable that one must teach that tool to store its source documents in a line-oriented human-readable text format.
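To make the point about line-oriented formats concrete, here is a minimal sketch using Python's standard difflib module. The file name and the sample text are made up for illustration. It only shows the "compare changes in context" half of the problem; a real merge also needs a common ancestor and a way to resolve conflicts.

```python
import difflib


def line_diff(old_text: str, new_text: str, name: str = "chapter.txt") -> list[str]:
    """Return a unified diff between two versions of a line-oriented document."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{name}",  # hypothetical file name, for the diff header only
            tofile=f"b/{name}",
            n=1,  # one line of context around each change
        )
    )


# Two made-up versions of the same short document.
old = "Text is king.\nWord is common.\nLaTeX is old.\n"
new = "Text is king.\nWord is everywhere.\nLaTeX is old.\n"

diff = line_diff(old, new)
print("".join(diff), end="")
```

Because each sentence sits on its own line, the diff isolates the one changed line and shows its neighbours as context, which is the information a merge tool needs in order to decide what goes into the committed version.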
Which tools are you hoping this will happen to?
It’s awkward collaborating with folks who don’t use version control. If you come up with a good solution, let us know.
As for Word, if it’s stored in Word 2007 format, you can unzip it and diff the XML files inside. Inconvenient, but better than diffing binary Word files. Maybe there’s a tool to diff the files directly. If they don’t have Office 2007, there’s a plug-in to let them save Word 2003 documents in 2007 format.
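The unzip-and-diff idea can be sketched in a few lines of Python using only the standard library. This assumes the main body of a Word 2007 file lives in the archive member word/document.xml (the usual location); the helper names and the tiny in-memory sample documents are made up for illustration. Since that XML is often stored as one very long line, the sketch pretty-prints it first so that a line-based diff has something to work with.

```python
import difflib
import io
import zipfile
from xml.dom import minidom

MAIN_PART = "word/document.xml"  # assumed location of the main document body
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_xml_lines(source) -> list[str]:
    """Read the main XML part from a .docx path or file object, one element per line."""
    with zipfile.ZipFile(source) as archive:
        raw = archive.read(MAIN_PART)
    pretty = minidom.parseString(raw).toprettyxml(indent="  ")
    return [line for line in pretty.splitlines() if line.strip()]


def diff_docx(old, new) -> list[str]:
    """Unified diff of the pretty-printed XML of two .docx files."""
    return list(
        difflib.unified_diff(
            docx_xml_lines(old), docx_xml_lines(new),
            fromfile="old", tofile="new", lineterm="",
        )
    )


def make_sample_docx(text: str) -> io.BytesIO:
    """Hypothetical helper: build a tiny stand-in .docx in memory for the demo."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(MAIN_PART, xml)
    buffer.seek(0)
    return buffer


changes = diff_docx(
    make_sample_docx("Pass the baton"),
    make_sample_docx("Use version control"),
)
print("\n".join(changes))
```

With real files you would pass two paths to diff_docx instead of the in-memory samples. The output is still XML rather than prose, so this is inconvenient in the same way the manual approach is, and it does nothing for merging.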
This interests me too, Greg. I don’t know who you’re collaborating with, but if they’re programmers or scientists I’m seriously surprised they’re not using a text-based format and version control. On the other hand, which text-based format should they be using? Text processing is so flexible there’s no accepted standard. LaTeX isn’t for everyone, RST takes some learning, and plain text may not be sufficient. (I generally use Markdown, but that’s just for plain text and HTML rendering.) Like it or not (and I don’t!), everyone can read and write .doc files.
Funnily enough, excellent tools like TortoiseSVN have persuaded non-techies to use version control for Word documents, spreadsheets, etc., rather than the awkward “pass the baton” style of email and shared drives. TortoiseSVN _can_ display diffs in Word documents, using Word’s built-in diff tool.
Maybe programmers come across as arrogant when we suggest we know better (even if we have a case, especially when it comes to collaborating on a technical book). I wrote up my own experiences of setting up a collaborative document build system. I thought the article might be of interest to the technical writing community, and it did end up republished in a Tech Writers journal, where it generated some interesting private discussions. One thing I learned is that professional technical authors need persuading of the benefits of automatic merging. After all, some programmers are initially skeptical a diff algorithm can merge code changes safely — can an algorithm maintain an editorial voice? The authors have a point. They are adept at working with words, sentences, paragraphs — they like revising and reshaping, _editing_, and they are expert at it. And yes, they may prefer using sophisticated tools like MS Word to do this.
Having used Word for one book, I’ve sworn “never again”. DocBook XML is my current choice (with svn or hg for version control): ad-hoc Python scripts (or any other language I guess;-) to process XML in a zillion different ways are easier to write than ones for LaTeX (or so says my past experience with both).
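As an illustration of the kind of ad-hoc script meant here, the following sketch pulls a chapter and section outline out of a DocBook-style file with Python's xml.etree.ElementTree. It assumes non-namespaced element names (book, chapter, section, title), as in older DocBook versions; a namespaced document would need the tag names qualified. The sample document and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up, DocBook-style sample with non-namespaced tags.
SAMPLE = """<book>
  <title>Example Book</title>
  <chapter id="intro">
    <title>Introduction</title>
    <section id="why-text">
      <title>Why Text</title>
      <para>Plain text diffs and merges well.</para>
    </section>
  </chapter>
  <chapter id="tools">
    <title>Tools</title>
  </chapter>
</book>"""


def outline(xml_text: str) -> list[tuple[int, str, str]]:
    """Return (depth, id, title) for every chapter and nested section."""
    root = ET.fromstring(xml_text)
    entries = []

    def walk(element, depth):
        for child in element:
            if child.tag in ("chapter", "section"):
                title = child.findtext("title", default="(untitled)")
                entries.append((depth, child.get("id"), title))
                walk(child, depth + 1)

    walk(root, 0)
    return entries


for depth, ident, title in outline(SAMPLE):
    print("  " * depth + f"{title} [{ident}]")
```

The same walk could just as easily collect cross-reference targets or check for missing ids, which is the sort of one-off job that is harder to do reliably against LaTeX source without a real parser.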
Alex – was it the backslashes that made LaTeX harder to manipulate than DocBook with Python? Or was it just that Python already has lots of support for XML?
I agree about Word. I used DocBook and svn for Practical Development Environments (O’Reilly, 2005).
There is oodiff (http://www-verimag.imag.fr/~moy/opendocument/), which has instructions on how to integrate OpenDocument diffing into a version control system like git or mercurial. I’ve never actually used it, though, and I don’t see any mention of handling merges.