<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: What the World Needs Now Is Diffs, Diffs, Diffs</title>
	<atom:link href="http://third-bit.com/blog/archives/462.html/feed" rel="self" type="application/rss+xml" />
	<link>http://third-bit.com/blog/archives/462.html</link>
	<description>Data is ones and zeroes &#124; Software is ones and zeroes and hard work.</description>
	<lastBuildDate>Fri, 03 Feb 2012 17:07:30 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: David Warde-Farley</title>
		<link>http://third-bit.com/blog/archives/462.html#comment-411</link>
		<dc:creator>David Warde-Farley</dc:creator>
		<pubDate>Sat, 22 Apr 2006 07:31:31 +0000</pubDate>
		<guid isPermaLink="false">http://pyre.third-bit.com/blog/?p=462#comment-411</guid>
		<description>Being able to put stuff under version control and actually read/interpret the diff output is one of the reasons I stopped using word processors and switched every non-trivial document I create or edit to LaTeX (I know that MS Word has some sort of revision history metadata voodoo in its file format now, but storing the revision history with the document itself has proven to be an embarrassingly bad idea - just ask the British government).

I&#039;m not sure why more programmers don&#039;t use LaTeX -- it seems like a natural fit for that segment of the population, looks a lot nicer when typeset, and has the aforementioned benefit of being CVS/SVNable, not to mention segmented across multiple files, just like regular source code. Perhaps the word processor is such a ubiquitous cultural element that by the time most people realize there are alternatives, they&#039;re unwilling to put the effort into switching.

Storing things in a sparse, structural markup like LaTeX doesn&#039;t completely solve the problem you pose regarding semantically insignificant edits, but barring some sort of semantic meta-markup to be superimposed on top of everything we do, the problem seems rather intractable. I look forward to someone much smarter than myself proving me wrong. ;)</description>
		<content:encoded><![CDATA[<p>Being able to put stuff under version control and actually read/interpret the diff output is one of the reasons I stopped using word processors and switched every non-trivial document I create or edit to LaTeX (I know that MS Word has some sort of revision history metadata voodoo in its file format now, but storing the revision history with the document itself has proven to be an embarrassingly bad idea &#8211; just ask the British government).</p>
<p>I&#8217;m not sure why more programmers don&#8217;t use LaTeX &#8212; it seems like a natural fit for that segment of the population, looks a lot nicer when typeset, and has the aforementioned benefit of being CVS/SVNable, not to mention segmented across multiple files, just like regular source code. Perhaps the word processor is such a ubiquitous cultural element that by the time most people realize there are alternatives, they&#8217;re unwilling to put the effort into switching.</p>
<p>Storing things in a sparse, structural markup like LaTeX doesn&#8217;t completely solve the problem you pose regarding semantically insignificant edits, but barring some sort of semantic meta-markup to be superimposed on top of everything we do, the problem seems rather intractable. I look forward to someone much smarter than myself proving me wrong. <img src='http://third-bit.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: michael bernstein</title>
		<link>http://third-bit.com/blog/archives/462.html#comment-410</link>
		<dc:creator>michael bernstein</dc:creator>
		<pubDate>Fri, 21 Apr 2006 16:16:13 +0000</pubDate>
		<guid isPermaLink="false">http://pyre.third-bit.com/blog/?p=462#comment-410</guid>
		<description>The hard part isn&#039;t the diff itself, it&#039;s figuring out what the most meaningful linear tokenization of the source files is, and ensuring that the files are normalized in a way that does not obscure the semantics.

So, for your table example, the source files must first be parsed into a DOM tree and then re-serialized with a standard indentation and attribute order before diff will work, but after that it works very well at highlighting the semantic change, since the normalization ensures that the meaningful unit of change (a line) is tokenized identically between the two versions.</description>
		<content:encoded><![CDATA[<p>The hard part isn&#8217;t the diff itself, it&#8217;s figuring out what the most meaningful linear tokenization of the source files is, and ensuring that the files are normalized in a way that does not obscure the semantics.</p>
<p>So, for your table example, the source files must first be parsed into a DOM tree and then re-serialized with a standard indentation and attribute order before diff will work, but after that it works very well at highlighting the semantic change, since the normalization ensures that the meaningful unit of change (a line) is tokenized identically between the two versions.</p>
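<p>The normalize-then-diff idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tool anyone in this thread has published: the names normalize and semantic_diff are hypothetical, "standard indentation" is taken to mean two spaces per nesting level with one tag or text run per line, and "attribute order" is taken to mean alphabetical.</p>

```python
import difflib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def normalize(xml_text: str) -> list[str]:
    """Parse XML into a tree and re-serialize it in one canonical layout.

    Assumed conventions (hypothetical, chosen for this sketch): two-space
    indentation, one tag or text run per line, attributes sorted by name,
    surrounding whitespace in text ignored.
    """
    root = ET.fromstring(xml_text)
    lines: list[str] = []

    def walk(elem: ET.Element, depth: int) -> None:
        pad = "  " * depth
        attrs = "".join(
            f" {name}={quoteattr(value)}"
            for name, value in sorted(elem.attrib.items())
        )
        text = (elem.text or "").strip()
        children = list(elem)
        if not children and not text:
            # Empty element: always emit the self-closing form.
            lines.append(f"{pad}<{elem.tag}{attrs}/>")
            return
        lines.append(f"{pad}<{elem.tag}{attrs}>")
        if text:
            lines.append(f"{pad}  {escape(text)}")
        for child in children:
            walk(child, depth + 1)
            tail = (child.tail or "").strip()
            if tail:
                lines.append(f"{pad}  {escape(tail)}")
        lines.append(f"{pad}</{elem.tag}>")

    walk(root, 0)
    return lines


def semantic_diff(old_xml: str, new_xml: str) -> list[str]:
    """Run an ordinary line diff over the normalized forms of two documents."""
    return list(
        difflib.unified_diff(
            normalize(old_xml), normalize(new_xml), lineterm="", n=0
        )
    )


if __name__ == "__main__":
    old = '<table border="1" width="100"><tr><td>a</td></tr></table>'
    # Same table with reordered attributes, different line breaks, one new value.
    new = '<table width="100"\n border="1">\n<tr>\n<td>b</td>\n</tr></table>'
    print("\n".join(semantic_diff(old, new)))
```

<p>With the two sample tables in the sketch, the reordered attributes and changed line breaks should drop out, leaving only the edited cell in the diff. A line-per-node layout is a design choice here: it makes the line the unit of change, which is what a plain line diff needs.</p>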
]]></content:encoded>
	</item>
	<item>
		<title>By: Seo Sanghyeon</title>
		<link>http://third-bit.com/blog/archives/462.html#comment-409</link>
		<dc:creator>Seo Sanghyeon</dc:creator>
		<pubDate>Fri, 21 Apr 2006 07:16:17 +0000</pubDate>
		<guid isPermaLink="false">http://pyre.third-bit.com/blog/?p=462#comment-409</guid>
		<description>Have a look at this:

SSDDiff  a diff for semistructured data
ssddiff.alioth.debian.org

The author of the above tool also wrote this interesting thesis: &quot;Structure-Preserving Difference Search in Semistructured Data&quot;. It has comparisons against Logilab xmldiff (written in Python!) which show how his algorithm improves the result.</description>
		<content:encoded><![CDATA[<p>Have a look at this:</p>
<p>SSDDiff  a diff for semistructured data<br />
ssddiff.alioth.debian.org</p>
<p>The author of the above tool also wrote this interesting thesis: &#8220;Structure-Preserving Difference Search in Semistructured Data&#8221;. It has comparisons against Logilab xmldiff (written in Python!) which show how his algorithm improves the result.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

