A better diff

Posted May 27, 2005

This is an interesting idea. Like many people who work a lot with computers, I tend to have files distributed all over the place: desktop machines, laptops, remote servers, PDAs, school or work accounts, webmail, etc. Keeping them all up to date and synchronized is a problem both in practical software development and in fundamental computer science, where the question becomes "What is the most efficient way to communicate what has changed in a file?"

The trivial solution is to just copy the new file. Much saner is to use a utility such as diff to compare the two versions line by line and transfer only the changes. However, the Georgia Tech team found a better method: record the operations that the user performed on the file (insert paragraph, change font size, etc.) and have the remote machine mimic those same actions on the old file, bringing it up to date. Since the bandwidth-limiting piece of the puzzle is a human being at a keyboard, the list of actions is typically much smaller than any set of changes you could derive by analyzing the resulting files. It also scales vastly better in applications that work with complicated binary formats (cough, Word, cough) or large multimedia datasets. "Gaussian blur, 1.2 pixel radius" is a much more compact description than what diff would output from comparing "before" and "after" Photoshop files.
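To make the size argument concrete, here is a minimal sketch in Python. It is not the Georgia Tech team's implementation: the operation names ("insert", "delete", "upper"), the JSON wire format, and the helper functions are all made up for illustration, with "upper" standing in for a whole-document action like a font change or a blur. It replays a short operation log on a remote copy and compares the log's size with a line-based diff of the same change.

```python
import difflib
import json


def apply_op(text, op):
    """Apply one recorded edit operation (hypothetical format) to a string."""
    kind = op["op"]
    if kind == "insert":
        return text[:op["pos"]] + op["text"] + text[op["pos"]:]
    if kind == "delete":
        return text[:op["pos"]] + text[op["pos"] + op["length"]:]
    if kind == "upper":
        # Stand-in for a whole-document transform (font change, blur, ...).
        return text.upper()
    raise ValueError(f"unknown operation: {kind}")


def replay(text, ops):
    """Apply a list of operations in order and return the final text."""
    for op in ops:
        text = apply_op(text, op)
    return text


# Both machines start with the same old file.
old = "the quick brown fox\n" * 1000

# What the user did locally: two actions.
ops = [
    {"op": "insert", "pos": 0, "text": "Title\n"},
    {"op": "upper"},
]
local = replay(old, ops)

# Option 1: ship the operation log and replay it remotely.
wire = json.dumps(ops)
remote = replay(old, json.loads(wire))

# Option 2: ship a line-based diff of the before and after files.
diff_text = "".join(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        local.splitlines(keepends=True),
    )
)

print("remote matches local:", remote == local)
print("operation log bytes: ", len(wire))
print("unified diff bytes:  ", len(diff_text))
```

The diff has to mention every line the second action touched, while the log only has to name the action. The catch, of course, is that the remote side must be able to execute every operation exactly as the local application did.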

Although I don't believe they considered this aspect of the solution, recording user actions like this lends itself very well to merging the classic "undo" functionality of applications with a full version-control system. If you have all these user-action diffs anyway, you might as well keep them around and expose the ability to revert to previous document states by just selectively replaying them.
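As a rough sketch of that undo-as-version-control idea, again in Python and again with invented names (the History class, its methods, and the tuple-based operation format are assumptions for illustration, not anything from the original work): keep the starting document plus the full action log, and rebuild any earlier state by replaying only a prefix of the log.

```python
def apply_op(text, op):
    """Apply one recorded operation, given as a tuple (hypothetical format)."""
    kind = op[0]
    if kind == "insert":
        _, pos, new_text = op
        return text[:pos] + new_text + text[pos:]
    if kind == "delete":
        _, pos, length = op
        return text[:pos] + text[pos + length:]
    raise ValueError(f"unknown operation: {kind}")


class History:
    """Keeps the base document and every user action ever applied to it."""

    def __init__(self, base):
        self.base = base
        self.ops = []

    def record(self, op):
        """Append a user action to the log."""
        self.ops.append(op)

    def state_at(self, version):
        """Rebuild the document as it was after the first `version` actions."""
        text = self.base
        for op in self.ops[:version]:
            text = apply_op(text, op)
        return text

    def current(self):
        """The document with every recorded action applied."""
        return self.state_at(len(self.ops))


history = History("Hello")
history.record(("insert", 5, " world"))  # version 1
history.record(("delete", 0, 6))         # version 2

for version in range(len(history.ops) + 1):
    print(version, repr(history.state_at(version)))
```

Here "undo" is just asking for state_at(n - 1), and any older version stays reachable for as long as the log is kept. A real application would presumably checkpoint full states now and then so that it never has to replay from the very beginning.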