Diff

In computing, diff is a Unix utility that outputs the differences between two text files. The output of this program is also called a diff. It is invoked from the command line with the names of two files:

$ diff firstone.txt secondone.txt

The first editions of the program were designed for line comparisons in text files. By the 1980s, support for binary files had become necessary, which led to a shift in the application's design.

In unified format, each line that occurs only in the first file is preceded by a minus sign, each line that occurs only in the second file is preceded by a plus sign, and common lines are preceded by a space.

Lines beginning with three minus signs and three plus signs name the first and second file, respectively. Lines beginning with two at signs (@@) open each hunk and give its starting line and its number of lines in each file. Diffs are often used as input to the patch program.
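As a small illustration of the unified format, the sketch below builds two three-line files in a scratch directory and compares them with the -u option. The file contents and the variable names (unified_dir, unified, unified_status) are made up for this example; only diff -u itself is assumed to be available.

```shell
# Work in a scratch directory so that no existing files are touched.
unified_dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$unified_dir/firstone.txt"
printf 'apple\nblueberry\ncherry\n' > "$unified_dir/secondone.txt"

# diff exits with status 1 when the files differ; record the status
# instead of treating it as a failure.
unified_status=0
unified=$(cd "$unified_dir" && diff -u firstone.txt secondone.txt) || unified_status=$?

printf '%s\n' "$unified"
echo "exit status: $unified_status"
```

The first two lines of the output begin with "--- firstone.txt" and "+++ secondone.txt", each followed by a timestamp that varies from run to run. The remainder is a single hunk: the line "@@ -1,3 +1,3 @@", then " apple", "-banana", "+blueberry" and " cherry". The reported exit status is 1 because the files differ.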



History

The diff program was developed in the early 1970s on the Unix operating system, which was emerging from AT&T Bell Labs in Murray Hill, New Jersey. The final version incorporated into these early Unix systems was written entirely by Douglas McIlroy. This research was published in a 1976 paper that he co-wrote with James W. Hunt, who also wrote one of the initial prototypes of diff.

McIlroy's work was preceded and influenced by Steve Johnson's comparison program on GECOS and Mike Lesk's proof program, which, like diff, originated on Unix. Proof also produced line-by-line changes and even used angle brackets (">" and "<") to present line insertions and deletions in its output. The heuristics these applications used were deemed unreliable, however. The potential usefulness of a diff tool prompted McIlroy to research and design a more robust tool that could be used for a variety of tasks yet perform well within the processing and space limitations of the PDP-11's hardware. His approach also resulted from collaboration with individuals at Bell Labs, including Alfred Aho, Elliot Pinson, Jeffrey Ullman, and Harold S. Stone.

In the context of Unix, the use of ed gave diff a natural way to create machine-usable "edit scripts". When saved to a file, such an edit script can be applied by ed to the original file to reconstitute the modified file in its entirety. This greatly reduced the space needed to maintain multiple versions of a file. McIlroy considered writing a post-processor for diff in which a variety of output formats could be designed and implemented, but he found it simpler and more frugal to have diff itself generate the syntax and reverse-order input that ed expects.
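The edit-script idea can be tried out with the -e option, which asks diff for ed commands. This is only a sketch under stated assumptions: the file contents and the names edit_dir, script and rebuilt.txt are invented for the example, and the replay step runs only if an ed binary happens to be installed.

```shell
# Scratch directory with an original file and a modified version of it.
edit_dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$edit_dir/firstone.txt"
printf 'apple\nblueberry\ncherry\n' > "$edit_dir/secondone.txt"

# -e produces an ed script; exit status 1 only means the files differ.
script=$(diff -e "$edit_dir/firstone.txt" "$edit_dir/secondone.txt") || true
printf '%s\n' "$script"

# If ed is installed, replay the script on a copy of the original file.
# The trailing "w" (write) command is not part of diff's output; it is
# appended here so that ed saves the result.
if command -v ed >/dev/null 2>&1; then
    cp "$edit_dir/firstone.txt" "$edit_dir/rebuilt.txt"
    printf '%s\nw\n' "$script" | ed -s "$edit_dir/rebuilt.txt"
    cmp -s "$edit_dir/rebuilt.txt" "$edit_dir/secondone.txt" &&
        echo "rebuilt.txt matches secondone.txt"
fi
```

For these two files the script consists of three lines: "2c" (change line 2), the replacement text "blueberry", and a lone "." that ends the input. Storing only that script alongside firstone.txt is enough to recover secondone.txt, which is the space saving described above.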

Almost all diff implementations have remained outwardly unchanged since 1975, apart from improvements to the core algorithm, the addition of useful features, and the design of new output formats. The postprocessors sdiff and diffmk, which render side-by-side diff listings and apply change marks to printed documents, respectively, were developed elsewhere in Bell Labs in or before 1981. The Berkeley distribution of Unix made a point of adding the context format (-c) and recursion over filesystem directory structures (-r). The GNU Project's diff includes the unified context format and is packaged together with other diff- and patch-related utilities.

In diff's early years, common uses included comparing changes in program source code, the source of technical documents, program debugging output, filesystem listings, and assembly code. The output targeted at ed was added deliberately to allow a sequence of modifications to a file to be stored compactly. The Source Code Control System (SCCS), which emerged in the late 1970s, was a direct consequence of this development. The context format introduced at Berkeley helped in distributing patches that would be applied to code that might since have been modified. Paul Jensen, faced with consolidating changes from two branches of a program's source code tree, proposed diff3, which remains a curiosity for most users but is still useful to anyone in Jensen's situation.


Free Software Implementations

The GNU Project has an implementation of diff (and diff3) in the diffutils package.

One graphical Windows utility is based on the GNU diffutils engine and presents the same information visually. Similar utilities exist for various other platforms.


This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.