How to do line-by-line comparison of files in Linux using the diff command

If you are a Linux user, and your job involves working on various Linux distributions, there may be times when you'll find yourself typing commands on a Linux system with no GUI. This means you'll no longer be able to access the GUI applications - Gedit for file editing, for example - that you usually use for your regular work.

Irrespective of whether you are a system admin or a developer, file comparison is a task that's common to almost everybody's work. What if you need to compare two files while working on a CLI-only Linux system? Your favorite GUI-based comparison tool obviously won't be at your disposal, so you'll have to make do with a command line utility to get your job done.

In Linux, the diff command can be used to compare two files, but there's a slight learning curve involved with this utility. If you don't know how diff works, and are looking for a quick tutorial to get started, look no further: in this article, we will discuss the basics of this command along with some easy-to-understand examples.

Before we proceed, keep in mind that all the examples in this tutorial have been tested on Ubuntu 14.04 with Bash version 4.3.11(1) and diff version 3.3.

 

Linux Diff Command

Instead of jumping straight to the examples, it's good to know a bit about the command first. The man page of the diff command reveals that the tool compares files line by line. Its syntax is:

diff [OPTION]... FILES

While [OPTION] represents the various command line options the tool offers, FILES is usually a couple of file names. Although the diff man page contains useful information about the command, the full documentation for diff is maintained as a Texinfo manual. If the info and diff programs are properly installed at your site, the command

info diff

should give you access to the complete manual.

 

Diff Usage/Examples

Now let's discuss how diff is used, beginning with a basic example. Suppose the following are the two files that we want to compare:

file1:

test
test2
test3

file2:

test
test23
test3

Here's how you can use the diff command to compare these two files:

diff file1 file2

And here's the output the above command produces:

2c2
< test2
---
> test23

The output seems cryptic, right? We'll come to it in a bit. Let's first understand the basic structure of the output that diff produces in general.

The first thing to keep in mind is that the output represents the changes required to transform file1 (usually the original file) into file2 (the new or changed file). The output usually consists of lines that begin with a number (or a range) followed by a letter (a, d, or c) and another number (or range). For example, 2c2 (from the output above).

The first number represents the line (or range of lines) from file1 (original file), while the last number represents the line (or range of lines) from file2 (the new file). As for the letter in between, a represents added, d is for deleted, and c represents changed.

So, 2c2 means that the second line in the original file has changed and needs to be replaced with the second line from the new file in order to make the files identical. If you manually compare the two files (file1 and file2), you'll see that's exactly the case.

As for the three lines that follow 2c2 in the aforementioned example, the one that starts with '<' is the second line from file1, and the one that begins with '>' is the corresponding line from file2. The three hyphens in between them (---) simply separate the two.
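To try the first example yourself, the following sketch recreates the two files in a throwaway directory and captures what diff prints. The temporary directory and the workdir and result variable names are illustrative choices rather than part of the walkthrough, and the `|| true` is there only because diff returns a non-zero exit status when the files differ.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate file1 and file2 from the first example.
printf '%s\n' 'test' 'test2' 'test3' > file1
printf '%s\n' 'test' 'test23' 'test3' > file2

# diff exits with status 1 when the files differ, so guard the call
# in case the shell is running with "set -e".
result=$(diff file1 file2 || true)
printf '%s\n' "$result"
```

The printed text should match the four-line output shown above: the 2c2 header, the '<' line from file1, the separator, and the '>' line from file2.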

Is there any need to revisit the output of the first example now? Most likely not, as the above explanation should have made it self-explanatory. Now, let's take another example.

So, here's file1:

Hi all,
This is a diff command tutorial
from HowtoForge.
Hope you'll benefit from it.
Thanks.

Here's file2:

Hi all,
Welcome to HowtoForge.
In this tutorial, we'll discuss the diff tool.
Hope you'll find it beneficial.
Thanks.

Here's the command (which remains the same):

diff file1 file2

And here's the output:

2,4c2,4
< This is a diff command tutorial
< from HowtoForge.
< Hope you'll benefit from it.
---
> Welcome to HowtoForge.
> In this tutorial, we'll discuss the diff tool.
> Hope you'll find it beneficial.

So you can see that in this case, the main output - 2,4c2,4 - consists of multiple numbers both before and after the letter. These are ranges: 2,4 stands for lines 2, 3, and 4. So the output means that lines 2 to 4 in the original file (file1) have changed, and need to be replaced by lines 2 to 4 from file2 in order to make the files identical.
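The structure of a header such as 2,4c2,4 can also be pulled apart with plain shell parameter expansion. This is only an illustrative sketch: the variable names (cmd, left, op, right, meaning) are made up for this example, and it assumes the input is a well-formed header from diff's default output.

```shell
# A change command as printed by diff in its default output format.
cmd='2,4c2,4'

# Everything before the a, c, or d is the line (or range) in file1.
left=${cmd%%[acd]*}
# Everything after that letter is the line (or range) in file2.
right=${cmd#*[acd]}
# What remains in between is the single operation letter.
op=${cmd#"$left"}
op=${op%"$right"}

case $op in
  a) meaning='added' ;;
  c) meaning='changed' ;;
  d) meaning='deleted' ;;
esac

printf 'file1: %s  operation: %s (%s)  file2: %s\n' \
  "$left" "$op" "$meaning" "$right"
```

Setting cmd to a different header from this article, for instance 2c2, should split it the same way.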

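Moving on, let's change the contents of the files a bit. This time file1 contains only the four lines from "This is a diff command tutorial" to "Thanks." (that is, the earlier file1 without its opening "Hi all," line), and file2 becomes: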

Welcome to HowtoForge.
In this tutorial, we'll discuss the diff tool.
Hope you'll find it beneficial.
Thanks.

This is a diff command tutorial
from HowtoForge.
Hope you'll benefit from it.
Thanks.

Now, if you run the diff command, the following output will be produced:

0a1,5
> Welcome to HowtoForge.
> In this tutorial, we'll discuss the diff tool.
> Hope you'll find it beneficial.
> Thanks.
>

So you can see that the tool immediately recognized that the second paragraph in file2 is exactly what file1 contains. The command 0a1,5 says that lines 1 to 5 from file2 should be added after line 0 of file1, that is, at its very beginning, to make the two files identical.
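A minimal sketch for reproducing this case follows. It assumes file1 holds just the four lines from "This is a diff command tutorial" to "Thanks.", since that is the content for which diff reports a lone 0a1,5; the temporary directory and the commands variable are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed file1: the four lines that also form the second paragraph of file2.
printf '%s\n' \
  'This is a diff command tutorial' \
  'from HowtoForge.' \
  "Hope you'll benefit from it." \
  'Thanks.' > file1

# file2: a new four-line paragraph and a blank line, followed by file1's text.
{
  printf '%s\n' \
    'Welcome to HowtoForge.' \
    "In this tutorial, we'll discuss the diff tool." \
    "Hope you'll find it beneficial." \
    'Thanks.' \
    ''
  cat file1
} > file2

# Keep only the header lines, which are the ones starting with a digit.
commands=$(diff file1 file2 | grep '^[0-9]' || true)
printf '%s\n' "$commands"
```

Filtering on a leading digit is a quick way to see only the a, c, and d headers when the full output is long.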

And if you delete the last line ("Thanks.") from file2, here's the output:

0a1,5
> Welcome to HowtoForge.
> In this tutorial, we'll discuss the diff tool.
> Hope you'll find it beneficial.
> Thanks.
>
4d8
< Thanks.

You can see that the output now also contains 4d8, which means that the fourth line in file1 should be deleted to bring the two files in sync. The number after d is the line in file2 after which the deleted line would otherwise have appeared, here line 8. Of course, this is after you address the 0a1,5 change that's mentioned first.
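Because the output is a list of edits that turn file1 into file2, it can be replayed mechanically. The sketch below does that for the last example with the separate patch utility. It assumes patch is installed (on some systems it is packaged separately from diff) and that file1 holds the four-line paragraph ending in "Thanks."; the names changes.diff, file1.patched, and roundtrip are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed file1: the four-line paragraph ending in "Thanks."
printf '%s\n' \
  'This is a diff command tutorial' \
  'from HowtoForge.' \
  "Hope you'll benefit from it." \
  'Thanks.' > file1

# file2 with its final "Thanks." line removed.
printf '%s\n' \
  'Welcome to HowtoForge.' \
  "In this tutorial, we'll discuss the diff tool." \
  "Hope you'll find it beneficial." \
  'Thanks.' \
  '' \
  'This is a diff command tutorial' \
  'from HowtoForge.' \
  "Hope you'll benefit from it." > file2

# Save the edit list; diff exits with 1 because the files differ.
diff file1 file2 > changes.diff || true

if command -v patch >/dev/null 2>&1; then
  # Replay the edits on a copy of file1, leaving the original untouched.
  cp file1 file1.patched
  patch file1.patched < changes.diff >/dev/null
  # cmp -s prints nothing and succeeds only if the files are identical.
  if cmp -s file1.patched file2; then
    roundtrip='identical'
  else
    roundtrip='different'
  fi
else
  roundtrip='skipped (patch not installed)'
fi
printf '%s\n' "$roundtrip"
```

If patch is available, the patched copy of file1 should end up identical to file2, which is what "the changes required to transform file1 into file2" means in practice.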

 

Conclusion

Agreed, the output of the diff command isn't easy to comprehend, but the learning curve isn't that steep. Spend a couple of hours with the tool, and you'll surely get comfortable with it. As for the tutorial, we've just scratched the surface here. Take a look at the command's man page, and you'll realize that there's much more to learn about diff, something which we'll do in the next part of this tutorial series.

Comments

From: Pete at: 2016-12-29 03:03:32

diff has a place. Mainly for use in scripts.

sdiff is good for humans on a server. side-by-side diff.

$ sdiff file1 file2 | less

For GUI users, check out meld. It is like tkdiff in the old days, without all the dependencies.

From: Herbert Meyer at: 2016-12-29 11:33:37

Discussing diff without mentioning its evil twin, patch, ignores why the output of diff is so simplistic and cryptic.

From: PJ at: 2016-12-31 10:22:12

I know that diff has its uses but for eyeballing changes to a config file during an installation I think it compares poorly with something that gives a side-by-side view of differences. I normally use Beyond Compare from Scootersoftware.com (has a Linux version) that is probably my nr 1 utility. Meld I'd come across but somehow I missed sdiff. Article worth reading for that comment!