Sometimes you may want to combine two files so that related information appears together. For example, there could be a file containing the names of continents and another file containing the names of countries located in these continents, and the requirement is to combine both files so that each continent and its corresponding country appear on the same line.
That's just one example; there could be hundreds of such use cases. If you are on Linux and are looking for a tool that can help you in situations like these, you may want to check out join, which is a command line utility. In this tutorial, we will discuss this command using some easy-to-understand examples.
Please note that all examples mentioned in this article have been tested on Ubuntu 16.04, and the join command version we've used is 8.25.
Linux join command
The join command lets you combine lines of two files on a common field.
join [OPTION]... FILE1 FILE2
Here's what the man page says about this tool:
For each pair of input lines with identical join fields, write a line to standard output. The default
join field is the first, delimited by blanks. When FILE1 or FILE2 (not both) is -, read standard input.
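The last sentence of that description means join can read one of its inputs from a pipe. Here is a minimal sketch, assuming GNU join and two small sample files whose contents are made up for illustration; file2 is deliberately out of order, so it is sorted on the fly and passed to join as standard input through -.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; file2 is intentionally unsorted.
printf '%s\n' '1. Asia:' '2. Africa:' > file1
printf '%s\n' '2. Nigeria' '1. India' > file2

# Sort file2 and feed it to join as standard input ("-" stands for stdin).
# LC_ALL=C makes sort and join agree on the same ordering.
stdin_output=$(LC_ALL=C sort file2 | LC_ALL=C join file1 -)
printf '%s\n' "$stdin_output"
```

With this sample data, the run prints "1. Asia: India" followed by "2. Africa: Nigeria".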
The following examples should give you a good idea about how the join command works.
1. How to combine lines of files using join command?
Let's understand the basic usage of the join command. Suppose there are two files (file1 and file2) that contain the following lines:
file1:
1. Asia:
2. Africa:
3. Europe:
4. North America:

file2:
1. India
2. Nigeria
3. The Netherlands
4. The US
Now, you can combine these two files in the following way:
join file1 file2
Here's the output of the above command in our case:
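The run can be reproduced with a short shell sketch. It assumes GNU join and assumes that file1 and file2 hold the four continent and country lines used throughout this article; the scratch directory and the variable name are only for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents of the two input files.
printf '%s\n' '1. Asia:' '2. Africa:' '3. Europe:' '4. North America:' > file1
printf '%s\n' '1. India' '2. Nigeria' '3. The Netherlands' '4. The US' > file2

# Join on the first blank-delimited field ("1.", "2.", ...).
# LC_ALL=C pins the ordering that join expects its input to follow.
basic_output=$(LC_ALL=C join file1 file2)
printf '%s\n' "$basic_output"
```

Under these assumptions, each output line starts with the shared join field, followed by the rest of the line from file1 and then the rest of the line from file2:

```
1. Asia: India
2. Africa: Nigeria
3. Europe: The Netherlands
4. North America: The US
```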
2. How to make join print unpairable lines?
By default, the join command only prints pairable lines. For example, even if file1 contains an extra line (line number 5):
4. North America:
5. South America:
joining file1 and file2 won't produce any different output:
That's because unpairable lines are left out of the output. However, if you want, you can still have them in the output using the -a command line option. This option requires you to pass a file number so that the tool knows which file you are talking about.
For example, in our case, the command would be:
join file1 file2 -a 1
So you can see that the unpaired line from file number 1 (file1 in our case) was also displayed in the output.
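As a reproducible sketch of this behavior, assuming GNU join and the sample contents described above (the variable names are illustrative), the following compares the default run with the -a 1 run:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1 has a fifth line with no counterpart in file2.
printf '%s\n' '1. Asia:' '2. Africa:' '3. Europe:' '4. North America:' '5. South America:' > file1
printf '%s\n' '1. India' '2. Nigeria' '3. The Netherlands' '4. The US' > file2

# Default behavior: only the four pairable lines are printed.
default_output=$(LC_ALL=C join file1 file2)

# -a 1: additionally print lines from file number 1 that found no partner.
with_unpaired=$(LC_ALL=C join -a 1 file1 file2)
printf '%s\n' "$with_unpaired"
```

Here the second run prints the same four joined lines as the first, plus a final line reading "5. South America:".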
Note that in case you just want to print unpaired lines (meaning, suppress the paired lines in the output), you can do this using the -v command line option. This option takes a file number, exactly the way -a does.
Here's an example of the -v option:
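A minimal sketch, assuming GNU join and the same five-line file1 and four-line file2 as in the -a example (the variable name is illustrative):

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents: file1 has one line ("5. ...") without a partner in file2.
printf '%s\n' '1. Asia:' '2. Africa:' '3. Europe:' '4. North America:' '5. South America:' > file1
printf '%s\n' '1. India' '2. Nigeria' '3. The Netherlands' '4. The US' > file2

# -v 1: print only the lines of file number 1 that could not be paired.
only_unpaired=$(LC_ALL=C join -v 1 file1 file2)
printf '%s\n' "$only_unpaired"
```

With these inputs, the only line printed is "5. South America:".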
3. How to provide custom join fields?
As we already know, join combines lines of files on a common field, which is the first field by default. However, if you want, you can specify a different field for each file. For example, consider the following contents in file1 and file2, respectively.
file1:
* 1. Asia:
& 2. Africa:
@ 3. Europe:
# 4. North America:

file2:
# 1. India
@ 2. Nigeria
& 3. The Netherlands
* 4. The US
Now, if you want the second field of each line to be the common field for join, you can tell this to the tool by using the -1 and -2 command line options. While the former represents the first file, the latter refers to the second file. These options require a numeric argument that refers to the joining field for the corresponding file.
For example, in our case, the command will be:
join -1 2 -2 2 file1 file2
And here's the output of this command:
Note that in case the position of the common field is the same in both files (like in the example we just discussed, where it's 2), you can replace the part -1 [field] -2 [field] in the command with -j [field]. So in our case, the command would become:
join -j2 file1 file2
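Both spellings can be compared in a short sketch. It assumes GNU join and the symbol-prefixed file contents listed earlier in this section; the variable names are illustrative, and -j is written here with a space before its argument.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The first field is an arbitrary symbol; the second field is the shared key.
printf '%s\n' '* 1. Asia:' '& 2. Africa:' '@ 3. Europe:' '# 4. North America:' > file1
printf '%s\n' '# 1. India' '@ 2. Nigeria' '& 3. The Netherlands' '* 4. The US' > file2

# Name the join field separately for each file...
per_file=$(LC_ALL=C join -1 2 -2 2 file1 file2)

# ...or once for both files with -j.
shorthand=$(LC_ALL=C join -j 2 file1 file2)

printf '%s\n' "$per_file"
```

In both cases join moves the join field to the front of each output line and appends the remaining fields of file1 and then of file2, so the first line reads "1. * Asia: # India".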
4. How to make join operation case-insensitive?
By default, the join operation is case-sensitive. For example, consider the following files:
file1:
A. Asia:
B. Africa:
C. Europe:
D. North America:

file2:
a. India
b. Nigeria
c. The Netherlands
d. The US
Now, if you try joining these two files using the default (first) common field, join will produce no output. That's because the case of the join fields differs between the two files. To make join ignore case when comparing fields, use the -i command line option.
Here's the command for our case:
join -i file1 file2
With this option, join pairs the lines even though the case of the join fields differs.
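The difference can be checked with a small sketch. It assumes GNU join, and it assumes that file1 labels its lines A. through D. while file2 uses a. through d.; the variable names are illustrative.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents: same keys in both files, differing only in case.
printf '%s\n' 'A. Asia:' 'B. Africa:' 'C. Europe:' 'D. North America:' > file1
printf '%s\n' 'a. India' 'b. Nigeria' 'c. The Netherlands' 'd. The US' > file2

# Case-sensitive comparison (the default): "A." never equals "a.".
sensitive=$(LC_ALL=C join file1 file2)

# -i: ignore case differences when comparing the join fields.
insensitive=$(LC_ALL=C join -i file1 file2)
printf '%s\n' "$insensitive"
```

Under these assumptions the default run prints nothing, while the -i run prints four lines. The join field in each output line is taken from file1, so the first line reads "A. Asia: India".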
5. How to make join not check for sorted input?
By default, the join command checks whether or not the supplied input is sorted, and reports if not. For example, consider the following output when the information in file1 was not sorted:
Now, in case you want this error/warning to go away, you can do so using the --nocheck-order option. Here's the same command, but with this option enabled:
So you can see that the join command didn't check for sorted input this time.
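A minimal sketch of both runs, assuming GNU join and a file1 whose second and third lines have been swapped (the exact unsorted contents used in the original run are not shown, so this ordering is an assumption, as are the variable names):

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1 is out of order on purpose: "3." comes before "2.".
printf '%s\n' '1. Asia:' '3. Europe:' '2. Africa:' '4. North America:' > file1
printf '%s\n' '1. India' '2. Nigeria' '3. The Netherlands' '4. The US' > file2

# Default run: capture only stderr, where join reports the ordering problem.
warning=$(LC_ALL=C join file1 file2 2>&1 >/dev/null)

# Same command with the order check disabled: stderr stays empty.
quiet_stderr=$(LC_ALL=C join --nocheck-order file1 file2 2>&1 >/dev/null)
quiet_output=$(LC_ALL=C join --nocheck-order file1 file2)

printf 'default stderr: %s\n' "$warning"
printf 'nocheck stderr: %s\n' "$quiet_stderr"
printf '%s\n' "$quiet_output"
```

Note that --nocheck-order only silences the check; it does not sort anything. In this sketch the "2. Africa:" line is still missing from the joined output, so sorting the input first (for example with the sort command) is usually the safer fix.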
Join may not be a very straightforward tool to understand, but once you get used to it, it can be a massive time-saver in some situations. We've covered most of the command line options here. Try these, and once done, go through the command's man page for the rest.