Bash - Deleting duplicate records
Wire323
4th December 2005, 04:06
I have a text file full of user-submitted email addresses. I want to remove the duplicate records, but it isn't as simple as using "uniq." When I find a dupe I want to remove both of them, not just one. If it's possible I'd also like to create a text file containing all of the email addresses that had duplicates.
Is this possible?
Thanks
Wire323
4th December 2005, 04:51
I've changed things slightly. Instead of removing them completely, I'd like to leave one copy and take only the dupes out. I know I can do that with uniq, but how would I know which ones were taken out so I can write them to a file?
Wire323
4th December 2005, 06:56
I don't know if this was the best way, but I was able to do it like this:
sort participants | uniq > temp1
sort participants > temp2
comm -1 -3 temp1 temp2 > temp3
sort temp3 | uniq > outputfile
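For comparison, here is a rough sketch of the same result without the intermediate comm step. It assumes one address per line; the sample addresses and the *.sample file names are made up for illustration and are not from the original post.

```shell
# Hypothetical sample input: one address per line, unsorted, with repeats.
printf '%s\n' b@example.com a@example.com b@example.com \
    c@example.com a@example.com b@example.com > participants.sample

# One copy of every address (what temp1 holds above).
sort participants.sample | uniq > deduped.sample

# Each address that occurred more than once, listed once
# (what outputfile holds above).
sort participants.sample | uniq -d > dupes.sample
```

With this sample, deduped.sample should hold the three distinct addresses and dupes.sample the two that were repeated.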
falko
4th December 2005, 12:37
"I don't know if this was the best way"
If it works it's ok! ;)
muha
8th March 2006, 14:26
An old post, but heh, thought I might add a bit:
To show each line of <file> once, collapsing adjacent duplicates (uniq only compares neighbouring lines):
$ uniq file
To show only the non-unique lines once:
$ uniq -d file
If the lines are not sorted yet, sort them first so that non-consecutive duplicates spread throughout the file are removed as well:
$ sort file | uniq
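A small sketch of why the sort matters, and of the original request to drop every copy of a duplicated line. The sample data and the file names (file.sample and the *.out files) are assumptions for illustration only.

```shell
# Hypothetical sample input: the duplicates are deliberately not adjacent.
printf '%s\n' b@example.com a@example.com b@example.com \
    c@example.com > file.sample

# Unsorted: uniq only compares neighbouring lines, so nothing is removed here.
uniq file.sample > unsorted.out

# Sorted first: sort -u gives the same result as sort | uniq.
sort -u file.sample > once.out

# Only the lines that occur exactly once; every copy of a repeated line is dropped.
sort file.sample | uniq -u > singles.out
```

Here unsorted.out should still contain all four lines, once.out the three distinct addresses, and singles.out only the two addresses that were never repeated.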