Bash - Deleting duplicate records

Discussion in 'Programming/Scripts' started by Wire323, Dec 4, 2005.

  1. Wire323

    Wire323 New Member

    I have a text file full of user-submitted email addresses. I want to remove the duplicate records, but it isn't as simple as using "uniq." When I find a dupe I want to remove both of them, not just one. If it's possible I'd also like to create a text file containing all of the email addresses that had duplicates.

    Is this possible?

  2. Wire323

    Wire323 New Member

    I've changed things slightly. Instead of removing them completely, I'd like to leave one and only take the dupes out. I know I can do that with uniq, but how would I know which ones were taken out so I can write them to a file?
  3. Wire323

    Wire323 New Member

    I don't know if this was the best way, but I was able to do it like this:

    sort participants | uniq > temp1
    sort participants > temp2
    comm -1 -3 temp1 temp2 > temp3
    sort temp3 | uniq > outputfile
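For comparison, here is a shorter sketch that should give the same two results as the pipeline above without the comm step. It assumes participants holds one address per line. The sample addresses, the temporary directory, and the deduped file name are made up for illustration.

```shell
# Hypothetical sample data standing in for the "participants" file.
workdir=$(mktemp -d)
participants="$workdir/participants"
printf '%s\n' bob@example.com alice@example.com bob@example.com \
    carol@example.com alice@example.com bob@example.com > "$participants"

# One copy of every address (the same content temp1 holds above).
sort -u "$participants" > "$workdir/deduped"

# Each address that occurred more than once, listed once
# (the same content outputfile holds above).
sort "$participants" | uniq -d > "$workdir/outputfile"

cat "$workdir/outputfile"
```

uniq -d prints one copy of each repeated line, so the addresses that had extra copies removed can be written out in a single pass.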
  4. falko

    falko Super Moderator Howtoforge Staff

    If it works it's ok! ;)
  5. muha

    muha New Member

    An old post but heh, thought I might add a bit:
    To collapse adjacent duplicate lines in <file>, so each line is shown once:
    $ uniq file
    To show only the duplicated lines, once each:
    $ uniq -d file
    uniq only compares adjacent lines, so if the file is not sorted yet, sort it first to remove non-consecutive duplicates spread throughout the file:
    $ sort file | uniq
    Last edited: Mar 8, 2006
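The uniq variants discussed in this thread can be compared side by side on a small list. This is a rough sketch: the sample addresses and variable names are invented for illustration. The uniq -u form covers the original question, where every copy of a duplicated address should be dropped.

```shell
# Hypothetical sample list; one address per line is assumed.
list=$(mktemp)
printf '%s\n' bob@example.com alice@example.com bob@example.com \
    carol@example.com > "$list"

# uniq only compares adjacent lines, so the input is sorted first.

# Addresses that occur exactly once (every copy of a duplicate is dropped).
singles=$(sort "$list" | uniq -u)

# Addresses that occur more than once, each listed once.
dupes=$(sort "$list" | uniq -d)

# Every address once, with duplicates collapsed.
all=$(sort "$list" | uniq)

printf 'singles:\n%s\ndupes:\n%s\n' "$singles" "$dupes"
```

Redirecting the uniq -d output to a file gives the list of addresses that had duplicates.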
