Re: [personal] Re: Filtering duplicated row with a trigger - Mailing list pgsql-novice

From Bruno Wolff III
Subject Re: [personal] Re: Filtering duplicated row with a trigger
Date
Msg-id 20031006180141.GA28877@wolff.to
Whole thread Raw
In response to Filtering duplicated row with a trigger  (papapep <papapep@gmx.net>)
List pgsql-novice
Please keep messages copied to the list.

On Mon, Oct 06, 2003 at 19:38:46 +0200,
  papapep <papapep@gmx.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Bruno Wolff III wrote:
> | And what do want to happen when you run accross a duplicate row?
> | Do you just want to discard tuples with a duplicate primary key?
>
> Initially I wanted to do so. But after I thought to "register" them in a
> dups table.

OK.

>
> | If you are discarding duplicates, do you care which of the duplicates
> | is discarded?
>
> That's why I said that perhaps it should be a good thing to keep them in
> a duplicated-rows table, for reviewing them.
>
> | If you want to combine data from the duplicates, do you have a precise
> | description of what you want to happen?
>
> No, I do not need to combine data from the duplicated. The entire row is
> accepted or not.

If you initially don't care which dup gets inserted, then the select DISTINCT
ON suggestion will work for loading the real table.
Something like:
insert into real_table select distinct on (pk1, pk2, pk3, pk4, pk5) * from
  temp_table;

To see the sets of duplicates you can do something like:
select * from temp_table, (select pk1, pk2, pk3, pk4, pk5 from temp_table
  group by pk1, pk2, pk3, pk4, pk5 having count(*) > 1) as dups where
  temp_table.pk1 = dups.pk1 and
  temp_table.pk2 = dups.pk2 and
  temp_table.pk3 = dups.pk3 and
  temp_table.pk4 = dups.pk4 and
  temp_table.pk5 = dups.pk5;

If there are a large number of records being imported, you probably want
to create an multicolumn index on pk1, pk2, pk3, pk4, and pk5 on the
temp table after loading it and before doing the query for duplicates.

pgsql-novice by date:

Previous
From: Josh Berkus
Date:
Subject: Re: [personal] Re: Filtering duplicated row with a trigger
Next
From: "Partha Sur"
Date:
Subject: Re: Filtering duplicated row with a trigger