Please keep messages copied to the list.
On Mon, Oct 06, 2003 at 19:38:46 +0200,
papapep <papapep@gmx.net> wrote:
>
> Bruno Wolff III wrote:
> | And what do you want to happen when you run across a duplicate row?
> | Do you just want to discard tuples with a duplicate primary key?
>
> Initially I wanted to do that, but then I thought of "registering" them
> in a dups table.
OK.
>
> | If you are discarding duplicates, do you care which of the duplicates
> | is discarded?
>
> That's why I said that it might be a good idea to keep them in a
> duplicated-rows table, so they can be reviewed.
>
> | If you want to combine data from the duplicates, do you have a precise
> | description of what you want to happen?
>
> No, I do not need to combine data from the duplicates. The entire row is
> either accepted or not.
If you initially don't care which dup gets inserted, then the select DISTINCT
ON suggestion will work for loading the real table.
Something like:

insert into real_table
  select distinct on (pk1, pk2, pk3, pk4, pk5) *
    from temp_table;
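
Note that DISTINCT ON keeps the first row of each set according to the
ORDER BY, so if you later decide you do care which duplicate survives,
you can add an ordering column. For instance (load_ts is just a made-up
column name, assuming the temp table has something like a load timestamp):

insert into real_table
  select distinct on (pk1, pk2, pk3, pk4, pk5) *
    from temp_table
   order by pk1, pk2, pk3, pk4, pk5, load_ts desc;

would keep the most recently loaded row for each key.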
To see the sets of duplicates you can do something like:

select *
  from temp_table,
       (select pk1, pk2, pk3, pk4, pk5
          from temp_table
         group by pk1, pk2, pk3, pk4, pk5
        having count(*) > 1) as dups
 where temp_table.pk1 = dups.pk1
   and temp_table.pk2 = dups.pk2
   and temp_table.pk3 = dups.pk3
   and temp_table.pk4 = dups.pk4
   and temp_table.pk5 = dups.pk5;
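
If you would rather register those rows in a dups table, as you mentioned,
something along these lines should work (dup_rows is just a name I picked;
selecting temp_table.* rather than * keeps the new table from ending up
with two copies of the key columns):

create table dup_rows as
  select temp_table.*
    from temp_table,
         (select pk1, pk2, pk3, pk4, pk5
            from temp_table
           group by pk1, pk2, pk3, pk4, pk5
          having count(*) > 1) as dups
   where temp_table.pk1 = dups.pk1
     and temp_table.pk2 = dups.pk2
     and temp_table.pk3 = dups.pk3
     and temp_table.pk4 = dups.pk4
     and temp_table.pk5 = dups.pk5;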
If there are a large number of records being imported, you probably want
to create a multicolumn index on pk1, pk2, pk3, pk4, and pk5 on the
temp table after loading it and before doing the query for duplicates.
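
Roughly, something like (the index name is arbitrary):

create index temp_table_pk_idx
    on temp_table (pk1, pk2, pk3, pk4, pk5);
analyze temp_table;

Running analyze after the bulk load gives the planner up-to-date statistics
on the temp table, which helps it pick a sensible plan for the duplicate
check.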