Re: Filtering duplicated row with a trigger - Mailing list pgsql-novice

From Partha Sur
Subject Re: Filtering duplicated row with a trigger
Date
Msg-id 003b01c38c69$d624da60$3ead590c@welcome
Whole thread Raw
In response to Filtering duplicated row with a trigger  (papapep <papapep@gmx.net>)
List pgsql-novice
Hello,

 If you wish to "clean" a table of duplicates, the following SQL will work.
It is widely used in the Oracle world, and I have just tried it successfully on
my Red Hat Linux 9 PostgreSQL 7.3.2 database.

 If you have a table T with columns C1, C2, ... Cn, and C1, C2, ... Cr
(r <= n) are the candidate key columns for a concatenated primary key, then T
may currently contain duplicates (obviously only if no primary key constraint
is defined on it - otherwise such duplicate rows could not exist). To remove
the duplicates so that only tuples with unique candidate key values remain
(i.e. cleaned), run the following DELETE, which uses a correlated subquery:

delete from T t1
where oid < (select max(oid)
             from T t2
             where t1.C1 = t2.C1
             and   t1.C2 = t2.C2
              ...
             and   t1.Cr = t2.Cr);

This is based on the fact that even though rows in a table may be duplicates,
they always have unique oid values.

With the above delete statement, only the row with the maximum oid within each
set of duplicate rows (i.e. for each distinct value of C1||C2...||Cr) will
remain.
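As an illustration, here is the same pattern on a small hypothetical table
(the table and column names are invented for the example; on 7.3, tables are
created WITH OIDS by default):

```sql
-- Hypothetical staging table containing duplicate rows:
create table loans (cust_id integer, loan_id integer, amount numeric);

insert into loans values (1, 10, 500.00);
insert into loans values (1, 10, 500.00);  -- exact duplicate
insert into loans values (2, 20, 750.00);

-- Keep only the copy of each (cust_id, loan_id) pair with the highest oid;
-- the lower-oid duplicate matches the subquery and is deleted:
delete from loans t1
where oid < (select max(oid)
             from loans t2
             where t1.cust_id = t2.cust_id
             and   t1.loan_id = t2.loan_id);

select count(*) from loans;  -- 2 rows remain, one per candidate key value
```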
In the above SQL, if you instead use:

 where oid > (select min(oid) ...

then only the row with the minimum oid in each group of duplicates will remain.

 So \copy into a temp table (with no primary key constraint defined), apply
SQL similar to the above to clean the loaded temp table, and then use it to
load the target production table with no duplicates.

 Regards,

 Partha Sur

> ----- Original Message -----
> From: "papapep" <papapep@gmx.net>
> To: "pgsql-novice" <pgsql-novice@postgresql.org>
> Sent: Monday, October 06, 2003 12:30 PM
> Subject: [NOVICE] Filtering duplicated row with a trigger
>
>
> > I've got plenty of data files (prepared to be inserted with the \copy
> > statement) but I have to filter them to be sure that there are no
> > duplicated rows inserted.
> > I know I should do it with a trigger that executes a function before
> > inserting the row and if it's duplicated do something with it (insert it
> > in another table, simply forget it, etc...). The theory is clear :-)
> > But the practice is not so clear (for me, of course).
> > Anyone can give me some guide to how the function should do the control
> > of duplicated rows?
> >
> > Thanks.
> >
> > Josep Sànchez
> > [papapep]
>


