Re: Filtering duplicated row with a trigger - Mailing list pgsql-novice
| From | Partha Sur |
| --- | --- |
| Subject | Re: Filtering duplicated row with a trigger |
| Date | |
| Msg-id | 003b01c38c69$d624da60$3ead590c@welcome |
| In response to | Filtering duplicated row with a trigger (papapep <papapep@gmx.net>) |
| List | pgsql-novice |
Hello,

If you wish to "clean" a table of duplicates, the following SQL will work. It is used widely in the Oracle world, and I just tried it successfully on my Red Hat Linux 9 PostgreSQL 7.3.2 database.

Suppose you have a table T with columns C1, C2, ... Cn, of which C1, C2, ... Cr (r <= n) are the candidate key columns for a concatenated primary key. If T currently contains duplicates (necessarily with no primary key constraint defined on it; otherwise such a table with duplicate rows could not exist), then to remove the duplicates so that only tuples with unique candidate key values remain (i.e. cleaned), run the following DELETE, which uses a correlated subquery:

    delete from T t1
    where oid < (select max(oid)
                 from T t2
                 where t1.C1 = t2.C1
                   and t1.C2 = t2.C2
                   ...
                   and t1.Cr = t2.Cr);

This works because rows in a table, even if they are duplicates, always have unique oid numbers. With the DELETE above, only the row with the maximum oid within each set of duplicate rows (i.e. for each distinct value of C1 || C2 ... || Cr) will remain. If you use instead:

    where oid > (select min(oid) ...

then only the row with the minimum oid will remain.

So \copy into a temp table (with no primary key constraint defined), then apply SQL like the above to clean the loaded temp table, which can then be used to load the target production table with no duplicates. (A concrete sketch of this workflow follows after the quoted message below.)

Regards,
Partha Sur

> ----- Original Message -----
> From: "papapep" <papapep@gmx.net>
> To: "pgsql-novice" <pgsql-novice@postgresql.org>
> Sent: Monday, October 06, 2003 12:30 PM
> Subject: [NOVICE] Filtering duplicated row with a trigger
>
> > I've got plenty of data files (prepared to be inserted with the \copy
> > statement), but I have to filter them to be sure that no duplicated
> > rows get inserted.
> > I know I should do it with a trigger that executes a function before
> > inserting the row and, if it's duplicated, does something with it
> > (inserts it into another table, simply forgets it, etc.). The theory
> > is clear :-)
> > But the practice is not so clear (for me, of course).
> > Can anyone give me some guidance on how the function should check for
> > duplicated rows?
> >
> > Thanks.
> >
> > Josep Sànchez
> > [papapep]
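P.S. Here is a minimal end-to-end sketch of the temp-table workflow, using hypothetical names: a tab-delimited data file /tmp/data.txt and a production table prod_t(c1, c2, c3) in which (c1, c2) is the intended primary key. Adjust the names and column types to your own schema. It relies on oids, which every table has by default on 7.3.

    -- Stage the file in a temp table with no constraints, so duplicate
    -- rows from the file load without error.  (Table and column names
    -- here are placeholders.)
    create temp table staging (c1 integer, c2 integer, c3 text);
    \copy staging from '/tmp/data.txt'

    -- For each (c1, c2) group, delete every row except the one with
    -- the maximum oid, leaving exactly one row per key value.
    delete from staging
    where oid < (select max(s2.oid)
                 from staging s2
                 where staging.c1 = s2.c1
                   and staging.c2 = s2.c2);

    -- The keys are now unique, so a primary key constraint on
    -- prod_t (c1, c2) will accept the insert.
    insert into prod_t (c1, c2, c3)
    select c1, c2, c3 from staging;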