Re: remote duplicate rows - Mailing list pgsql-general

From Ron Johnson
Subject Re: remote duplicate rows
Date
Msg-id 4508C3ED.6050805@cox.net
Whole thread Raw
In response to Re: remote duplicate rows  (ljb <ljb220@mindspring.com>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/13/06 19:36, ljb wrote:
> junkone1@gmail.com wrote:
>> hI
>> i have a bad situation that i did not have primary key. so i have a
>> table like this
>> colname1                colname2
>> 1                                 apple
>> 1                                 apple
>> 2                                  orange
>> 2                                   orange
>>
>> It is a very large table. how do i remove the duplctes quickly annd
>> without much change.

Since the 2 colname1 == 2 records are different (extra spaces in
colname2), how do you determine which is the correct record?  (Or is
the extra space just an artifact?)

> Make a new table (with a primary key) and the same columns in order,
> and do: INSERT INTO newtable SELECT DISTINCT * FROM oldtable;

That's a single transaction, and since this is a "very large table",
it would be very unpleasant if it rolled back at 95%.

Of course, we don't know what junkone1's definition of "very large"
is and how beefy his hardware is...

If there are OIDs on the table, you could write a script with this
pseudocode, which because of the "candidate key table", transaction
block and LIMIT TO, allows the script to be restated.  Niceties like
 printing timestamp and a counter after every commit are always helpful.

CREATE TABLE BIGTABLE_PK (
COLNAME1   INTEGER);

INSERT INTO BIGTABLE_PK
    SELECT DISTINCT COLNAME1
    FROM BIGTABLE;

CREATE INDEX I_BIGTABLE_PK
ON BIGTABLE_PK (COLNAME1)
TYPE IS SORTED;

DECLARE LOOP_FLAG INTEGER = 1;
WHILE LOOP_FLAG DO
    BEGIN TRANSACTION
    FOR :X AS EACH ROW OF
        SELECT COLNAME1
        FROM BIGTABLE_PK
        ORDER BY COLNAME1
        LIMIT TO 2000 ROWS
    DO
        DELETE FROM BIGTABLE
        WHERE OID IN (SELECT OID
                      FROM BIGTABLE_PK
                      WHERE COLNAME1 = :X.COLNAME1
                      LIMIT TO 1 ROWS);
        DELETE FROM BIGTABLE_PK
        WHERE COLNAME1 = :X.COLNAME1;
    END FOR;
    IF (SELECT COUNT(*) FROM BIGTABLE_PK) == 0 THEN
        SET LOOP_FLAG = 0;
    END IF;
    COMMIT;
END ;

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFCMPtS9HxQb37XmcRArYMAKCGEK7ft1PLprnHtpjsLYlgs4t5gACdEksT
JY42ieEmRvehOsuU/o6YFR8=
=MJhV
-----END PGP SIGNATURE-----

pgsql-general by date:

Previous
From: "J S B"
Date:
Subject: Re: berkley sockets
Next
From: Simon_Kelly@moh.govt.nz
Date:
Subject: Re: Issue with order by for type varchar