Thread: [GENERAL] Delete Duplicates with Using

[GENERAL] Delete Duplicates with Using

From
"Igal @ Lucee.org"
Date:

Hello,

I run the SQL query below to delete duplicates from a table.  The subquery is used to identify the duplicated rows (row_num is a BIGSERIAL column).

/** delete older copies of duplicates */
DELETE FROM table_with_duplicates AS T
WHERE row_num IN (
    SELECT     T1.row_num
    FROM    table_with_duplicates  AS T1
        JOIN table_with_duplicates AS T2
            ON         T1.column_1 = T2.column_1
                AND T1.column_2 = T2.column_2
                AND T1.column_3 = T2.column_3
                AND T1.row_num < T2.row_num
);

Can anyone tell me how to rewrite that query to use the USING clause and hopefully remove the subquery?

The documentation mentions USING but there is no example and the only examples I found online are very trivial.

Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org

Re: [GENERAL] Delete Duplicates with Using

From
legrand legrand
Date:
DELETE FROM table_with_duplicates AS T1 USING table_with_duplicates AS T2
WHERE        T1.column_1 = T2.column_1       AND T1.column_2 = T2.column_2       AND T1.column_3 = T2.column_3
ANDT1.row_num < T2.row_num
 




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Delete Duplicates with Using

From
"Igal @ Lucee.org"
Date:
On 10/14/2017 12:32 AM, legrand legrand wrote:
> DELETE FROM table_with_duplicates AS T1 USING table_with_duplicates AS T2
> WHERE
>          T1.column_1 = T2.column_1
>          AND T1.column_2 = T2.column_2
>          AND T1.column_3 = T2.column_3
>          AND T1.row_num < T2.row_num

Thank you, I actually thought about that at first but it seemed "too 
easy" and I was looking for some solution with JOIN.

Anyway, this seems to work great.

Thanks,


Igal


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Delete Duplicates with Using

From
Andreas Kretschmer
Date:

Am 14.10.2017 um 08:20 schrieb Igal @ Lucee.org:
>
> Hello,
>
> I run the SQL query below to delete duplicates from a table. The 
> subquery is used to identify the duplicated rows (row_num is a 
> BIGSERIAL column).
>

other solution, using the CTID-column: (rows with (1,1,1) and (5,5,5) 
are identical)

test=*# select * from dubletten ; c1 | c2 | c3
----+----+----  1 |  1 |  1  1 |  1 |  1  1 |  2 |  3  2 |  3 |  4  3 |  4 |  5  4 |  5 |  5  5 |  5 |  5  5 |  5 |  5
(8 Zeilen)

test=*# with keep as (select max(ctid) as ctid from dubletten group by 
c1,c2,c3) delete from dubletten where ctid not in (select ctid from keep);;
DELETE 2
test=*# select * from dubletten ; c1 | c2 | c3
----+----+----  1 |  1 |  1  1 |  2 |  3  2 |  3 |  4  3 |  4 |  5  4 |  5 |  5  5 |  5 |  5
(6 Zeilen)

test=*#

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Delete Duplicates with Using

From
"Igal @ Lucee.org"
Date:
Andreas,

On 10/15/2017 11:53 PM, Andreas Kretschmer wrote:
other solution, using the CTID-column: (rows with (1,1,1) and (5,5,5) are identical)

test=*# select * from dubletten ;
 c1 | c2 | c3
----+----+----
  1 |  1 |  1
  1 |  1 |  1
  1 |  2 |  3
  2 |  3 |  4
  3 |  4 |  5
  4 |  5 |  5
  5 |  5 |  5
  5 |  5 |  5
(8 Zeilen)

test=*# with keep as (select max(ctid) as ctid from dubletten group by c1,c2,c3) delete from dubletten where ctid not in (select ctid from keep);;
DELETE 2
test=*# select * from dubletten ;
 c1 | c2 | c3
----+----+----
  1 |  1 |  1
  1 |  2 |  3
  2 |  3 |  4
  3 |  4 |  5
  4 |  5 |  5
  5 |  5 |  5
(6 Zeilen)

test=*#

Regards, Andreas


I like this solution, but would using a subquery be much slower than the implicit join of `using`?  My tables are not big in Postgres-standards, so it's probably not an issue, but I'm trying to learn as much as I can about Postgres now that I'm getting ready to move it to production.

Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org

Re: [GENERAL] Delete Duplicates with Using

From
"Igal @ Lucee.org"
Date:
FYI,

On 10/16/2017 8:58 AM, Igal @ Lucee.org wrote:


test=*# with keep as (select max(ctid) as ctid from dubletten group by c1,c2,c3) delete from dubletten where ctid not in (select ctid from keep);


I like this solution, but would using a subquery be much slower than the implicit join of `using`?  My tables are not big in Postgres-standards, so it's probably not an issue, but I'm trying to learn as much as I can about Postgres now that I'm getting ready to move it to production.

I was able to compare the performance on a table with about 350k rows, with duplicates over 7 columns and no indices.

The GROUP BY solution with the subquery (though I was using a simpler version of it without a CTE), was taking over 30 seconds so I killed the request.

I then ran the USING version which completed in 16 seconds and deleted 39 rows.

Best,

Igal Sapir
Lucee Core Developer
Lucee.org