Home > mailing lists

Re: improving performance for a delete - Mailing list pgsql-performance

From	PFC
Subject	Re: improving performance for a delete
Date	May 20, 2008 17:47:53
Msg-id	op.ubg0cxr9cigqcu@apollo13.peufeu.com Whole thread
In response to	Re: improving performance for a delete (kevin kempter <kevin@kevinkempterllc.com>)
List	pgsql-performance

Tree view

On Tue, 20 May 2008 22:03:30 +0200, kevin kempter
<kevin@kevinkempterllc.com> wrote:

> Version 8.3.1
>
>
> On May 20, 2008, at 1:51 PM, kevin kempter wrote:
>
>> Hi all;
>>
>> I have 2 tables where I basically want to delete from the first table
>> (seg_id_tmp7) any rows where the entire row already exists in the
>> second table (sl_cd_segment_dim)
>>
>> I have a query that looks like this (and it's slow):
>>
>>
>> delete from seg_id_tmp7
>> where
>>     customer_srcid::text ||

    Besides being slow as hell and not able to use any indexes, the string
concatenation can also yield incorrect results, for instance :

season_name::text || episode_srcid::text

    Will have the same contents for

season_name='season 1' episode_srcid=12
season_name='season 11' episode_srcid=2

    I suggest doing it the right way, one possibility being :

test=> EXPLAIN DELETE from test where (id,value) in (select id,value from
test2);
                                QUERY PLAN
-------------------------------------------------------------------------
  Hash IN Join  (cost=2943.00..6385.99 rows=2 width=6)
    Hash Cond: ((test.id = test2.id) AND (test.value = test2.value))
    ->  Seq Scan on test  (cost=0.00..1442.99 rows=99999 width=14)
    ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)
          ->  Seq Scan on test2  (cost=0.00..1443.00 rows=100000 width=8)

    Thanks to the hash it is very fast, one seq scan on both tables, instead
of one seq scan PER ROW in your query.

    Another solution would be :

test=> EXPLAIN DELETE FROM test USING test2 WHERE test.id=test2.id AND
test.value=test2.value;
                                QUERY PLAN
-------------------------------------------------------------------------
  Hash Join  (cost=2943.00..6385.99 rows=2 width=6)
    Hash Cond: ((test.id = test2.id) AND (test.value = test2.value))
    ->  Seq Scan on test  (cost=0.00..1442.99 rows=99999 width=14)
    ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)
          ->  Seq Scan on test2  (cost=0.00..1443.00 rows=100000 width=8)

    Which chooses the same plan here, quite logically, as it is the best one
in this particular case.

pgsql-performance by date:

From: kevin kempter
Date: 20 May 2008, 17:03:40
Subject: Re: improving performance for a delete

From: "Robins Tharakan"
Date: 21 May 2008, 02:52:16
Subject: Varchar pkey instead of integer

Re: improving performance for a delete - Mailing list pgsql-performance

Previous

Next