Re: Help speeding up delete - Mailing list pgsql-performance

From Magnus Hagander
Subject Re: Help speeding up delete
Date
Msg-id 6BCB9D8A16AC4241919521715F4D8BCE6C7BC4@algol.sollentuna.se
Whole thread Raw
In response to Help speeding up delete  (Steve Wampler <swampler@noao.edu>)
Responses Re: Help speeding up delete
List pgsql-performance
> Because I think we need to.  The above would only delete rows
> that have name = 'obsid' and value = 'oid080505'.  We need to
> delete all rows that have the same ids as those rows.
> However, from what you note, I bet we could do:
>
>    DELETE FROM "tmp_table2" WHERE id IN
>       (SELECT id FROM "temp_table2" WHERE name = 'obsid' and
> value= 'oid080505');
>
> However, even that seems to have a much higher cost than I'd expect:
>
>    lab.devel.configdb=# explain delete from "tmp_table2" where id in
>         (select id from tmp_table2 where name='obsid' and
> value = 'oid080505');
>    NOTICE:  QUERY PLAN:
>
>    Seq Scan on tmp_table2  (cost=0.00..65705177237.26
> rows=769844 width=6)
>      SubPlan
>        ->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
>              ->  Seq Scan on tmp_table2  (cost=0.00..42674.32
> rows=38 width=50)
>
>    EXPLAIN
>
> And, sure enough, is taking an extrordinarily long time to
> run (more than 10 minutes so far, compared to < 10seconds for
> the select).  Is this really typical of deletes?  It appears
> (to me) to be the Seq Scan on tmp_table2 that is the killer
> here.  If we put an index on, would it help?  (The user
> claims she tried that and it's EXPLAIN cost went even higher,
> but I haven't checked that...)


Earlier pg versions have always been bad at dealing with IN subqueries.
Try rewriting it as (with fixing any broken syntax, I'm not actually
testing this :P)

DELETE FROM tmp_table2 WHERE EXISTS
 (SELECT * FROM tmp_table2 t2 WHERE t2.id=tmp_table2.id AND
t2.name='obsid' AND t2.value='oid080505')


I assume you do have an index on tmp_table2.id :-) And that it's
non-unique? (If it was unique, the previous simplification of the query
really should've worked..)

Do you also have an index on "name,value" or something like that, so you
get an index scan from it?

//Magnus

pgsql-performance by date:

Previous
From: Claus Guttesen
Date:
Subject: Re: Hardware/OS recommendations for large databases (5TB)
Next
From: "Luke Lonergan"
Date:
Subject: Re: Hardware/OS recommendations for large databases (