Thread: Slow Bulk Delete

Slow Bulk Delete

From
thilo
Date:
Hi all!

We moved from MySQL to Postgresql for some of our projects. So far
we're very impressed with the performance (especially INSERTs and
UPDATEs), except for a strange problem with the following bulk delete
query:

DELETE FROM table1 WHERE table2_id = ?

I went through these Wiki pages, trying to solve the problem:

http://wiki.postgresql.org/wiki/SlowQueryQuestions and
http://wiki.postgresql.org/wiki/Performance_Optimization

but unfortunately without much luck.

Our application is doing batch jobs. On every batch run, we must
delete approx. 1M rows in table1 and recreate these entries. The
inserts are very fast, but deletes are not. We cannot make updates,
because there's no identifying property in the objects of table1.

This is what EXPLAIN is telling me:

EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
                                                         QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6)
(actual time=0.111..0.154 rows=4 loops=1)
   Index Cond: (table2_id = 11242939)
 Total runtime: 0.421 ms
(3 rows)

This seems to be very fast (using the index), but running this query
from JDBC takes up to 20ms each. For 1M rows this sum up to several
hours. When I have a look at pg_top psql uses most of the time for the
deletes. CPU usage is 100% (for the core used by postgresql). So it
seems that postgresql is doing some sequential scanning or constraint
checks.

This is the table structure:

id    bigint     (primary key)
table2_id    bigint     (foreign key constraint to table 2, *indexed*)
table3_id    bigint     (foreign key constraint to table 3, *indexed*)
some non-referenced text and boolean fields

My server settings (Potgresql 8.4.2):

shared_buffers = 1024MB
effective_cache_size = 2048MB
work_mem = 128MB
wal_buffers = 64MB
checkpoint_segments = 32
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

It would be very nice to give me a hint to solve the problem. It
drives me crazy ;-)

If you need more details please feel free to ask!

Thanks in advance for your help!

Kind regards

Thilo

Re: Slow Bulk Delete

From
Bob Lunney
Date:
Thilo,

Just a few of thoughts off the top of my head:

1.  If you know the ids of the rows you want to delete beforhand, insert them in a table, then run the delete based on
ajoin with this table. 

2.  Better yet, insert the ids into a table using COPY, then use a join to create a new table with the rows you want to
keepfrom the first table.  Drop the original source table, truncate the id table, rename the copied table and add
indexesand constraints. 

3.  See if you can partition the table somehow so the rows you want to delete are in a single partitioned child table.
Whenits time to delete them just drop the child table. 

Of course, if the 1M rows you need to delete is very small compared to the total overall size of the original table the
firsttwo techniques might now buy you anything, but its worth a try. 

Good luck!

Bob Lunney

--- On Sat, 5/8/10, thilo <thilo.tanner@gmail.com> wrote:

> From: thilo <thilo.tanner@gmail.com>
> Subject: [PERFORM] Slow Bulk Delete
> To: pgsql-performance@postgresql.org
> Date: Saturday, May 8, 2010, 7:39 AM
> Hi all!
>
> We moved from MySQL to Postgresql for some of our projects.
> So far
> we're very impressed with the performance (especially
> INSERTs and
> UPDATEs), except for a strange problem with the following
> bulk delete
> query:
>
> DELETE FROM table1 WHERE table2_id = ?
>
> I went through these Wiki pages, trying to solve the
> problem:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions and
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> but unfortunately without much luck.
>
> Our application is doing batch jobs. On every batch run, we
> must
> delete approx. 1M rows in table1 and recreate these
> entries. The
> inserts are very fast, but deletes are not. We cannot make
> updates,
> because there's no identifying property in the objects of
> table1.
>
> This is what EXPLAIN is telling me:
>
> EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id =
> 11242939
>                
>                
>                
>          QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------
>  Index Scan using sr_index on table1  (cost=0.00..8.56
> rows=4 width=6)
> (actual time=0.111..0.154 rows=4 loops=1)
>    Index Cond: (table2_id = 11242939)
>  Total runtime: 0.421 ms
> (3 rows)
>
> This seems to be very fast (using the index), but running
> this query
> from JDBC takes up to 20ms each. For 1M rows this sum up to
> several
> hours. When I have a look at pg_top psql uses most of the
> time for the
> deletes. CPU usage is 100% (for the core used by
> postgresql). So it
> seems that postgresql is doing some sequential scanning or
> constraint
> checks.
>
> This is the table structure:
>
> id   
> bigint     (primary key)
> table2_id   
> bigint     (foreign key constraint
> to table 2, *indexed*)
> table3_id   
> bigint     (foreign key constraint
> to table 3, *indexed*)
> some non-referenced text and boolean fields
>
> My server settings (Potgresql 8.4.2):
>
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> work_mem = 128MB
> wal_buffers = 64MB
> checkpoint_segments = 32
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
>
> It would be very nice to give me a hint to solve the
> problem. It
> drives me crazy ;-)
>
> If you need more details please feel free to ask!
>
> Thanks in advance for your help!
>
> Kind regards
>
> Thilo
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>




Re: Slow Bulk Delete

From
"Pierre C"
Date:
> DELETE FROM table1 WHERE table2_id = ?

For bulk deletes, try :

DELETE FROM table1 WHERE table2_id IN (list of a few thousands ids)

- or use a JOIN delete with a virtual VALUES table
- or fill a temp table with ids and use a JOIN DELETE

This will save cliet/server roundtrips.

Now, something that can make a DELETE very slow is a non-indexed ON DELETE
CASCADE foreign key : when you DELETE FROM table1 and it cascades to a
DELETE on table2, and you forget the index on table2. Also check the time
spent in triggers. Do you have a GIN index ?

Re: Slow Bulk Delete

From
Jon Nelson
Date:
On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:
> - or use a JOIN delete with a virtual VALUES table
> - or fill a temp table with ids and use a JOIN DELETE

What is a virtual VALUES table? Can you give me an example of using a
virtual table with selects, joins, and also deletes?

--
Jon

Re: Slow Bulk Delete

From
Віталій Тимчишин
Date:


2010/5/17 Jon Nelson <jnelson+pgsql@jamponi.net>
On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:
> - or use a JOIN delete with a virtual VALUES table
> - or fill a temp table with ids and use a JOIN DELETE

What is a virtual VALUES table? Can you give me an example of using a
virtual table with selects, joins, and also deletes?



delete from a using (values (1),(2),(5),(8)) b(x) where a.id=b.x


--
Best regards,
Vitalii Tymchyshyn

Re: Slow Bulk Delete

From
Grzegorz Jaśkiewicz
Date:
On Mon, May 17, 2010 at 12:54 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:
>> - or use a JOIN delete with a virtual VALUES table
>> - or fill a temp table with ids and use a JOIN DELETE
>
> What is a virtual VALUES table? Can you give me an example of using a
> virtual table with selects, joins, and also deletes?
>

I think he refers to the way you pass values in insert, and alike:
INSERT INTO foo(a,b)      VALUES(1,2), (2,3), (3,4);

Re: Slow Bulk Delete

From
"A. Kretschmer"
Date:
In response to Jon Nelson :
> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:
> > - or use a JOIN delete with a virtual VALUES table
> > - or fill a temp table with ids and use a JOIN DELETE
>
> What is a virtual VALUES table? Can you give me an example of using a
> virtual table with selects, joins, and also deletes?

Something like this:

test=# select * from foo;
 c1
----
  1
  2
  3
  4
(4 rows)

test=*# delete from foo using (values (1),(2) ) as bla where foo.c1=bla.column1;
DELETE 2
test=*# select * from foo;
 c1
----
  3
  4
(2 rows)



values (1), (2) as bla   -> returns a 'virtual table' bla with one
column column1.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: Slow Bulk Delete

From
Jon Nelson
Date:
2010/5/17 Віталій Тимчишин <tivv00@gmail.com>:
>
>
> 2010/5/17 Jon Nelson <jnelson+pgsql@jamponi.net>
>>
>> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:
>> > - or use a JOIN delete with a virtual VALUES table
>> > - or fill a temp table with ids and use a JOIN DELETE
>>
>> What is a virtual VALUES table? Can you give me an example of using a
>> virtual table with selects, joins, and also deletes?
>>
>>
>
> delete from a using (values (1),(2),(5),(8)) b(x) where a.id=b.x
> See http://www.postgresql.org/docs/8.4/static/sql-values.html

This syntax I'm familiar with. The author of the previous message
(Pierre C) indicated that there is a concept of a virtual table which
could be joined to.  I'd like to know what this virtual table thing
is, specifically in the context of joins.


--
Jon

Re: Slow Bulk Delete

From
Grzegorz Jaśkiewicz
Date:
again VALUES(1,2), (2,3), ....; is a 'virtual table', as he calls it.
It really is not a table to postgresql. I guess he is just using that
naming convention.

Re: Slow Bulk Delete

From
Jon Nelson
Date:
On Mon, May 17, 2010 at 7:28 AM, A. Kretschmer
<andreas.kretschmer@schollglas.com> wrote:
> In response to Jon Nelson :
>> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:
>> > - or use a JOIN delete with a virtual VALUES table
>> > - or fill a temp table with ids and use a JOIN DELETE
>>
>> What is a virtual VALUES table? Can you give me an example of using a
>> virtual table with selects, joins, and also deletes?
>
> Something like this:
...

delete from foo using (values (1),(2) ) as bla where foo.c1=bla.column1;
...

Aha! Cool. That's not quite what I envisioned when you said virtual
table, but it surely clarifies things.
Thanks!

--
Jon