Re: Slow Bulk Delete - Mailing list pgsql-performance

From Andy Colson
Subject Re: Slow Bulk Delete
Date
Msg-id 4BE56459.4040007@squeakycode.net
Whole thread Raw
List pgsql-performance
On 05/08/2010 06:39 AM, thilo wrote:
> Hi all!
>
> We moved from MySQL to Postgresql for some of our projects. So far
> we're very impressed with the performance (especially INSERTs and
> UPDATEs), except for a strange problem with the following bulk delete
> query:
>
> DELETE FROM table1 WHERE table2_id = ?
>
> I went through these Wiki pages, trying to solve the problem:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions and
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> but unfortunately without much luck.
>
> Our application is doing batch jobs. On every batch run, we must
> delete approx. 1M rows in table1 and recreate these entries. The
> inserts are very fast, but deletes are not. We cannot make updates,
> because there's no identifying property in the objects of table1.
>
> This is what EXPLAIN is telling me:
>
> EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
>                                                           QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------
>   Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6)
> (actual time=0.111..0.154 rows=4 loops=1)
>     Index Cond: (table2_id = 11242939)
>   Total runtime: 0.421 ms
> (3 rows)
>
> This seems to be very fast (using the index), but running this query
> from JDBC takes up to 20ms each. For 1M rows this sum up to several
> hours. When I have a look at pg_top psql uses most of the time for the
> deletes. CPU usage is 100% (for the core used by postgresql). So it
> seems that postgresql is doing some sequential scanning or constraint
> checks.
>
> This is the table structure:
>
> id    bigint     (primary key)
> table2_id    bigint     (foreign key constraint to table 2, *indexed*)
> table3_id    bigint     (foreign key constraint to table 3, *indexed*)
> some non-referenced text and boolean fields
>
> My server settings (Potgresql 8.4.2):
>
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> work_mem = 128MB
> wal_buffers = 64MB
> checkpoint_segments = 32
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
>
> It would be very nice to give me a hint to solve the problem. It
> drives me crazy ;-)
>
> If you need more details please feel free to ask!
>
> Thanks in advance for your help!
>
> Kind regards
>
> Thilo


I am going to guess the slow part is sending 1M different queries back and forth from client to server.  You could try
batchingthem together: 

DELETE FROM table1 WHERE table2_id in (11242939, 1,2,3,4,5...., 42);

Also are you preparing the query?

-Andy

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: 8K recordsize bad on ZFS?
Next
From: Karl Denninger
Date:
Subject: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?