Thread: Slow Bulk Delete
Hi all! We moved from MySQL to Postgresql for some of our projects. So far we're very impressed with the performance (especially INSERTs and UPDATEs), except for a strange problem with the following bulk delete query: DELETE FROM table1 WHERE table2_id = ? I went through these Wiki pages, trying to solve the problem: http://wiki.postgresql.org/wiki/SlowQueryQuestions and http://wiki.postgresql.org/wiki/Performance_Optimization but unfortunately without much luck. Our application is doing batch jobs. On every batch run, we must delete approx. 1M rows in table1 and recreate these entries. The inserts are very fast, but deletes are not. We cannot make updates, because there's no identifying property in the objects of table1. This is what EXPLAIN is telling me: EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Index Scan using sr_index on table1 (cost=0.00..8.56 rows=4 width=6) (actual time=0.111..0.154 rows=4 loops=1) Index Cond: (table2_id = 11242939) Total runtime: 0.421 ms (3 rows) This seems to be very fast (using the index), but running this query from JDBC takes up to 20ms each. For 1M rows this sum up to several hours. When I have a look at pg_top psql uses most of the time for the deletes. CPU usage is 100% (for the core used by postgresql). So it seems that postgresql is doing some sequential scanning or constraint checks. This is the table structure: id bigint (primary key) table2_id bigint (foreign key constraint to table 2, *indexed*) table3_id bigint (foreign key constraint to table 3, *indexed*) some non-referenced text and boolean fields My server settings (Potgresql 8.4.2): shared_buffers = 1024MB effective_cache_size = 2048MB work_mem = 128MB wal_buffers = 64MB checkpoint_segments = 32 checkpoint_timeout = 15min checkpoint_completion_target = 0.9 It would be very nice to give me a hint to solve the problem. It drives me crazy ;-) If you need more details please feel free to ask! Thanks in advance for your help! Kind regards Thilo
Thilo, Just a few of thoughts off the top of my head: 1. If you know the ids of the rows you want to delete beforhand, insert them in a table, then run the delete based on ajoin with this table. 2. Better yet, insert the ids into a table using COPY, then use a join to create a new table with the rows you want to keepfrom the first table. Drop the original source table, truncate the id table, rename the copied table and add indexesand constraints. 3. See if you can partition the table somehow so the rows you want to delete are in a single partitioned child table. Whenits time to delete them just drop the child table. Of course, if the 1M rows you need to delete is very small compared to the total overall size of the original table the firsttwo techniques might now buy you anything, but its worth a try. Good luck! Bob Lunney --- On Sat, 5/8/10, thilo <thilo.tanner@gmail.com> wrote: > From: thilo <thilo.tanner@gmail.com> > Subject: [PERFORM] Slow Bulk Delete > To: pgsql-performance@postgresql.org > Date: Saturday, May 8, 2010, 7:39 AM > Hi all! > > We moved from MySQL to Postgresql for some of our projects. > So far > we're very impressed with the performance (especially > INSERTs and > UPDATEs), except for a strange problem with the following > bulk delete > query: > > DELETE FROM table1 WHERE table2_id = ? > > I went through these Wiki pages, trying to solve the > problem: > > http://wiki.postgresql.org/wiki/SlowQueryQuestions and > http://wiki.postgresql.org/wiki/Performance_Optimization > > but unfortunately without much luck. > > Our application is doing batch jobs. On every batch run, we > must > delete approx. 1M rows in table1 and recreate these > entries. The > inserts are very fast, but deletes are not. We cannot make > updates, > because there's no identifying property in the objects of > table1. > > This is what EXPLAIN is telling me: > > EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = > 11242939 > > > > QUERY > PLAN > ---------------------------------------------------------------------------------------------------------------------------- > Index Scan using sr_index on table1 (cost=0.00..8.56 > rows=4 width=6) > (actual time=0.111..0.154 rows=4 loops=1) > Index Cond: (table2_id = 11242939) > Total runtime: 0.421 ms > (3 rows) > > This seems to be very fast (using the index), but running > this query > from JDBC takes up to 20ms each. For 1M rows this sum up to > several > hours. When I have a look at pg_top psql uses most of the > time for the > deletes. CPU usage is 100% (for the core used by > postgresql). So it > seems that postgresql is doing some sequential scanning or > constraint > checks. > > This is the table structure: > > id > bigint (primary key) > table2_id > bigint (foreign key constraint > to table 2, *indexed*) > table3_id > bigint (foreign key constraint > to table 3, *indexed*) > some non-referenced text and boolean fields > > My server settings (Potgresql 8.4.2): > > shared_buffers = 1024MB > effective_cache_size = 2048MB > work_mem = 128MB > wal_buffers = 64MB > checkpoint_segments = 32 > checkpoint_timeout = 15min > checkpoint_completion_target = 0.9 > > It would be very nice to give me a hint to solve the > problem. It > drives me crazy ;-) > > If you need more details please feel free to ask! > > Thanks in advance for your help! > > Kind regards > > Thilo > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
> DELETE FROM table1 WHERE table2_id = ? For bulk deletes, try : DELETE FROM table1 WHERE table2_id IN (list of a few thousands ids) - or use a JOIN delete with a virtual VALUES table - or fill a temp table with ids and use a JOIN DELETE This will save cliet/server roundtrips. Now, something that can make a DELETE very slow is a non-indexed ON DELETE CASCADE foreign key : when you DELETE FROM table1 and it cascades to a DELETE on table2, and you forget the index on table2. Also check the time spent in triggers. Do you have a GIN index ?
On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote: > - or use a JOIN delete with a virtual VALUES table > - or fill a temp table with ids and use a JOIN DELETE What is a virtual VALUES table? Can you give me an example of using a virtual table with selects, joins, and also deletes? -- Jon
2010/5/17 Jon Nelson <jnelson+pgsql@jamponi.net>
On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote:What is a virtual VALUES table? Can you give me an example of using a
> - or use a JOIN delete with a virtual VALUES table
> - or fill a temp table with ids and use a JOIN DELETE
virtual table with selects, joins, and also deletes?
delete from a using (values (1),(2),(5),(8)) b(x) where a.id=b.x
--
Best regards,
Vitalii Tymchyshyn
On Mon, May 17, 2010 at 12:54 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote: >> - or use a JOIN delete with a virtual VALUES table >> - or fill a temp table with ids and use a JOIN DELETE > > What is a virtual VALUES table? Can you give me an example of using a > virtual table with selects, joins, and also deletes? > I think he refers to the way you pass values in insert, and alike: INSERT INTO foo(a,b) VALUES(1,2), (2,3), (3,4);
In response to Jon Nelson : > On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote: > > - or use a JOIN delete with a virtual VALUES table > > - or fill a temp table with ids and use a JOIN DELETE > > What is a virtual VALUES table? Can you give me an example of using a > virtual table with selects, joins, and also deletes? Something like this: test=# select * from foo; c1 ---- 1 2 3 4 (4 rows) test=*# delete from foo using (values (1),(2) ) as bla where foo.c1=bla.column1; DELETE 2 test=*# select * from foo; c1 ---- 3 4 (2 rows) values (1), (2) as bla -> returns a 'virtual table' bla with one column column1. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
2010/5/17 Віталій Тимчишин <tivv00@gmail.com>: > > > 2010/5/17 Jon Nelson <jnelson+pgsql@jamponi.net> >> >> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote: >> > - or use a JOIN delete with a virtual VALUES table >> > - or fill a temp table with ids and use a JOIN DELETE >> >> What is a virtual VALUES table? Can you give me an example of using a >> virtual table with selects, joins, and also deletes? >> >> > > delete from a using (values (1),(2),(5),(8)) b(x) where a.id=b.x > See http://www.postgresql.org/docs/8.4/static/sql-values.html This syntax I'm familiar with. The author of the previous message (Pierre C) indicated that there is a concept of a virtual table which could be joined to. I'd like to know what this virtual table thing is, specifically in the context of joins. -- Jon
again VALUES(1,2), (2,3), ....; is a 'virtual table', as he calls it. It really is not a table to postgresql. I guess he is just using that naming convention.
On Mon, May 17, 2010 at 7:28 AM, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote: > In response to Jon Nelson : >> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists@peufeu.com> wrote: >> > - or use a JOIN delete with a virtual VALUES table >> > - or fill a temp table with ids and use a JOIN DELETE >> >> What is a virtual VALUES table? Can you give me an example of using a >> virtual table with selects, joins, and also deletes? > > Something like this: ... delete from foo using (values (1),(2) ) as bla where foo.c1=bla.column1; ... Aha! Cool. That's not quite what I envisioned when you said virtual table, but it surely clarifies things. Thanks! -- Jon