Thread: single transaction vs multiple transactions
Hi,

I have to refactor a 'DELETE FROM x WHERE y IN (...)' because IN got too many parameters, which caused a 'stack depth limit exceeded' error. I don't want to just increase the max_stack_depth parameter; it is better to refactor, because the number of arguments to IN may increase in the future.

My approach is to issue multiple 'DELETE FROM x WHERE y=...' statements.

My question is now: what is better for PostgreSQL from a performance perspective?
1. all deletes in one transaction
2. each delete in its own transaction

The number of arguments is around 10,000.

BTW: The arguments are generated in the application tier. I would have to create a temporary table which I can use in 'DELETE FROM x WHERE y IN (SELECT z FROM tmp)'.

Cheers
Sven
Sven Geisler wrote:
> I have to refactor a 'DELETE FROM x WHERE y IN (...)' because IN got
> too many parameters, which caused a 'stack depth limit exceeded' error.
> I don't want to just increase the max_stack_depth parameter; it is
> better to refactor, because the number of arguments to IN may increase
> in the future.
>
> My approach is to issue multiple 'DELETE FROM x WHERE y=...' statements.

You could also do something in between, issuing the deletes in batches of, say, 100 each; a sketch of that variant follows below. But using a temporary table is much better.

> My question is now: what is better for PostgreSQL from a performance
> perspective?
> 1. all deletes in one transaction
> 2. each delete in its own transaction

All in one transaction is definitely faster.

> The number of arguments is around 10,000.
>
> BTW: The arguments are generated in the application tier. I would have to
> create a temporary table which I can use in 'DELETE FROM x WHERE y IN
> (SELECT z FROM tmp)'.

I think that's exactly what you should do.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
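A minimal sketch of the batched variant, using the thread's placeholder names x and y and assuming integer keys; the literal ids and the batch size are stand-ins, real batches would carry ~100 ids each:

BEGIN;
-- each statement deletes one batch of keys
DELETE FROM x WHERE y IN (1, 2, 3);
DELETE FROM x WHERE y IN (4, 5, 6);
-- ...remaining batches...
COMMIT;

Wrapping all batches in one explicit transaction is what makes option 1 above cheaper: the commit overhead is paid once instead of per statement.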
Hi Heikki,

Heikki Linnakangas wrote:
> Sven Geisler wrote:
>> I have to refactor a 'DELETE FROM x WHERE y IN (...)' because IN got
>> too many parameters, which caused a 'stack depth limit exceeded' error.
>> I don't want to just increase the max_stack_depth parameter; it is
>> better to refactor, because the number of arguments to IN may increase
>> in the future.
[...]
>>
>> BTW: The arguments are generated in the application tier. I would have to
>> create a temporary table which I can use in 'DELETE FROM x WHERE y IN
>> (SELECT z FROM tmp)'.
>
> I think that's exactly what you should do.

I would have to insert my arguments into a temporary table first, because the arguments are only known in the application tier. Are multiple inserts into a temporary table plus a final 'DELETE FROM x WHERE y IN (SELECT z FROM tmp)' faster than multiple deletes?

Sven.
Sven Geisler wrote:
> I would have to insert my arguments into a temporary table first, because
> the arguments are only known in the application tier.
> Are multiple inserts into a temporary table plus a final 'DELETE FROM x
> WHERE y IN (SELECT z FROM tmp)' faster than multiple deletes?

If the number of records is high, it most likely is. You should try it with your data to be sure, but in general doing all the deletes in one batch is faster when the number of records is high, because it allows the planner to use efficient merge joins or sequential scans.

Populating the temporary table with no indexes should be quite inexpensive if you make sure you don't do it one record at a time. Use the COPY command or batched inserts instead; see the sketch below.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
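A minimal sketch of the temp-table route as a psql script, assuming integer keys; x, y, tmp, and z are the placeholder names from the thread, and ON COMMIT DROP is optional housekeeping:

BEGIN;
-- temp table with no indexes; dropped automatically at COMMIT
CREATE TEMP TABLE tmp (z integer) ON COMMIT DROP;
-- stream all ids in a single round trip, one id per line, terminated by \.
COPY tmp FROM STDIN;
1
2
3
\.
-- one batch delete instead of thousands of single-row deletes
DELETE FROM x WHERE y IN (SELECT z FROM tmp);
COMMIT;

From the application tier, the COPY would be fed through the driver's bulk-load interface rather than inline data, but the statement flow is the same.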
Hi,

Many thanks for the fast response. I will use a temporary table with COPY.

Another issue I have is the connection between app and db. I guess the COPY approach will also keep down the network I/O between app and db, keeping in mind that I would otherwise produce 10,000+ statements.

Thx
Sven

Heikki Linnakangas wrote:
> Sven Geisler wrote:
>> I would have to insert my arguments into a temporary table first, because
>> the arguments are only known in the application tier.
>> Are multiple inserts into a temporary table plus a final 'DELETE FROM x
>> WHERE y IN (SELECT z FROM tmp)' faster than multiple deletes?
>
> If the number of records is high, it most likely is. You should try it
> with your data to be sure, but in general doing all the deletes in one
> batch is faster when the number of records is high, because it allows
> the planner to use efficient merge joins or sequential scans.
>
> Populating the temporary table with no indexes should be quite
> inexpensive if you make sure you don't do it one record at a time. Use
> the COPY command or batched inserts instead.
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Sven Geisler wrote: >> I have to refactoring a 'DELETE FROM x WHERE y IN (...)' because IN got >> to much parameters. => 'stack depth limit exceeded' >> The number of arguments is around 10,000. >> ... >> BTW: The arguments are generate in the application tier. I would have to >> create a temporary table which I can use in 'DELETE FROM x WHERE y IN >> (SELECT z FROM tmp)'. > I think that's exactly what you should do. Also, if you're planning to update to 8.2 soon, the tradeoffs will change completely. 8.2 should avoid the stack depth problem, and you can get something closely approximating the plan you'd get for a join against a temp table using VALUES: regression=# explain select * from tenk1 where unique2 in (1,2,3,4,6,8); QUERY PLAN ----------------------------------------------------------------------------- Bitmap Heap Scan on tenk1 (cost=24.01..45.79 rows=6 width=244) Recheck Cond: (unique2 = ANY ('{1,2,3,4,6,8}'::integer[])) -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..24.01 rows=6 width=0) Index Cond: (unique2 = ANY ('{1,2,3,4,6,8}'::integer[])) (4 rows) regression=# explain select * from tenk1 where unique2 in (values(1),(2),(3),(4),(6),(8)); QUERY PLAN ---------------------------------------------------------------------------------- Nested Loop (cost=4.10..48.34 rows=6 width=244) -> HashAggregate (cost=0.09..0.15 rows=6 width=4) -> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=4) -> Bitmap Heap Scan on tenk1 (cost=4.01..8.02 rows=1 width=244) Recheck Cond: (tenk1.unique2 = "*VALUES*".column1) -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..4.01 rows=1 width=0) Index Cond: (tenk1.unique2 = "*VALUES*".column1) (7 rows) regards, tom lane
Hi,

PostgreSQL offers some proprietary SQL parameters and commands which easily solve such problems. If you are sure PostgreSQL is the DB for your app forever ;), you could use these parameters and commands. Here is a possible solution for your problem:

DELETE FROM x USING tmp WHERE x.y=tmp.z;

Please read the extensive PostgreSQL documentation first, before posting.

CU,
Jens

On Tue, 05 Dec 2006 16:26:55 +0100, Sven Geisler <sgeisler@aeccom.com> wrote:
> Hi,
>
> I have to refactor a 'DELETE FROM x WHERE y IN (...)' because IN got
> too many parameters, which caused a 'stack depth limit exceeded' error.
> I don't want to just increase the max_stack_depth parameter; it is
> better to refactor, because the number of arguments to IN may increase
> in the future.
>
> My approach is to issue multiple 'DELETE FROM x WHERE y=...' statements.
>
> My question is now: what is better for PostgreSQL from a performance
> perspective?
> 1. all deletes in one transaction
> 2. each delete in its own transaction
>
> The number of arguments is around 10,000.
>
> BTW: The arguments are generated in the application tier. I would have to
> create a temporary table which I can use in 'DELETE FROM x WHERE y IN
> (SELECT z FROM tmp)'.
>
> Cheers
> Sven

--
** Dipl.-Designer Jens Schipkowski
   APUS Software GmbH
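For completeness, a sketch of the USING form combined with the COPY loading discussed earlier in the thread, again under the placeholder names x, y, tmp, and z and assuming integer keys:

BEGIN;
CREATE TEMP TABLE tmp (z integer) ON COMMIT DROP;
-- bulk-load the ids, one per line, terminated by \.
COPY tmp FROM STDIN;
1
2
3
\.
-- join-style delete: removes every row of x whose key appears in tmp
DELETE FROM x USING tmp WHERE x.y = tmp.z;
COMMIT;

DELETE ... USING is a PostgreSQL extension, but it expresses the same join as the IN (SELECT z FROM tmp) variant, and a row of x is deleted only once even if tmp.z contains duplicates.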