Thread: improving performance for a delete
Hi all; I have 2 tables where I basically want to delete from the first table (seg_id_tmp7) any rows where the entire row already exists in the second table (sl_cd_segment_dim) I have a query that looks like this (and it's slow): delete from seg_id_tmp7 where customer_srcid::text || show_srcid::text || show_name::text || season_srcid::text || season_name::text || episode_srcid::text || episode_name::text || segment_type_id::text || segment_type::text || segment_srcid::text || segment_name::text in ( select customer_srcid::text || show_srcid::text || show_name::text || season_srcid::text || season_name::text || episode_srcid::text || episode_name::text || segment_type_id::text || segment_type::text || segment_srcid::text || segment_name::text from sl_cd_location_dim ) ; Here's the query plan for it: QUERY PLAN ----------------------------------------------------------------------------------- Seq Scan on seg_id_tmp7 (cost=0.00..138870701.56 rows=2136 width=6) Filter: (subplan) SubPlan -> Seq Scan on sl_cd_location_dim (cost=0.00..63931.60 rows=433040 width=8) (4 rows) I also tried this: delete from seg_id_tmp7 where ( customer_srcid , show_srcid , show_name , season_srcid , season_name , episode_srcid , episode_name , segment_type_id , segment_type , segment_srcid , segment_name ) in ( select customer_srcid , show_srcid , show_name , season_srcid , season_name , episode_srcid , episode_name , segment_type_id , segment_type , segment_srcid , segment_name from sl_cd_location_dim ) ; and I get this query plan: QUERY PLAN ----------------------------------------------------------------------------------- Seq Scan on seg_id_tmp7 (cost=0.00..87997034.20 rows=2136 width=6) Filter: (subplan) SubPlan -> Seq Scan on sl_cd_location_dim (cost=0.00..40114.40 rows=433040 width=8) (4 rows) If it helps here's the describe's (including indexes) for both tables: # \d seg_id_tmp7 Table "public.seg_id_tmp7" Column | Type | Modifiers -----------------+-----------------------------+----------- customer_srcid | bigint | show_srcid | bigint | show_name | character varying | season_srcid | bigint | season_name | character varying | episode_srcid | bigint | episode_name | character varying | segment_type_id | bigint | segment_type | character varying | segment_srcid | bigint | segment_name | character varying | create_dt | timestamp without time zone | # \d sl_cd_segment_dim Table "public.sl_cd_segment_dim" Column | Type | Modifiers ----------------------+----------------------------- +------------------------------------------------------------- sl_cd_segment_dim_id | bigint | not null default nextval('sl_cd_segment_dim_seq'::regclass) customer_srcid | bigint | not null show_srcid | bigint | not null show_name | character varying(500) | not null season_srcid | bigint | not null season_name | character varying(500) | not null episode_srcid | bigint | not null episode_name | character varying(500) | not null segment_type_id | integer | segment_type | character varying(500) | segment_srcid | bigint | segment_name | character varying(500) | effective_dt | timestamp without time zone | not null default now() inactive_dt | timestamp without time zone | last_update_dt | timestamp without time zone | not null default now() Indexes: "sl_cd_segment_dim_pk" PRIMARY KEY, btree (sl_cd_segment_dim_id) "seg1" btree (customer_srcid) "seg2" btree (show_srcid) "seg3" btree (season_srcid) "seg4" btree (episode_srcid) "seg5" btree (segment_srcid) "sl_cd_segment_dim_ix1" btree (customer_srcid) Any thoughts, suggestions, etc on how to improve performance for this delete ? Thanks in advance.. /Kevin
Version 8.3.1 On May 20, 2008, at 1:51 PM, kevin kempter wrote: > Hi all; > > I have 2 tables where I basically want to delete from the first > table (seg_id_tmp7) any rows where the entire row already exists in > the second table (sl_cd_segment_dim) > > I have a query that looks like this (and it's slow): > > > delete from seg_id_tmp7 > where > customer_srcid::text || > show_srcid::text || > show_name::text || > season_srcid::text || > season_name::text || > episode_srcid::text || > episode_name::text || > segment_type_id::text || > segment_type::text || > segment_srcid::text || > segment_name::text > in > ( select > customer_srcid::text || > show_srcid::text || > show_name::text || > season_srcid::text || > season_name::text || > episode_srcid::text || > episode_name::text || > segment_type_id::text || > segment_type::text || > segment_srcid::text || > segment_name::text > from sl_cd_location_dim ) > ; > > > > > > Here's the query plan for it: > > QUERY PLAN > ----------------------------------------------------------------------------------- > Seq Scan on seg_id_tmp7 (cost=0.00..138870701.56 rows=2136 width=6) > Filter: (subplan) > SubPlan > -> Seq Scan on sl_cd_location_dim (cost=0.00..63931.60 > rows=433040 width=8) > (4 rows) > > > > > > > > > I also tried this: > > delete from seg_id_tmp7 > where > ( customer_srcid , > show_srcid , > show_name , > season_srcid , > season_name , > episode_srcid , > episode_name , > segment_type_id , > segment_type , > segment_srcid , > segment_name ) > in > ( select > customer_srcid , > show_srcid , > show_name , > season_srcid , > season_name , > episode_srcid , > episode_name , > segment_type_id , > segment_type , > segment_srcid , > segment_name > from sl_cd_location_dim ) > ; > > > and I get this query plan: > > QUERY PLAN > ----------------------------------------------------------------------------------- > Seq Scan on seg_id_tmp7 (cost=0.00..87997034.20 rows=2136 width=6) > Filter: (subplan) > SubPlan > -> Seq Scan on sl_cd_location_dim (cost=0.00..40114.40 > rows=433040 width=8) > (4 rows) > > > > If it helps here's the describe's (including indexes) for both tables: > > # \d seg_id_tmp7 > Table "public.seg_id_tmp7" > Column | Type | Modifiers > -----------------+-----------------------------+----------- > customer_srcid | bigint | > show_srcid | bigint | > show_name | character varying | > season_srcid | bigint | > season_name | character varying | > episode_srcid | bigint | > episode_name | character varying | > segment_type_id | bigint | > segment_type | character varying | > segment_srcid | bigint | > segment_name | character varying | > create_dt | timestamp without time zone | > > > > > # \d sl_cd_segment_dim > Table > "public.sl_cd_segment_dim" > Column | Type > | Modifiers > ----------------------+----------------------------- > +------------------------------------------------------------- > sl_cd_segment_dim_id | bigint | not null > default nextval('sl_cd_segment_dim_seq'::regclass) > customer_srcid | bigint | not null > show_srcid | bigint | not null > show_name | character varying(500) | not null > season_srcid | bigint | not null > season_name | character varying(500) | not null > episode_srcid | bigint | not null > episode_name | character varying(500) | not null > segment_type_id | integer | > segment_type | character varying(500) | > segment_srcid | bigint | > segment_name | character varying(500) | > effective_dt | timestamp without time zone | not null > default now() > inactive_dt | timestamp without time zone | > last_update_dt | timestamp without time zone | not null > default now() > Indexes: > "sl_cd_segment_dim_pk" PRIMARY KEY, btree (sl_cd_segment_dim_id) > "seg1" btree (customer_srcid) > "seg2" btree (show_srcid) > "seg3" btree (season_srcid) > "seg4" btree (episode_srcid) > "seg5" btree (segment_srcid) > "sl_cd_segment_dim_ix1" btree (customer_srcid) > > > > > > > Any thoughts, suggestions, etc on how to improve performance for > this delete ? > > > Thanks in advance.. > > /Kevin > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org > ) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
On Tue, 20 May 2008 22:03:30 +0200, kevin kempter <kevin@kevinkempterllc.com> wrote: > Version 8.3.1 > > > On May 20, 2008, at 1:51 PM, kevin kempter wrote: > >> Hi all; >> >> I have 2 tables where I basically want to delete from the first table >> (seg_id_tmp7) any rows where the entire row already exists in the >> second table (sl_cd_segment_dim) >> >> I have a query that looks like this (and it's slow): >> >> >> delete from seg_id_tmp7 >> where >> customer_srcid::text || Besides being slow as hell and not able to use any indexes, the string concatenation can also yield incorrect results, for instance : season_name::text || episode_srcid::text Will have the same contents for season_name='season 1' episode_srcid=12 season_name='season 11' episode_srcid=2 I suggest doing it the right way, one possibility being : test=> EXPLAIN DELETE from test where (id,value) in (select id,value from test2); QUERY PLAN ------------------------------------------------------------------------- Hash IN Join (cost=2943.00..6385.99 rows=2 width=6) Hash Cond: ((test.id = test2.id) AND (test.value = test2.value)) -> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=14) -> Hash (cost=1443.00..1443.00 rows=100000 width=8) -> Seq Scan on test2 (cost=0.00..1443.00 rows=100000 width=8) Thanks to the hash it is very fast, one seq scan on both tables, instead of one seq scan PER ROW in your query. Another solution would be : test=> EXPLAIN DELETE FROM test USING test2 WHERE test.id=test2.id AND test.value=test2.value; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=2943.00..6385.99 rows=2 width=6) Hash Cond: ((test.id = test2.id) AND (test.value = test2.value)) -> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=14) -> Hash (cost=1443.00..1443.00 rows=100000 width=8) -> Seq Scan on test2 (cost=0.00..1443.00 rows=100000 width=8) Which chooses the same plan here, quite logically, as it is the best one in this particular case.