improving performance for a delete - Mailing list pgsql-performance

From kevin kempter
Subject improving performance for a delete
Date
Msg-id 1F6D96EA-31C8-47D0-9D72-664639C387C0@kevinkempterllc.com
Whole thread Raw
Responses Re: improving performance for a delete
List pgsql-performance
Hi all;

I have 2 tables where I basically want to delete from the first table
(seg_id_tmp7) any rows where the entire row already exists in the
second table (sl_cd_segment_dim)

I have a query that looks like this (and it's slow):


delete from seg_id_tmp7
where
    customer_srcid::text ||
    show_srcid::text ||
    show_name::text ||
    season_srcid::text ||
    season_name::text ||
    episode_srcid::text ||
    episode_name::text ||
    segment_type_id::text ||
    segment_type::text ||
    segment_srcid::text ||
    segment_name::text
in
    ( select
        customer_srcid::text ||
        show_srcid::text ||
        show_name::text ||
        season_srcid::text ||
        season_name::text ||
        episode_srcid::text ||
        episode_name::text ||
        segment_type_id::text ||
        segment_type::text ||
        segment_srcid::text ||
        segment_name::text
        from sl_cd_location_dim )
;





Here's the query plan for it:

                                     QUERY PLAN
-----------------------------------------------------------------------------------
  Seq Scan on seg_id_tmp7  (cost=0.00..138870701.56 rows=2136 width=6)
    Filter: (subplan)
    SubPlan
      ->  Seq Scan on sl_cd_location_dim  (cost=0.00..63931.60
rows=433040 width=8)
(4 rows)








I also tried this:

delete from seg_id_tmp7
where
    ( customer_srcid ,
    show_srcid ,
    show_name ,
    season_srcid ,
    season_name ,
    episode_srcid ,
    episode_name ,
    segment_type_id ,
    segment_type ,
    segment_srcid ,
    segment_name )
in
    ( select
        customer_srcid ,
        show_srcid ,
        show_name ,
        season_srcid ,
        season_name ,
        episode_srcid ,
        episode_name ,
        segment_type_id ,
        segment_type ,
        segment_srcid ,
        segment_name
        from sl_cd_location_dim )
;


and I get this query plan:

                                     QUERY PLAN
-----------------------------------------------------------------------------------
  Seq Scan on seg_id_tmp7  (cost=0.00..87997034.20 rows=2136 width=6)
    Filter: (subplan)
    SubPlan
      ->  Seq Scan on sl_cd_location_dim  (cost=0.00..40114.40
rows=433040 width=8)
(4 rows)



If it helps here's the describe's (including indexes) for both tables:

# \d seg_id_tmp7
                 Table "public.seg_id_tmp7"
      Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
  customer_srcid  | bigint                      |
  show_srcid      | bigint                      |
  show_name       | character varying           |
  season_srcid    | bigint                      |
  season_name     | character varying           |
  episode_srcid   | bigint                      |
  episode_name    | character varying           |
  segment_type_id | bigint                      |
  segment_type    | character varying           |
  segment_srcid   | bigint                      |
  segment_name    | character varying           |
  create_dt       | timestamp without time zone |




# \d sl_cd_segment_dim
                                          Table
"public.sl_cd_segment_dim"
         Column        |            Type
|                          Modifiers
----------------------+-----------------------------
+-------------------------------------------------------------
  sl_cd_segment_dim_id | bigint                      | not null
default nextval('sl_cd_segment_dim_seq'::regclass)
  customer_srcid       | bigint                      | not null
  show_srcid           | bigint                      | not null
  show_name            | character varying(500)      | not null
  season_srcid         | bigint                      | not null
  season_name          | character varying(500)      | not null
  episode_srcid        | bigint                      | not null
  episode_name         | character varying(500)      | not null
  segment_type_id      | integer                     |
  segment_type         | character varying(500)      |
  segment_srcid        | bigint                      |
  segment_name         | character varying(500)      |
  effective_dt         | timestamp without time zone | not null
default now()
  inactive_dt          | timestamp without time zone |
  last_update_dt       | timestamp without time zone | not null
default now()
Indexes:
     "sl_cd_segment_dim_pk" PRIMARY KEY, btree (sl_cd_segment_dim_id)
     "seg1" btree (customer_srcid)
     "seg2" btree (show_srcid)
     "seg3" btree (season_srcid)
     "seg4" btree (episode_srcid)
     "seg5" btree (segment_srcid)
     "sl_cd_segment_dim_ix1" btree (customer_srcid)






Any thoughts, suggestions, etc on how to improve performance for this
delete ?


Thanks in advance..

/Kevin



pgsql-performance by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: slow update
Next
From: kevin kempter
Date:
Subject: Re: improving performance for a delete