trying to delete most of the table by range of date col - Mailing list pgsql-performance

From Mariel Cherkassky
Subject trying to delete most of the table by range of date col
Date
Msg-id CA+t6e1krNiXeqbXPi7pSYA8vvu7KsRCcE8yexzFBGrnBhS7sZg@mail.gmail.com
Whole thread Raw
Responses Re: trying to delete most of the table by range of date col  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
Hi,
I have a big table (with 1.6 milion records). One of the columns is called end_date and it`s type is timestamp. I'm trying to find the best way to delete most of the table but not all of it according to a range of dates. The table structure : 
afa=# \d my_table;
                                              Table "public.my_table"
             Column              |           Type           |                        Modifiers
---------------------------------+--------------------------+----------------------------------------------------------
 id                              | bigint                   | not null default nextval('my_table_id_seq'::regclass)
 devid| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| timestamp with time zone |
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| integer                  | not null
 column_name| text                     | not null
 column_name| integer                  | not null default 0
 column_name| integer                  | not null default 0
 column_name| integer                  | not null default 0
 column_name| integer                  | not null default 0
 column_name| integer                  | not null default 0
 column_name| integer                  | not null default 0
 column_name| integer                  | not null default 0
 column_name| integer                  | default 0
 column_name| integer                  | default 0
 column_name| integer                  | default 0
 column_name| integer                  | default 0
 column_name| integer                  | default 0
 column_name| integer                  | default 0
 end_date                        | timestamp with time zone |

Indexes:
    "my_table_pkey" PRIMARY KEY, btree (id)
    "my_table_date_idx" btree (date)
    "my_table_device_idx" btree (devid)
    "end_date_idx" btree (end_date)
Foreign-key constraints:
    "fk_aaaaa" FOREIGN KEY (devid) REFERENCES device_data(id)
Referenced by:
    TABLE "table1" CONSTRAINT "application_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table2" CONSTRAINT "configuration_changes_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table3" CONSTRAINT "fk_57hmvnx423bw9h203260r8gic" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table3" CONSTRAINT "interface_change_my_table_fk" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table4" CONSTRAINT "my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id) ON DELETE CASCADE
    TABLE "table5" CONSTRAINT "my_table_report_my_table_fk" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table6" CONSTRAINT "my_table_to_policy_change_my_table_foreign_key" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table7" CONSTRAINT "network_object_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table8" CONSTRAINT "orig_nat_rule_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table9" CONSTRAINT "risk_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table10" CONSTRAINT "rule_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table11" CONSTRAINT "service_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)

As you can see alot of other tables uses the id col as a foreign key which make the delete much slower.

Solution I tried for the query : 

delete  from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY');

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
         Rows Removed by Filter: 40253
 Planning time: 0.210 ms
 Trigger for constraint table1: time=14730.816 calls=1572864
 Trigger for constraint table2: time=30718.084 calls=1572864
 Trigger for constraint table3: time=28170.363 calls=1572864
 Trigger for constraint table4: time=29573.681 calls=1572864
 Trigger for constraint table5: time=29629.263 calls=1572864
 Trigger for constraint table6: time=29628.489 calls=1572864
 Trigger for constraint table7: time=29798.121 calls=1572864
 Trigger for constraint table8: time=29645.705 calls=1572864
 Trigger for constraint table9: time=29657.177 calls=1572864
 Trigger for constraint table10: time=29487.054 calls=1572864
 Trigger for constraint table11: time=30010.978 calls=1572864
 Trigger for constraint table12: time=26383.924 calls=1572864
 Execution time: 350603.047 ms
(18 rows)

-----------------------

DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY'));



                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=92522.54..186785.27 rows=1572738 width=12) (actual time=9367.477..9367.477 rows=0 loops=1)
   ->  Hash Join  (cost=92522.54..186785.27 rows=1572738 width=12) (actual time=2871.906..5503.732 rows=1572864 loops=1)
         Hash Cond: (my_table.id = my_table_1.id)
         ->  Seq Scan on my_table  (cost=0.00..49052.16 rows=1613116 width=14) (actual time=0.004..669.184 rows=1613117 loops=1)
         ->  Hash  (cost=65183.32..65183.32 rows=1572738 width=14) (actual time=2871.301..2871.301 rows=1572864 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3332kB
               ->  Seq Scan on my_table my_table_1  (cost=0.00..65183.32 rows=1572738 width=14) (actual time=0.009..2115.826 rows=1572864 loops=1)
                     Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
                     Rows Removed by Filter: 40253
 Planning time: 0.419 ms
 Trigger for constraint my_table_id_fkey: time=14291.206 calls=1572864
 Trigger for constraint table2_fk: time=29171.591 calls=1572864
 Trigger for constraint table3_fk: time=26356.711 calls=1572864
 Trigger for constraint table4_fk: time=27579.694 calls=1572864
 Trigger for constraint table5_fk: time=27537.491 calls=1572864
 Trigger for constraint table6_fk: time=27574.169 calls=1572864
 Trigger for constraint table7_fk: time=27716.636 calls=1572864
 Trigger for constraint table8_fk: time=27780.192 calls=1572864
....
....

 Execution time: 333166.233 ms ~ 5.5 minutes
(23 rows)


Loading into a temp table the data isnt option because I cant truncate the table because of all the dependencies...

Any idea what else can I check ?

pgsql-performance by date:

Previous
From: Andreas Kretschmer
Date:
Subject: Re: Extremely slow when query uses GIST exclusion index
Next
From: Justin Pryzby
Date:
Subject: Re: trying to delete most of the table by range of date col