Thread: trying to delete most of the table by range of date col

trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
Hi,
I have a big table (with 1.6 million records). One of the columns is called end_date and its type is timestamp. I'm trying to find the best way to delete most of the table, but not all of it, according to a range of dates. The table structure: 
afa=# \d my_table;
                                              Table "public.my_table"
             Column              |           Type           |                        Modifiers
---------------------------------+--------------------------+----------------------------------------------------------
 id                              | bigint                   | not null default nextval('my_table_id_seq'::regclass)
 devid                           | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | timestamp with time zone |
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | integer                  | not null
 column_name                     | text                     | not null
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | not null default 0
 column_name                     | integer                  | default 0
 column_name                     | integer                  | default 0
 column_name                     | integer                  | default 0
 column_name                     | integer                  | default 0
 column_name                     | integer                  | default 0
 column_name                     | integer                  | default 0
 end_date                        | timestamp with time zone |

Indexes:
    "my_table_pkey" PRIMARY KEY, btree (id)
    "my_table_date_idx" btree (date)
    "my_table_device_idx" btree (devid)
    "end_date_idx" btree (end_date)
Foreign-key constraints:
    "fk_aaaaa" FOREIGN KEY (devid) REFERENCES device_data(id)
Referenced by:
    TABLE "table1" CONSTRAINT "application_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table2" CONSTRAINT "configuration_changes_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table3" CONSTRAINT "fk_57hmvnx423bw9h203260r8gic" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table3" CONSTRAINT "interface_change_my_table_fk" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table4" CONSTRAINT "my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id) ON DELETE CASCADE
    TABLE "table5" CONSTRAINT "my_table_report_my_table_fk" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table6" CONSTRAINT "my_table_to_policy_change_my_table_foreign_key" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table7" CONSTRAINT "network_object_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table8" CONSTRAINT "orig_nat_rule_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table9" CONSTRAINT "risk_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table10" CONSTRAINT "rule_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table11" CONSTRAINT "service_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)

As you can see, a lot of other tables use the id column as a foreign key, which makes the delete much slower.

Solutions I tried for the query: 

delete  from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY');

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
         Rows Removed by Filter: 40253
 Planning time: 0.210 ms
 Trigger for constraint table1: time=14730.816 calls=1572864
 Trigger for constraint table2: time=30718.084 calls=1572864
 Trigger for constraint table3: time=28170.363 calls=1572864
 Trigger for constraint table4: time=29573.681 calls=1572864
 Trigger for constraint table5: time=29629.263 calls=1572864
 Trigger for constraint table6: time=29628.489 calls=1572864
 Trigger for constraint table7: time=29798.121 calls=1572864
 Trigger for constraint table8: time=29645.705 calls=1572864
 Trigger for constraint table9: time=29657.177 calls=1572864
 Trigger for constraint table10: time=29487.054 calls=1572864
 Trigger for constraint table11: time=30010.978 calls=1572864
 Trigger for constraint table12: time=26383.924 calls=1572864
 Execution time: 350603.047 ms
(18 rows)

-----------------------

DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY'));



                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=92522.54..186785.27 rows=1572738 width=12) (actual time=9367.477..9367.477 rows=0 loops=1)
   ->  Hash Join  (cost=92522.54..186785.27 rows=1572738 width=12) (actual time=2871.906..5503.732 rows=1572864 loops=1)
         Hash Cond: (my_table.id = my_table_1.id)
         ->  Seq Scan on my_table  (cost=0.00..49052.16 rows=1613116 width=14) (actual time=0.004..669.184 rows=1613117 loops=1)
         ->  Hash  (cost=65183.32..65183.32 rows=1572738 width=14) (actual time=2871.301..2871.301 rows=1572864 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3332kB
               ->  Seq Scan on my_table my_table_1  (cost=0.00..65183.32 rows=1572738 width=14) (actual time=0.009..2115.826 rows=1572864 loops=1)
                     Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
                     Rows Removed by Filter: 40253
 Planning time: 0.419 ms
 Trigger for constraint my_table_id_fkey: time=14291.206 calls=1572864
 Trigger for constraint table2_fk: time=29171.591 calls=1572864
 Trigger for constraint table3_fk: time=26356.711 calls=1572864
 Trigger for constraint table4_fk: time=27579.694 calls=1572864
 Trigger for constraint table5_fk: time=27537.491 calls=1572864
 Trigger for constraint table6_fk: time=27574.169 calls=1572864
 Trigger for constraint table7_fk: time=27716.636 calls=1572864
 Trigger for constraint table8_fk: time=27780.192 calls=1572864
....
....

 Execution time: 333166.233 ms ~ 5.5 minutes
(23 rows)


Loading the data into a temp table isn't an option, because I can't truncate the table due to all the dependencies...

Any idea what else I can check?

Re: trying to delete most of the table by range of date col

From
Justin Pryzby
Date:
On Mon, Sep 03, 2018 at 09:27:52AM +0300, Mariel Cherkassky wrote:
> I'm trying to find the best way to delete most of the table but not all of it
> according to a range of dates.

> Indexes:
>     "end_date_idx" btree (end_date)

> Referenced by:
>     TABLE "table1" CONSTRAINT "application_change_my_table_id_fkey" FOREIGN
> KEY (my_table_id) REFERENCES my_table(id)
>     TABLE "table2" CONSTRAINT "configuration_changes_my_table_id_fkey"
> FOREIGN KEY (my_table_id) REFERENCES my_table(id)
...

> As you can see alot of other tables uses the id col as a foreign key which
> make the delete much slower.

> Trigger for constraint table1: time=14730.816 calls=1572864
> Trigger for constraint table2: time=30718.084 calls=1572864
> Trigger for constraint table3: time=28170.363 calls=1572864
...

Do the other tables have indices on their referencING columns?

https://www.postgresql.org/docs/devel/static/ddl-constraints.html#DDL-CONSTRAINTS-FK
"Since a DELETE of a row from the referenced table [...] will require a scan of
the referencing table for rows matching the old value, it is often a good idea
to index the referencing columns too."

Note, I believe it's planned in the future for foreign keys to support
references to partitioned tables, at which point you could just DROP the
monthly partition... but that's not supported right now.

Justin


Re: trying to delete most of the table by range of date col

From
Andreas Kretschmer
Date:

Am 03.09.2018 um 09:06 schrieb Justin Pryzby:
> Note, I believe it's planned in the future for foreign keys to support
> referenes to partitioned tables, at which point you could just DROP the monthly
> partition...but not supported right now.

the future is close, that's possible in 11 ;-)
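
For the archives, a sketch of what a monthly range-partitioned layout looks like (names are illustrative; check your server version's documentation for which foreign-key directions partitioned tables support):

```sql
-- Partition the big table by month on end_date. Note the partition
-- key must be part of the primary key.
CREATE TABLE my_table (
    id       bigserial,
    end_date timestamptz NOT NULL,
    PRIMARY KEY (id, end_date)
) PARTITION BY RANGE (end_date);

CREATE TABLE my_table_2018_12 PARTITION OF my_table
    FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

-- Removing a month is a metadata operation, not a row-by-row DELETE:
DROP TABLE my_table_2018_12;
```

The DROP discards the whole month in milliseconds instead of firing per-row foreign-key triggers.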

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
Hi,
I already checked: all the tables that use the id column of the main table as a foreign key have an index on that column.
I tried the following four solutions: 

1)delete  from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY');
 Execution time: 350603.047 ms ~ 5.8 minutes

2)DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY'));
 Execution time: 333166.233 ms ~ 5.5 minutes

3) set temp_buffers='1GB';
SET

create temp table id_temp as select id from my_Table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY') ;
SELECT 1572864
Time: 2196.670 ms

 DELETE FROM my_table USING id_temp WHERE my_table.id = id_temp.id;
 Execution time: 459650.621 ms ~ 7.6 minutes

4) delete in chunks: 
do $$
declare
    rec integer;
begin
    select count(*) into rec from my_table
     where end_date <= to_date('12/12/2018','DD/MM/YYYY')
       and end_date >  to_date('11/12/2018','DD/MM/YYYY');
    while rec > 0 loop
        delete from my_table where id in (
            select id from my_table
             where end_date <= to_date('12/12/2018','DD/MM/YYYY')
               and end_date >  to_date('11/12/2018','DD/MM/YYYY')
             limit 5000);
        rec := rec - 5000;
        raise notice '5000 records were deleted, remaining rows: %', rec;
    end loop;
end;
$$
;

Execution time: 6 minutes.

So, it seems that the second solution is the fastest one. Is there a reason why deleting in chunks (solution 4) wasn't faster?

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-10:35 מאת ‪Andreas Kretschmer‬‏ <‪andreas@a-kretschmer.de‬‏>:‬


Am 03.09.2018 um 09:06 schrieb Justin Pryzby:
> Note, I believe it's planned in the future for foreign keys to support
> referenes to partitioned tables, at which point you could just DROP the monthly
> partition...but not supported right now.

the future is close, that's possible in 11 ;-)

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com


Re: trying to delete most of the table by range of date col

From
Sergei Kornilov
Date:
Hello

>  Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
>    ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
>          Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
>          Rows Removed by Filter: 40253
>  Planning time: 0.210 ms
>  Trigger for constraint table1: time=14730.816 calls=1572864
>  Trigger for constraint table2: time=30718.084 calls=1572864
>  Trigger for constraint table3: time=28170.363 calls=1572864
>  Trigger for constraint table4: time=29573.681 calls=1572864
>  Trigger for constraint table5: time=29629.263 calls=1572864
>  Trigger for constraint table6: time=29628.489 calls=1572864
>  Trigger for constraint table7: time=29798.121 calls=1572864
>  Trigger for constraint table8: time=29645.705 calls=1572864
>  Trigger for constraint table9: time=29657.177 calls=1572864
>  Trigger for constraint table10: time=29487.054 calls=1572864
>  Trigger for constraint table11: time=30010.978 calls=1572864
>  Trigger for constraint table12: time=26383.924 calls=1572864
>  Execution time: 350603.047 ms

As you can see in "actual time" - delete was run only 5 sec. All the other time postgresql checked foreign keys
triggers.0,02ms per row seems adequate for index lookup.
 
It may be better drop foreign keys, delete data, and create foreign keys back.
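
A sketch of that approach, using one of the constraint definitions from the \d output above (the same drop/recreate pair would be repeated for each referencing table; note the ALTERs take ACCESS EXCLUSIVE locks):

```sql
BEGIN;

-- Drop the referencing constraint so the DELETE fires no triggers:
ALTER TABLE table1
    DROP CONSTRAINT application_change_my_table_id_fkey;

DELETE FROM my_table
 WHERE end_date <= to_date('12/12/2018','DD/MM/YYYY')
   AND end_date >  to_date('11/12/2018','DD/MM/YYYY');

-- Recreate it; validation scans table1 once in bulk instead of
-- running a per-row trigger ~1.5 million times.
ALTER TABLE table1
    ADD CONSTRAINT application_change_my_table_id_fkey
    FOREIGN KEY (my_table_id) REFERENCES my_table(id);

COMMIT;
```

Wrapping the whole thing in one transaction means other sessions never observe the table without its constraint, at the cost of blocking them until COMMIT.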

regards, Sergei



Re: trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
Can't drop the foreign keys, there are too many of them.

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-11:35 מאת ‪Sergei Kornilov‬‏ <‪sk@zsrv.org‬‏>:‬
Hello

>  Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
>    ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
>          Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
>          Rows Removed by Filter: 40253
>  Planning time: 0.210 ms
>  Trigger for constraint table1: time=14730.816 calls=1572864
>  Trigger for constraint table2: time=30718.084 calls=1572864
>  Trigger for constraint table3: time=28170.363 calls=1572864
>  Trigger for constraint table4: time=29573.681 calls=1572864
>  Trigger for constraint table5: time=29629.263 calls=1572864
>  Trigger for constraint table6: time=29628.489 calls=1572864
>  Trigger for constraint table7: time=29798.121 calls=1572864
>  Trigger for constraint table8: time=29645.705 calls=1572864
>  Trigger for constraint table9: time=29657.177 calls=1572864
>  Trigger for constraint table10: time=29487.054 calls=1572864
>  Trigger for constraint table11: time=30010.978 calls=1572864
>  Trigger for constraint table12: time=26383.924 calls=1572864
>  Execution time: 350603.047 ms

As you can see in "actual time" - delete was run only 5 sec. All the other time postgresql checked foreign keys triggers. 0,02ms per row seems adequate for index lookup.
It may be better drop foreign keys, delete data, and create foreign keys back.

regards, Sergei

Re: trying to delete most of the table by range of date col

From
Justin Pryzby
Date:
On Mon, Sep 03, 2018 at 11:17:58AM +0300, Mariel Cherkassky wrote:
> Hi,
> I already checked and on all the tables that uses the id col of the main
> table as a foreign key have index on that column.
> 
> So, it seems that the second solution is the fastest one. It there a reason
> why the delete chunks (solution 4) wasnt faster?

I suggest running:

SET track_io_timing=on; -- requires superuser
explain(ANALYZE,BUFFERS) DELETE [...]

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Maybe you just need larger shared_buffers?

Justin


Re: trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
I checked; the results: 

1)explain (analyze,buffers) delete  from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY');


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..97294.80 rows=1571249 width=6) (actual time=4706.791..4706.791 rows=0 loops=1)
   Buffers: shared hit=3242848
   ->  Seq Scan on my_table  (cost=0.00..97294.80 rows=1571249 width=6) (actual time=0.022..2454.686 rows=1572864 loops=1)
         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
         Rows Removed by Filter: 40253
         Buffers: shared hit=65020   (65020 × 8 kB ≈ 508 MB)
 Planning time: 0.182 ms

2)explain (analyze,buffers) DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY'));
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=108908.17..252425.01 rows=1559172 width=12) (actual time=11168.090..11168.090 rows=0 loops=1)
   Buffers: shared hit=3307869 dirtied=13804, temp read=13656 written=13594
   ->  Hash Join  (cost=108908.17..252425.01 rows=1559172 width=12) (actual time=1672.222..6401.288 rows=1572864 loops=1)
         Hash Cond: (my_table_1.id = my_table.id)
         Buffers: shared hit=130040, temp read=13656 written=13594
         ->  Seq Scan on my_table my_table_1  (cost=0.00..97075.26 rows=1559172 width=14) (actual time=0.008..2474.671 rows=1572864 loops=1)
               Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
               Rows Removed by Filter: 40253
               Buffers: shared hit=65020
         ->  Hash  (cost=81047.63..81047.63 rows=1602763 width=14) (actual time=1671.613..1671.613 rows=1613117 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3392kB
               Buffers: shared hit=65020, temp written=6852
               ->  Seq Scan on my_table  (cost=0.00..81047.63 rows=1602763 width=14) (actual time=0.003..778.311 rows=1613117 loops=1)
                     Buffers: shared hit=65020


3)explain (analyze,buffers) DELETE FROM my_table my_table USING id_test WHERE my_table.id = id_test.id;


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table my_table  (cost=109216.05..178743.05 rows=1572960 width=12) (actual time=7307.465..7307.465 rows=0 loops=1)
   Buffers: shared hit=3210748, local hit=6960, temp read=13656 written=13594
   ->  Hash Join  (cost=109216.05..178743.05 rows=1572960 width=12) (actual time=1636.744..4489.246 rows=1572864 loops=1)
         Hash Cond: (id_test.id = my_table.id)
         Buffers: shared hit=65020, local hit=6960, temp read=13656 written=13594
         ->  Seq Scan on id_test  (cost=0.00..22689.60 rows=1572960 width=14) (actual time=0.009..642.859 rows=1572864 loops=1)
               Buffers: local hit=6960
         ->  Hash  (cost=81160.02..81160.02 rows=1614002 width=14) (actual time=1636.228..1636.228 rows=1613117 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3392kB
               Buffers: shared hit=65020, temp written=6852
               ->  Seq Scan on my_table my_table  (cost=0.00..81160.02 rows=1614002 width=14) (actual time=0.297..815.133 rows=1613117 loops=1)
                     Buffers: shared hit=65020


I restarted the cluster after running every query.


‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-12:23 מאת ‪Justin Pryzby‬‏ <‪pryzby@telsasoft.com‬‏>:‬
On Mon, Sep 03, 2018 at 11:17:58AM +0300, Mariel Cherkassky wrote:
> Hi,
> I already checked and on all the tables that uses the id col of the main
> table as a foreign key have index on that column.
>
> So, it seems that the second solution is the fastest one. It there a reason
> why the delete chunks (solution 4) wasnt faster?

I suggest running:

SET track_io_timing=on; -- requires superuser
explain(ANALYZE,BUFFERS) DELETE [...]

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Maybe you just need larger shared_buffers ?

Justin

Re: trying to delete most of the table by range of date col

From
Carrie Berlin
Date:
This is a terribly inflexible design; why so many foreign keys? If the table requires removing data, rebuild it with partitions. Parent keys should be in reference tables, not in the fact table. 

On Mon, Sep 3, 2018 at 04:51 Mariel Cherkassky <mariel.cherkassky@gmail.com> wrote:
Cant drop foreign keys, there are too much.

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-11:35 מאת ‪Sergei Kornilov‬‏ <‪sk@zsrv.org‬‏>:‬
Hello

>  Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
>    ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
>          Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
>          Rows Removed by Filter: 40253
>  Planning time: 0.210 ms
>  Trigger for constraint table1: time=14730.816 calls=1572864
>  Trigger for constraint table2: time=30718.084 calls=1572864
>  Trigger for constraint table3: time=28170.363 calls=1572864
>  Trigger for constraint table4: time=29573.681 calls=1572864
>  Trigger for constraint table5: time=29629.263 calls=1572864
>  Trigger for constraint table6: time=29628.489 calls=1572864
>  Trigger for constraint table7: time=29798.121 calls=1572864
>  Trigger for constraint table8: time=29645.705 calls=1572864
>  Trigger for constraint table9: time=29657.177 calls=1572864
>  Trigger for constraint table10: time=29487.054 calls=1572864
>  Trigger for constraint table11: time=30010.978 calls=1572864
>  Trigger for constraint table12: time=26383.924 calls=1572864
>  Execution time: 350603.047 ms

As you can see in "actual time" - delete was run only 5 sec. All the other time postgresql checked foreign keys triggers. 0,02ms per row seems adequate for index lookup.
It may be better drop foreign keys, delete data, and create foreign keys back.

regards, Sergei

Re: trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
I'm not responsible for this design, but I'm trying to improve it. Using partitions isn't an option because partitions don't support foreign keys. Moreover, most queries on all those tables use the id column of the main table. 

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-14:09 מאת ‪Carrie Berlin‬‏ <‪berlincarrie@gmail.com‬‏>:‬
This is a terribley inflexible design, why so many foreign keys?  If the table requires removing data, rebuild with partitions.  Parent keys should be in reference tables, not in fact table. 

On Mon, Sep 3, 2018 at 04:51 Mariel Cherkassky <mariel.cherkassky@gmail.com> wrote:
Cant drop foreign keys, there are too much.

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-11:35 מאת ‪Sergei Kornilov‬‏ <‪sk@zsrv.org‬‏>:‬
Hello

>  Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
>    ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
>          Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
>          Rows Removed by Filter: 40253
>  Planning time: 0.210 ms
>  Trigger for constraint table1: time=14730.816 calls=1572864
>  Trigger for constraint table2: time=30718.084 calls=1572864
>  Trigger for constraint table3: time=28170.363 calls=1572864
>  Trigger for constraint table4: time=29573.681 calls=1572864
>  Trigger for constraint table5: time=29629.263 calls=1572864
>  Trigger for constraint table6: time=29628.489 calls=1572864
>  Trigger for constraint table7: time=29798.121 calls=1572864
>  Trigger for constraint table8: time=29645.705 calls=1572864
>  Trigger for constraint table9: time=29657.177 calls=1572864
>  Trigger for constraint table10: time=29487.054 calls=1572864
>  Trigger for constraint table11: time=30010.978 calls=1572864
>  Trigger for constraint table12: time=26383.924 calls=1572864
>  Execution time: 350603.047 ms

As you can see in "actual time" - delete was run only 5 sec. All the other time postgresql checked foreign keys triggers. 0,02ms per row seems adequate for index lookup.
It may be better drop foreign keys, delete data, and create foreign keys back.

regards, Sergei

Re: trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
I'm not responsible for this design but I'm trying to improve it. Using partition isnt an option because partitions doesnt support foreign key. Moreover,  most queries on all those tables uses the id col of the main table. 

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-14:09 מאת ‪Carrie Berlin‬‏ <‪berlincarrie@gmail.com‬‏>:‬
This is a terribley inflexible design, why so many foreign keys?  If the table requires removing data, rebuild with partitions.  Parent keys should be in reference tables, not in fact table. 

On Mon, Sep 3, 2018 at 04:51 Mariel Cherkassky <mariel.cherkassky@gmail.com> wrote:
Cant drop foreign keys, there are too much.

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-11:35 מאת ‪Sergei Kornilov‬‏ <‪sk@zsrv.org‬‏>:‬
Hello

>  Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
>    ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
>          Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
>          Rows Removed by Filter: 40253
>  Planning time: 0.210 ms
>  Trigger for constraint table1: time=14730.816 calls=1572864
>  Trigger for constraint table2: time=30718.084 calls=1572864
>  Trigger for constraint table3: time=28170.363 calls=1572864
>  Trigger for constraint table4: time=29573.681 calls=1572864
>  Trigger for constraint table5: time=29629.263 calls=1572864
>  Trigger for constraint table6: time=29628.489 calls=1572864
>  Trigger for constraint table7: time=29798.121 calls=1572864
>  Trigger for constraint table8: time=29645.705 calls=1572864
>  Trigger for constraint table9: time=29657.177 calls=1572864
>  Trigger for constraint table10: time=29487.054 calls=1572864
>  Trigger for constraint table11: time=30010.978 calls=1572864
>  Trigger for constraint table12: time=26383.924 calls=1572864
>  Execution time: 350603.047 ms

As you can see from the "actual time" figures, the delete itself ran for only about 5 seconds. All the remaining time PostgreSQL spent checking the foreign key triggers. 0.02 ms per row seems adequate for an index lookup.
It may be better to drop the foreign keys, delete the data, and then recreate the foreign keys.

regards, Sergei

Re: trying to delete most of the table by range of date col

From
Carrie Berlin
Date:
Hi
I understand about having to deal with a bad design. How big is the table (select pg_size_pretty(pg_table_size('my_table')))? If the table is not that large relative to the IOPS on your disk system, another solution is to add a boolean column IS_DELETED to the table and modify the queries that hit the table to exclude rows where IS_DELETED is true. You also need an index on this column. I did this with a user table that was a parent table to 120 data tables, where users could not be dropped from the system.
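
A sketch of that soft-delete approach (the column and index names here are illustrative, not from the thread):

ALTER TABLE my_table ADD COLUMN is_deleted boolean NOT NULL DEFAULT false;

-- mark the date range instead of deleting it; no foreign keys fire
UPDATE my_table
   SET is_deleted = true
 WHERE end_date <= to_date('12/12/2018','DD/MM/YYYY')
   AND end_date >  to_date('11/12/2018','DD/MM/YYYY');

-- a partial index keeps the live rows cheap to find
CREATE INDEX my_table_live_idx ON my_table (id) WHERE NOT is_deleted;

-- queries then add the filter, e.g.:
SELECT count(*) FROM my_table WHERE NOT is_deleted;

The trade-off: no foreign keys need touching, but the "deleted" rows keep consuming disk space until they are eventually purged for real.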


Re: trying to delete most of the table by range of date col

From
Jeff Janes
Date:

4) delete in chunks:
do $$
declare
  rec integer;
begin
  select count(*) into rec from my_table
   where end_date <= to_date('12/12/2018','DD/MM/YYYY')
     and end_date > to_date('11/12/2018','DD/MM/YYYY');
  while rec > 0 loop
    DELETE FROM my_table WHERE id IN (
      select id from my_table
       where end_date <= to_date('12/12/2018','DD/MM/YYYY')
         and end_date > to_date('11/12/2018','DD/MM/YYYY')
       limit 5000);
    rec := rec - 5000;
    raise notice '5000 records were deleted, remaining rows: %', rec;
  end loop;
end;
$$;

Execution time: 6 minutes.

So it seems that the second solution is the fastest one. Is there a reason why the chunked delete (solution 4) wasn't faster?

Why would it be faster?  The same amount of work needs to get done, no matter how you slice it.  Unless there is a specific reason to think it would be faster, I would expect it won't be.

If you aren't willing to drop the constraints, then I think you just need to resign yourself to paying the price of checking those constraints. Maybe some future version of PostgreSQL will be able to do them in parallel.

Cheers,

Jeff

Re: trying to delete most of the table by range of date col

From
Mariel Cherkassky
Date:
Hi Jeff,
I tried every solution that I found on the net. I can't disable the foreign keys or the indexes.

I'm trying to get better performance by just changing the query or changing parameters.
