Re: trying to delete most of the table by range of date col - Mailing list pgsql-performance

From Mariel Cherkassky
Subject Re: trying to delete most of the table by range of date col
Date
Msg-id CA+t6e1=Q-am1hZenTRH3yJKMJZOeF1Z+KG4RPqX2gz4u8vU8Yw@mail.gmail.com
Whole thread Raw
In response to Re: trying to delete most of the table by range of date col  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
I checked, the results : 

1)explain (analyze,buffers) delete  from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY');


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..97294.80 rows=1571249 width=6) (actual time=4706.791..4706.791 rows=0 loops=1)
   Buffers: shared hit=3242848
   ->  Seq Scan on my_table  (cost=0.00..97294.80 rows=1571249 width=6) (actual time=0.022..2454.686 rows=1572864 loops=1)
         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
         Rows Removed by Filter: 40253
         Buffers: shared hit=65020(*8k/1024)=507MB
 Planning time: 0.182 ms

2)explain (analyze,buffers) DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and end_date > to_date('11/12/2018','DD/MM/YYYY'));
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table  (cost=108908.17..252425.01 rows=1559172 width=12) (actual time=11168.090..11168.090 rows=0 loops=1)
   Buffers: shared hit=3307869 dirtied=13804, temp read=13656 written=13594
   ->  Hash Join  (cost=108908.17..252425.01 rows=1559172 width=12) (actual time=1672.222..6401.288 rows=1572864 loops=1)
         Hash Cond: (my_table_1.id = my_table.id)
         Buffers: shared hit=130040, temp read=13656 written=13594
         ->  Seq Scan on my_table my_table_1  (cost=0.00..97075.26 rows=1559172 width=14) (actual time=0.008..2474.671 rows=1572864 loops=1)
               Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/YYYY'::text)))
               Rows Removed by Filter: 40253
               Buffers: shared hit=65020
         ->  Hash  (cost=81047.63..81047.63 rows=1602763 width=14) (actual time=1671.613..1671.613 rows=1613117 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3392kB
               Buffers: shared hit=65020, temp written=6852
               ->  Seq Scan on my_table  (cost=0.00..81047.63 rows=1602763 width=14) (actual time=0.003..778.311 rows=1613117 loops=1)
                     Buffers: shared hit=65020


3)explain (analyze,buffers) DELETE FROM my_table my_table USING id_test WHERE my_table.id = id_test.id;


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on my_table my_table  (cost=109216.05..178743.05 rows=1572960 width=12) (actual time=7307.465..7307.465 rows=0 loops=1)
   Buffers: shared hit=3210748, local hit=6960, temp read=13656 written=13594
   ->  Hash Join  (cost=109216.05..178743.05 rows=1572960 width=12) (actual time=1636.744..4489.246 rows=1572864 loops=1)
         Hash Cond: (id_test.id = my_table.id)
         Buffers: shared hit=65020, local hit=6960, temp read=13656 written=13594
         ->  Seq Scan on id_test(cost=0.00..22689.60 rows=1572960 width=14) (actual time=0.009..642.859 rows=1572864 loops=1)
               Buffers: local hit=6960
         ->  Hash  (cost=81160.02..81160.02 rows=1614002 width=14) (actual time=1636.228..1636.228 rows=1613117 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3392kB
               Buffers: shared hit=65020, temp written=6852
               ->  Seq Scan on my_table my_table  (cost=0.00..81160.02 rows=1614002 width=14) (actual time=0.297..815.133 rows=1613117 loops=1)
                     Buffers: shared hit=65020


I restarted the cluster after running every query.


‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-12:23 מאת ‪Justin Pryzby‬‏ <‪pryzby@telsasoft.com‬‏>:‬
On Mon, Sep 03, 2018 at 11:17:58AM +0300, Mariel Cherkassky wrote:
> Hi,
> I already checked and on all the tables that uses the id col of the main
> table as a foreign key have index on that column.
>
> So, it seems that the second solution is the fastest one. It there a reason
> why the delete chunks (solution 4) wasnt faster?

I suggest running:

SET track_io_timing=on; -- requires superuser
explain(ANALYZE,BUFFERS) DELETE [...]

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Maybe you just need larger shared_buffers ?

Justin

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: trying to delete most of the table by range of date col
Next
From: Carrie Berlin
Date:
Subject: Re: trying to delete most of the table by range of date col