Re: Query hitting empty tables taking 48 minutes - Mailing list pgsql-general

From: Robert Creager
Subject: Re: Query hitting empty tables taking 48 minutes
Date:
Msg-id: D36E44EC-C3DC-4524-BC2C-8FAECD123F60@logicalchaos.org
In response to: Query hitting empty tables taking 48 minutes (Robert Creager <robert@logicalchaos.org>)
Responses: Re: Query hitting empty tables taking 48 minutes (Laurenz Albe <laurenz.albe@cybertec.at>)
List: pgsql-general


On Jun 7, 2018, at 4:18 PM, Robert <domain@logicalchaos.org> wrote:

> You can capture the execution plan of the bad statement by using auto_explain,
> that would certainly shed more light on the problem.
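For reference, a minimal auto_explain setup along those lines would be (a sketch; the threshold value here is illustrative, not what was actually configured):

LOAD 'auto_explain';                       -- or list it in shared_preload_libraries
SET auto_explain.log_min_duration = '5s';  -- log plans for statements slower than 5 seconds
SET auto_explain.log_analyze = on;         -- include actual timings and row counts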


A different query has started showing up as the problem, and the auto_explain output (with analyze enabled) shows an oddity: the total statement duration is about 11,000 seconds, while the query plan itself reports only about 3.5 seconds. I captured a ZFS snapshot as soon as the problem was noticed.

Jun 16 23:15:30 blackpearl postgres[9860]: [79-1] db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG:  duration: 10856644.336 ms  plan:                                               
Jun 16 23:15:30 blackpearl postgres[9860]: [79-2]       Query Text: DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1))                 
Jun 16 23:15:30 blackpearl postgres[9860]: [79-3]       Delete on blob  (cost=1308.79..1312.82 rows=1 width=12) (actual time=3465.919..3465.919 rows=0 loops=1)                                        
Jun 16 23:15:30 blackpearl postgres[9860]: [79-4]         ->  Nested Loop  (cost=1308.79..1312.82 rows=1 width=12) (actual time=50.293..2435.271 rows=300000 loops=1)                                  
Jun 16 23:15:30 blackpearl postgres[9860]: [79-5]               ->  Bitmap Heap Scan on s3_object  (cost=634.39..636.41 rows=1 width=22) (actual time=50.269..153.885 rows=300000 loops=1)             
Jun 16 23:15:30 blackpearl postgres[9860]: [79-6]                     Recheck Cond: (bucket_id = 'bc6e6b10-80ad-4329-9fb9-1a66d8c1505e'::uuid)                                                         
Jun 16 23:15:30 blackpearl postgres[9860]: [79-7]                     Heap Blocks: exact=3704
Jun 16 23:15:30 blackpearl postgres[9860]: [79-8]                     ->  Bitmap Index Scan on ds3_s3_object__bucket_id  (cost=0.00..634.39 rows=1 width=0) (actual time=49.552..49.552 rows=300000 loops=1)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-9]                           Index Cond: (bucket_id = 'bc6e6b10-80ad-4329-9fb9-1a66d8c1505e'::uuid)                                                     
Jun 16 23:15:30 blackpearl postgres[9860]: [79-10]              ->  Bitmap Heap Scan on blob  (cost=674.39..676.41 rows=1 width=22) (actual time=0.005..0.006 rows=1 loops=300000)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-11]                    Recheck Cond: (object_id = s3_object.id)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-12]                    Heap Blocks: exact=300000
Jun 16 23:15:30 blackpearl postgres[9860]: [79-13]                    ->  Bitmap Index Scan on ds3_blob__object_id  (cost=0.00..674.39 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=300000)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-14]                          Index Cond: (object_id = s3_object.id)
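Side note: if I read the documentation correctly, the actual time on the Delete node above does not include time spent in triggers, and auto_explain only reports per-trigger timings when auto_explain.log_triggers is enabled (it requires log_analyze to be on):

SET auto_explain.log_triggers = on;  -- log per-constraint trigger times, as EXPLAIN ANALYZE does below

That could account for at least part of the gap between the 10,857-second duration and the 3.5-second plan.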


Doing a ZFS rollback and re-executing the query shows what is happening, although not to the extent seen above. If I read this correctly, it’s the constraint checks that are causing the query to take so long. I don’t see any server configuration option that might allow those checks to be parallelized. Is that possible?

tapesystem=# PREPARE foo(uuid) AS DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1));
PREPARE
tapesystem=# EXPLAIN ANALYZE EXECUTE foo('bc6e6b10-80ad-4329-9fb9-1a66d8c1505e');
                                                           QUERY PLAN                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------                                                                       
 Delete on blob  (cost=9555.07..21134.01 rows=299993 width=12) (actual time=1516.140..1516.140 rows=0 loops=1)
   ->  Hash Join  (cost=9555.07..21134.01 rows=299993 width=12) (actual time=237.816..621.306 rows=300000 loops=1)
         Hash Cond: (s3_object.id = blob.object_id)
         ->  Seq Scan on s3_object  (cost=0.00..7454.04 rows=299993 width=22) (actual time=0.027..148.503 rows=300000 loops=1)                                                                         
               Filter: (bucket_id = 'bc6e6b10-80ad-4329-9fb9-1a66d8c1505e'::uuid)
               Rows Removed by Filter: 3
         ->  Hash  (cost=5805.03..5805.03 rows=300003 width=22) (actual time=235.219..235.219 rows=300003 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 19917kB
               ->  Seq Scan on blob  (cost=0.00..5805.03 rows=300003 width=22) (actual time=0.038..114.107 rows=300003 loops=1)                                                                        
 Trigger for constraint blob_tape_blob_id_fkey: time=5389.627 calls=300000
 Trigger for constraint multi_part_upload_placeholder_blob_id_fkey: time=4566.305 calls=300000
 Trigger for constraint multi_part_upload_part_blob_id_fkey: time=3597.896 calls=300000
 Trigger for constraint blob_pool_blob_id_fkey: time=4631.851 calls=300000
 Trigger for constraint blob_target_blob_id_fkey: time=4688.731 calls=300000
 Trigger for constraint suspect_blob_tape_blob_id_fkey: time=4473.787 calls=300000
 Trigger for constraint suspect_blob_pool_blob_id_fkey: time=4475.286 calls=300000
 Trigger for constraint suspect_blob_target_blob_id_fkey: time=4353.237 calls=300000
 Trigger for constraint blob_s3_target_blob_id_fkey: time=4451.687 calls=300000
 Trigger for constraint blob_azure_target_blob_id_fkey: time=4448.613 calls=300000
 Trigger for constraint suspect_blob_azure_target_blob_id_fkey: time=4353.467 calls=300000
 Execution time: 85175.581 ms

By my count, the eleven foreign-key triggers account for roughly 49 of the 85 seconds, at about 0.015 ms per check; the cost appears to be simply eleven constraints times 300,000 rows. I executed a VACUUM ANALYZE and then a VACUUM FULL ANALYZE; neither changed the timing. Other than removing the constraints, is there any way to address this?
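For completeness, a catalog query along these lines enumerates the referencing tables behind those triggers, to confirm that each referencing column has a supporting index (a diagnostic sketch using the standard pg_constraint and pg_attribute catalogs; each per-row FK check does one lookup on the referencing column, so an unindexed column would make this far worse):

SELECT c.conname,
       c.conrelid::regclass AS referencing_table,
       a.attname            AS referencing_column
  FROM pg_constraint c
  JOIN pg_attribute a
    ON a.attrelid = c.conrelid
   AND a.attnum   = ANY (c.conkey)
 WHERE c.contype = 'f'
   AND c.confrelid = 'ds3.blob'::regclass;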

Best,
Robert
