Re: BUG #18909: Query creates millions of temporary files and stalls - Mailing list pgsql-bugs

From Sergey Koposov
Subject Re: BUG #18909: Query creates millions of temporary files and stalls
Date
Msg-id 0ce957be37434cfca4820d1e89a70cb2b456d3db.camel@ed.ac.uk
In response to Re: BUG #18909: Query creates millions of temporary files and stalls  (Sergey Koposov <Sergey.Koposov@ed.ac.uk>)
Responses Re: BUG #18909: Query creates millions of temporary files and stalls
List pgsql-bugs
And this is the EXPLAIN ANALYZE run that finishes in psql with the default settings (i.e. with parallelism):

                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=14443102.62..111487693.90 rows=65470868 width=100) (actual time=212912.377..334892.676 rows=65514296 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Nested Loop Left Join  (cost=14442102.62..104939607.10 rows=16367717 width=100) (actual time=209806.019..317684.146 rows=13102859 loops=5)
         ->  Parallel Hash Left Join  (cost=14442102.04..22124798.60 rows=16367717 width=60) (actual time=209805.943..273008.489 rows=13102859 loops=5)
               Hash Cond: (d.objid = gaps1.original_ext_source_id)
               ->  Parallel Seq Scan on disk_sample1 d  (cost=0.00..1218371.17 rows=16367717 width=60) (actual time=37.353..25185.340 rows=13095751 loops=5)
               ->  Parallel Hash  (cost=10307380.24..10307380.24 rows=237862624 width=16) (actual time=169633.067..169633.068 rows=190290095 loops=5)
                     Buckets: 67108864  Batches: 32  Memory Usage: 1919904kB
                     ->  Parallel Seq Scan on panstarrs1bestneighbour gaps1  (cost=0.00..10307380.24 rows=237862624 width=16) (actual time=132.295..117548.803 rows=190290095 loops=5)
         ->  Index Scan using gaia_sourcex_source_id_idx on gaia_source g  (cost=0.58..5.05 rows=1 width=48) (actual time=0.003..0.003 rows=0 loops=65514296)
               Index Cond: (source_id = gaps1.source_id)
 Planning Time: 1.266 ms
 JIT:
   Functions: 75
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 11.796 ms, Inlining 293.374 ms, Optimization 81.354 ms, Emission 173.338 ms, Total 559.861 ms
 Execution Time: 337814.695 ms
(18 rows)

I also verified again that when I query through Python (i.e. using the cursor) with
max_parallel_workers_per_gather=0, it finishes fine.

I can also clearly see that when I query through the cursor in Python with the default settings (which is when I see
the issue), it uses a different plan from the one used when simply running the query in psql, because when running in
psql I see these kinds of files in tmp:
pgsql_tmp75270.0.fileset
as opposed to:
pgsql_tmp73459.0 ...
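
The two naming patterns quoted above can be told apart mechanically, which helps when a directory holds millions of
entries. The following is only a sketch: the regular expression is inferred from the two example names in this message,
and reading a ".fileset" suffix as "shared fileset created by a parallel query" is an assumption, not something
confirmed in this thread. The function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two names quoted in the message (an assumption):
#   pgsql_tmp<PID>.<N>          -> temp file owned by a single backend
#   pgsql_tmp<PID>.<N>.fileset  -> shared fileset, assumed to come from a parallel plan
_TMP_RE = re.compile(r"^pgsql_tmp(?P<pid>\d+)\.(?P<seq>\d+)(?P<fileset>\.fileset)?$")


def classify_temp_name(name):
    """Return 'fileset', 'private', or 'other' for one directory entry name."""
    m = _TMP_RE.match(name)
    if m is None:
        return "other"
    return "fileset" if m.group("fileset") else "private"


def summarize(names):
    """Count how many entry names fall into each category."""
    return Counter(classify_temp_name(n) for n in names)
```

In practice the names would come from something like os.listdir() on the server's temporary-file directory; that path
depends on the installation and is not given in this message.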

I don't think I know how to see the plan of the declare cursor query.
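
One possible way to see that plan: the PostgreSQL documentation lists DECLARE among the statements EXPLAIN accepts, so
the cursor form of the query can be explained directly. The sketch below only builds the statement and runs it on any
DB-API style cursor; the helper names and the cursor name "c1" are hypothetical, and the exact DECLARE text that the
Python driver sends (for example extra options such as WITHOUT HOLD) is an assumption that may differ from this.

```python
def explain_declare_sql(query, cursor_name="c1"):
    """Build an EXPLAIN statement for the DECLARE ... CURSOR form of `query`."""
    # Keep the cursor name to a plain identifier so it can be inlined safely.
    if not cursor_name.isidentifier():
        raise ValueError("cursor_name must be a plain identifier")
    # Drop surrounding whitespace and a trailing semicolon from the query text.
    body = query.strip().rstrip(";")
    return f"EXPLAIN DECLARE {cursor_name} CURSOR FOR {body}"


def cursor_plan(cur, query, cursor_name="c1"):
    """Run the EXPLAIN on a DB-API cursor and return the plan as one string."""
    cur.execute(explain_declare_sql(query, cursor_name))
    # EXPLAIN returns one text column per plan line.
    return "\n".join(row[0] for row in cur.fetchall())
```

Plain EXPLAIN is used here rather than EXPLAIN ANALYZE so that the problematic query is only planned, not executed.
Comparing its output with the psql plan above should show whether the cursor path really picks a non-parallel plan.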

So summarizing:

The query produces millions of files when run as:

1) query through cursor with default settings (max_parallel_workers_per_gather=4)
2) query through psql with no parallelism (max_parallel_workers_per_gather=0)

It works when run as:

3) query through cursor with no parallelism (max_parallel_workers_per_gather=0)
4) query through psql with default settings (max_parallel_workers_per_gather=4)

I hope this makes sense to someone and helps.

        Sergey




