And this is the EXPLAIN ANALYZE run that finishes in psql with the default settings (i.e. with parallelism):
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=14443102.62..111487693.90 rows=65470868 width=100) (actual time=212912.377..334892.676 rows=65514296 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Nested Loop Left Join  (cost=14442102.62..104939607.10 rows=16367717 width=100) (actual time=209806.019..317684.146 rows=13102859 loops=5)
         ->  Parallel Hash Left Join  (cost=14442102.04..22124798.60 rows=16367717 width=60) (actual time=209805.943..273008.489 rows=13102859 loops=5)
               Hash Cond: (d.objid = gaps1.original_ext_source_id)
               ->  Parallel Seq Scan on disk_sample1 d  (cost=0.00..1218371.17 rows=16367717 width=60) (actual time=37.353..25185.340 rows=13095751 loops=5)
               ->  Parallel Hash  (cost=10307380.24..10307380.24 rows=237862624 width=16) (actual time=169633.067..169633.068 rows=190290095 loops=5)
                     Buckets: 67108864  Batches: 32  Memory Usage: 1919904kB
                     ->  Parallel Seq Scan on panstarrs1bestneighbour gaps1  (cost=0.00..10307380.24 rows=237862624 width=16) (actual time=132.295..117548.803 rows=190290095 loops=5)
         ->  Index Scan using gaia_sourcex_source_id_idx on gaia_source g  (cost=0.58..5.05 rows=1 width=48) (actual time=0.003..0.003 rows=0 loops=65514296)
               Index Cond: (source_id = gaps1.source_id)
 Planning Time: 1.266 ms
 JIT:
   Functions: 75
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 11.796 ms, Inlining 293.374 ms, Optimization 81.354 ms, Emission 173.338 ms, Total 559.861 ms
 Execution Time: 337814.695 ms
(18 rows)
I also verified again that when I query through Python (i.e. using the cursor) with
max_parallel_workers_per_gather=0, it finishes fine.
I can also see that when I query through the cursor in Python with the default settings (which is when I see the issue),
it clearly uses a different plan from the one used when I just run the query in psql.
I say this because when running in psql I see this kind of file in tmp:
pgsql_tmp75270.0.fileset
as opposed to:
pgsql_tmp73459.0 ...
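To compare the two cases more systematically, the temporary entries could be counted by kind. Below is a minimal sketch that only looks at names shaped like the two examples above; the function name and the "fileset"/"plain"/"other" labels are my own, and the pattern is an assumption based on just these two names, not on how the server defines them.

```python
import re
from collections import Counter

# Matches names like "pgsql_tmp75270.0.fileset" and "pgsql_tmp73459.0".
# Assumption: pid and counter are plain decimal numbers, as in the two examples.
_TMP_NAME = re.compile(r"^pgsql_tmp(\d+)\.(\d+)(\.fileset)?$")


def classify_temp_names(names):
    """Count directory entries by kind: 'fileset', 'plain', or 'other'."""
    counts = Counter()
    for name in names:
        match = _TMP_NAME.match(name)
        if match is None:
            counts["other"] += 1
        elif match.group(3):
            counts["fileset"] += 1
        else:
            counts["plain"] += 1
    return counts
```

It could be fed with something like classify_temp_names(os.listdir(tmp_dir)), where tmp_dir is a placeholder for wherever the temporary files end up on the server.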
I don't know how to see the plan of the DECLARE CURSOR query.
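One possible way to look at it: the PostgreSQL documentation lists DECLARE among the statements that EXPLAIN accepts, so the cursor form of the query could be passed to plain EXPLAIN. Here is a minimal sketch, assuming a DB-API connection such as one from psycopg2; the cursor name c1 and both function names are hypothetical, and the DECLARE text built here may differ from what the driver actually sends for a named cursor.

```python
def explain_declare_sql(query, cursor_name="c1"):
    """Build an EXPLAIN statement for the DECLARE CURSOR form of a query."""
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be embedded after "CURSOR FOR".
    body = query.strip().rstrip(";")
    return f"EXPLAIN DECLARE {cursor_name} CURSOR FOR {body}"


def show_cursor_plan(conn, query):
    """Run the EXPLAIN on an open connection and return the plan lines."""
    # Uses an ordinary (client-side) cursor; EXPLAIN returns one text
    # column with one row per plan line.
    with conn.cursor() as cur:
        cur.execute(explain_declare_sql(query))
        return [row[0] for row in cur.fetchall()]
```

Plain EXPLAIN only plans the statement, so this should return quickly and would show whether the cursor plan contains the Gather and Parallel Hash nodes seen in psql.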
So, summarizing:
The query produces millions of files in:
1) query through cursor with default settings (max_parallel_workers_per_gather=4)
2) query through psql with no parallelism (max_parallel_workers_per_gather=0)
It works in:
3) query through cursor with no parallelism (max_parallel_workers_per_gather=0)
4) query through psql with default settings (max_parallel_workers_per_gather=4)
I hope it makes sense to someone & helps.
Sergey