Re: Potential "AIO / io workers" inter-worker locking issue in PG18? - Mailing list pgsql-bugs

From Marco Boeringa
Subject Re: Potential "AIO / io workers" inter-worker locking issue in PG18?
Date
Msg-id 8798d95e-ed5c-46ad-8d22-26bcd0885fc2@boeringa.demon.nl
Whole thread Raw
In response to Re: Potential "AIO / io workers" inter-worker locking issue in PG18?  (Marco Boeringa <marco@boeringa.demon.nl>)
Responses Re: Potential "AIO / io workers" inter-worker locking issue in PG18?
List pgsql-bugs
Hi Andres,

I have now been able to capture an actual bad plan using PostgreSQL's 
'auto_explain' option, so this the bad plan that makes a process and job 
that should take just a few seconds, cost more than 7 hours to execute. 
And this is of course without updating the tables statistics with 
ANALYZE after adding the primary key objectid, as that solved the issue 
in PG18. It is now clear where PostgreSQL spends all of its time when 
the bad plan is generated, see the loop number of the second index scan. 
I am not sure why in this nested loop, two index scans on essentially 
the same key and table are executed. You can compare this bad plan with 
the one below it, that was generated with EXPLAIN in pgAdmin, not from 
actual auto_explain output, but shows a different query execution plan.

Note again:

- I did not see the bad plan in PG<=17, even without the now added 
ANALYZE step that solves the issue in PG18.

- The two tables involved ('landcover_grassy_small_scale_2_ply' and 
'landcover_grassy_small_scale_2_ply_pg') both have primary keys and 
unique objectid indexes.

- The join is one-to-one. 'landcover_grassy_small_scale_2_ply_pg' has 
unique values for objectids, so only record per corresponding record 
in 'landcover_grassy_small_scale_2_ply'

- The 'osm_tmp_28232_ch3' references a database view and represents the 
chunk / multi-threaded job that needs to be executed from my Python 
code, and references a selection of 
the 'landcover_grassy_small_scale_2_ply_pg' table's records, which is 
the filter in the last index scan in the bad plan below.

Marco

*** ACTUAL BAD PLAN AS CAPTURED BY auto_explain ***:

2025-10-17 23:32:17.375 CEST [242803] osm@mini_test LOG: duration: 
27133980.536 ms  plan:
     Query Text: UPDATE osm.landcover_grassy_small_scale_2_ply AS t1 SET 
area_geo = t2.area_geo, perim_geo = t2.perim_geo, compact_geo = CASE 
WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * 
pi())) ELSE 0 END, npoints_geo = t2.npoints_geo, comp_npoints_geo = CASE 
WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN 
((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END / 
t2.npoints_geo) ELSE 0 END, convex_ratio_geo = CASE WHEN 
ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / 
ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END FROM (SELECT 
objectid,ST_Area(way::geography,true) AS 
area_geo,ST_Perimeter(way::geography,true) AS perim_geo,ST_NPoints(way) 
AS npoints_geo FROM osm.landcover_grassy_small_scale_2_ply)  AS t2 WHERE 
(t2.objectid = t1.objectid) AND t1.objectid IN (SELECT t3.objectid FROM 
mini_test.osm.osm_tmp_28232_ch3 AS t3)
     Update on osm.landcover_grassy_small_scale_2_ply t1 
(cost=1.17..134.56 rows=0 width=0) (actual 
time=27133980.532..27133980.534 rows=0.00 loops=1)
       Buffers: shared hit=7585627828
       ->  Nested Loop  (cost=1.17..134.56 rows=1 width=62) (actual 
time=225948.297..27133979.894 rows=8.00 loops=1)
             Output: 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true), 
st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, true), 
CASE WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, 
true) > '0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END, 
st_npoints(landcover_grassy_small_scale_2_ply.way), CASE WHEN 
(st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE WHEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) > 
'0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END / 
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision) 
ELSE '0'::double precision END, CASE WHEN 
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double 
precision) THEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) / 
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double 
precision END, t1.ctid, landcover_grassy_small_scale_2_ply.ctid, 
landcover_grassy_small_scale_2_ply_pg.ctid
             Inner Unique: true
             Join Filter: (landcover_grassy_small_scale_2_ply.objectid = 
landcover_grassy_small_scale_2_ply_pg.objectid)
             Buffers: shared hit=7585627705
             ->  Nested Loop  (cost=0.75..4.79 rows=1 width=620) (actual 
time=5.784..27131099.372 rows=222396.00 loops=1)
                   Output: t1.way, t1.ctid, t1.objectid, 
landcover_grassy_small_scale_2_ply.way, 
landcover_grassy_small_scale_2_ply.ctid, 
landcover_grassy_small_scale_2_ply.objectid
                   Inner Unique: true
                   Join Filter: (t1.objectid = 
landcover_grassy_small_scale_2_ply.objectid)
                   Rows Removed by Join Filter: 24729879210
                   Buffers: shared hit=7584737792
                   ->  Index Scan using 
landcover_grassy_small_scale_2_ply_pkey on 
osm.landcover_grassy_small_scale_2_ply t1  (cost=0.38..2.39 rows=1 
width=310) (actual time=5.176..462.613 rows=222396.00 loops=1)
                         Output: t1.way, t1.ctid, t1.objectid
                         Index Searches: 1
                         Buffers: shared hit=66411
                   ->  Index Scan using 
landcover_grassy_small_scale_2_ply_pkey on 
osm.landcover_grassy_small_scale_2_ply  (cost=0.38..2.39 rows=1 
width=310) (actual time=0.035..114.709 rows=111198.50 loops=222396)
                         Output: landcover_grassy_small_scale_2_ply.way, 
landcover_grassy_small_scale_2_ply.ctid, 
landcover_grassy_small_scale_2_ply.objectid
                         Index Searches: 222396
                         Buffers: shared hit=7584671381
             ->  Index Scan using idx_osm_41 on 
osm.landcover_grassy_small_scale_2_ply_pg  (cost=0.42..2.44 rows=1 
width=14) (actual time=0.009..0.009 rows=0.00 loops=222396)
                   Output: landcover_grassy_small_scale_2_ply_pg.ctid, 
landcover_grassy_small_scale_2_ply_pg.objectid
                   Index Cond: 
(landcover_grassy_small_scale_2_ply_pg.objectid = t1.objectid)
                   Filter: 
((landcover_grassy_small_scale_2_ply_pg.page_number >= 31276) AND 
(landcover_grassy_small_scale_2_ply_pg.page_number < 31766))
                   Rows Removed by Filter: 1
                   Index Searches: 222396
                   Buffers: shared hit=889584


*** LIKELY GOOD PLAN AS GENERATED BY CAPTURING THE SQL AND RUNNING 
'EXPLAIN' IN pgAdmin ***:

"[
   {
     ""Plan"": {
       ""Node Type"": ""ModifyTable"",
       ""Operation"": ""Update"",
       ""Parallel Aware"": false,
       ""Async Capable"": false,
       ""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
       ""Schema"": ""osm"",
       ""Alias"": ""t1"",
       ""Startup Cost"": 1.14,
       ""Total Cost"": 9129.99,
       ""Plan Rows"": 0,
       ""Plan Width"": 0,
       ""Disabled"": false,
       ""Plans"": [
         {
           ""Node Type"": ""Nested Loop"",
           ""Parent Relationship"": ""Outer"",
           ""Parallel Aware"": false,
           ""Async Capable"": false,
           ""Join Type"": ""Inner"",
           ""Startup Cost"": 1.14,
           ""Total Cost"": 9129.99,
           ""Plan Rows"": 70,
           ""Plan Width"": 62,
           ""Disabled"": false,
           ""Output"": 
[""st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)"",
""st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true)"", ""CASE WHEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) > 
'0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision 
END"", ""st_npoints(landcover_grassy_small_scale_2_ply.way)"", ""CASE 
WHEN (st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE 
WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) 
 > '0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END / 
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision) 
ELSE '0'::double precision END"", ""CASE WHEN 
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double 
precision) THEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) / 
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double 
precision END"", ""t1.ctid"", 
""landcover_grassy_small_scale_2_ply.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.ctid""],
           ""Inner Unique"": true,
           ""Plans"": [
             {
               ""Node Type"": ""Nested Loop"",
               ""Parent Relationship"": ""Outer"",
               ""Parallel Aware"": false,
               ""Async Capable"": false,
               ""Join Type"": ""Inner"",
               ""Startup Cost"": 0.72,
               ""Total Cost"": 173.50,
               ""Plan Rows"": 70,
               ""Plan Width"": 828,
               ""Disabled"": false,
               ""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid"", 
""landcover_grassy_small_scale_2_ply_pg.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.objectid""],
               ""Inner Unique"": true,
               ""Plans"": [
                 {
                   ""Node Type"": ""Index Scan"",
                   ""Parent Relationship"": ""Outer"",
                   ""Parallel Aware"": false,
                   ""Async Capable"": false,
                   ""Scan Direction"": ""Forward"",
                   ""Index Name"": ""idx_osm_35"",
                   ""Relation Name"": 
""landcover_grassy_small_scale_2_ply_pg"",
                   ""Schema"": ""osm"",
                   ""Alias"": ""landcover_grassy_small_scale_2_ply_pg"",
                   ""Startup Cost"": 0.29,
                   ""Total Cost"": 3.70,
                   ""Plan Rows"": 70,
                   ""Plan Width"": 14,
                   ""Disabled"": false,
                   ""Output"": 
[""landcover_grassy_small_scale_2_ply_pg.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.objectid""],
                   ""Index Cond"": 
""((landcover_grassy_small_scale_2_ply_pg.page_number >= 28873) AND 
(landcover_grassy_small_scale_2_ply_pg.page_number < 29373))""
                 },
                 {
                   ""Node Type"": ""Index Scan"",
                   ""Parent Relationship"": ""Inner"",
                   ""Parallel Aware"": false,
                   ""Async Capable"": false,
                   ""Scan Direction"": ""Forward"",
                   ""Index Name"": 
""landcover_grassy_small_scale_2_ply_pkey"",
                   ""Relation Name"": 
""landcover_grassy_small_scale_2_ply"",
                   ""Schema"": ""osm"",
                   ""Alias"": ""t1"",
                   ""Startup Cost"": 0.42,
                   ""Total Cost"": 2.43,
                   ""Plan Rows"": 1,
                   ""Plan Width"": 814,
                   ""Disabled"": false,
                   ""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid""],
                   ""Index Cond"": ""(t1.objectid = 
landcover_grassy_small_scale_2_ply_pg.objectid)""
                 }
               ]
             },
             {
               ""Node Type"": ""Index Scan"",
               ""Parent Relationship"": ""Inner"",
               ""Parallel Aware"": false,
               ""Async Capable"": false,
               ""Scan Direction"": ""Forward"",
               ""Index Name"": ""landcover_grassy_small_scale_2_ply_pkey"",
               ""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
               ""Schema"": ""osm"",
               ""Alias"": ""landcover_grassy_small_scale_2_ply"",
               ""Startup Cost"": 0.42,
               ""Total Cost"": 0.64,
               ""Plan Rows"": 1,
               ""Plan Width"": 814,
               ""Disabled"": false,
               ""Output"": [""landcover_grassy_small_scale_2_ply.way"", 
""landcover_grassy_small_scale_2_ply.ctid"", 
""landcover_grassy_small_scale_2_ply.objectid""],
               ""Index Cond"": 
""(landcover_grassy_small_scale_2_ply.objectid = t1.objectid)""
             }
           ]
         }
       ]
     }
   }
]"

Op 12-10-2025 om 10:24 schreef Marco Boeringa:
> Hi Andres,
>
> I have been doing a bit more investigation. As I explained before, the 
> problematic multi-threaded geoprocessing step is not some stand-alone 
> query that can be easily reduced to small easily portable reproducible 
> case with attached data. In fact, this geoprocessing step is part of a 
> large custom build Python geoprocessing workflow, with total code 
> probably in the 25k code lines range.
>
> However, based on the apparent poor query plan in PG18 / PostGIS 
> 3.6.0, I now reviewed the exact code once more. I noticed that just 
> before entering the multi-threaded code that emits the queries as seen 
> below, I am actually adding the primary key field 'objectid' as 
> "GENERATED BY DEFAULT AS IDENTITY" to the 
> 'osm.landcover_scrubs_small_scale_2_ply' table.
>
> Now I also noticed I did not run ANALYZE after that against the same 
> table. I have now added this to the code. Although it is still 
> preliminary, first tests seem to indicate that this resolves the 
> issue, and prevents the stalls or better said apparent hugely 
> inefficient query plan (remember: a < 10 sec process was turned into 
> multi-hour). I still need to do more thorough testing to be sure though.
>
> However, this raises a couple of question:
>
> - While ANALYZE is of course hugely important for proper statistics 
> and query planning, I have wondered if PostgreSQL shouldn't 
> automatically have updated the statistics for the addition of the 
> primary key with IDENTITY? It seems to me that based on the definition 
> of the primary key column and IDENTITY and table size, the actual 
> distribution of values is essentially already known even before any 
> sampling of ANALYZE to update statistics?
>
> - Am I right to assume that only the statistics on the objectid field 
> play any role in this issue? As you can see, the WHERE clause does not 
> involve any other fields than the two objectid fields of the main 
> table and the chunk table specifying the job. All other values 
> computed are just derived straight from the geometry column.
>
> - Were there any hints in the all the other data I supplied as to 
> where PG18's query planning without the updated statistics of the new 
> ANALYZE step added, is going wrong? And why this was never an issue in 
> <= PG17?
>
> I also did some preliminary test with the old PG17.6 / PostgGIS 3.6.0 
> cluster with the same Italy extract data. I still need to do more 
> thorough testing, both with and without the extra ANALYZE step, to 
> fully exclude that there isn't something related to the upgrade to 
> PostGIS 3.6.0, but first indications are as I already saw with 
> the PG17.6 / PostgGIS 3.5.3, that there are no issue with <= PG17 / 
> PostGIS combination as regards this apparent planner issue.
>
> Marco
>
> Op 8-10-2025 om 21:08 schreef Andres Freund:
>> In [1] you showed a query. Reformated that looks like this:
>>
>> UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
>> SET area_geo = t2.area_geo,
>>      perim_geo = t2.perim_geo,
>>      compact_geo = CASE WHEN t2.area_geo > 0 THEN 
>> ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END,
>>      npoints_geo = t2.npoints_geo,
>>      comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN 
>> t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * 
>> pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END,
>>      convex_ratio_geo = CASE WHEN 
>> ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / 
>> ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END
>> FROM (
>>      SELECT
>>          objectid,
>>          ST_Area(way::geography,true) AS area_geo,
>>          ST_Perimeter(way::geography,true) AS perim_geo,
>>          ST_NPoints(way) AS npoints_geo
>>      FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
>> WHERE (t2.objectid = t1.objectid)
>>     AND t1.objectid IN (SELECT t3.objectid FROM 
>> mini_test.osm.osm_tmp_28128_ch5 AS t3)
>>
>>
>> Which certainly fits with two nested loops, although I don't think I 
>> can infer
>> which order it the joins are in.



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Issues with blocksize smaller than 8KB
Next
From: Tom Lane
Date:
Subject: Re: Issues with blocksize smaller than 8KB