Re: Potential "AIO / io workers" inter-worker locking issue in PG18? - Mailing list pgsql-bugs
| From | Marco Boeringa |
|---|---|
| Subject | Re: Potential "AIO / io workers" inter-worker locking issue in PG18? |
| Date | |
| Msg-id | eeb6b15c-1b06-4f4e-bab8-0b3fe1b5c80f@boeringa.demon.nl |
| In response to | Re: Potential "AIO / io workers" inter-worker locking issue in PG18? (David Rowley <dgrowleyml@gmail.com>) |
| Responses | Re: Potential "AIO / io workers" inter-worker locking issue in PG18?; Re: Potential "AIO / io workers" inter-worker locking issue in PG18? |
| List | pgsql-bugs |
Hi David,

To be honest I am not an SQL wizard like some of you here on the list, but part of the reason I set up the query the way it currently is, is that PostGIS function calls like ST_Area and ST_Perimeter can be very expensive depending on the complexity and size of the geometry, and I want to avoid at all costs having to recalculate them unnecessarily multiple times in the same query. Maybe I am misunderstanding how PostgreSQL processes such queries, but I need the values multiple times to calculate some other parameters. So unless PostgreSQL is smart enough to cache the result and not execute ST_Area multiple times if it is used multiple times in the same query, I thought it wise to separate out the calculation and use the SELECT's results as input for the calculation of the other parameters (a sketch of the pattern I mean follows below). Maybe that isn't actually needed, but I think I remember seeing performance gains from the current setup when I initially wrote it this way; I am not entirely sure at this point, it was a while ago. I know far too little about the internals of PostgreSQL and its exact query processing to say anything really sensible about this.

A couple of things that still strike me:

- As I wrote in a previous post, just before entering the Python multi-threaded processing that generates the jobs, I am adding a primary key column with unique objectids (GENERATED BY DEFAULT AS IDENTITY) that is subsequently used in the join. I realize this is actually an enhancement request, but I am wondering why, if PostgreSQL already needs to dig through the entire table to add a new column with unique objectid values, it doesn't automatically update the statistics of the newly added column and write out the proper number of records of the table, which must be known by the end of the addition of the column, to the 'pg_class.reltuples' field you refer to. It seems to me that would save the need for the ANALYZE here, and at least ensure that some useful statistics are available on the vital primary key column and on the relation size.

- As I wrote in a previous post, the multi-threaded Python code creates multiple jobs (dozens). What I am seeing is that only some of the jobs fail, and in some runs none at all. E.g. I can run my tool three times without issues, then on the fourth run some of the jobs get the bad plan (I only see this behavior in PG18, I never saw it in <= PG17). In all of these cases, the input data is *exactly* the same. The planning behavior therefore appears non-deterministic. As I understand it, PostgreSQL may indeed behave non-deterministically if it switches to the genetic query optimizer on complex queries with many joins, but I have the feeling that my query doesn't quite reach that level of complexity? Or am I wrong here, and do you consider it likely it went through the genetic algorithm? It would actually be desirable if EXPLAIN (especially 'auto_explain') output always showed whether the genetic algorithm was activated, so one could judge whether non-deterministic planner behavior is to be expected (a way to rule this out is sketched below).

- Lastly, did you notice the likely "good" plan I posted below the "bad" one? I generated that one by simply copying the visible query into pgAdmin and hitting EXPLAIN, so it's not the real thing as captured by 'auto_explain', but it does show some marked differences between the plans. Do you have any comments to add on the differences?
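To illustrate what I mean with separating out the calculation, here is a minimal, simplified sketch of the pattern; the table name, geometry column and derived "compactness" parameter are placeholders and not my actual schema:

```sql
-- Simplified illustration only: "some_table", "way" and the derived
-- "compactness" parameter are placeholders for the real schema.
-- ST_Area / ST_Perimeter are evaluated once per row in the subquery,
-- and the results are reused to derive further parameters.
UPDATE some_table AS t
SET area        = m.area,
    perimeter   = m.perimeter,
    compactness = 4 * pi() * m.area / (m.perimeter * m.perimeter)
FROM (
    SELECT objectid,
           ST_Area(way)      AS area,
           ST_Perimeter(way) AS perimeter
    FROM some_table
) AS m
WHERE t.objectid = m.objectid;
```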
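On the genetic optimizer question, the way I understand it one can at least check whether GEQO could be involved at all, since it only kicks in once the number of FROM items reaches 'geqo_threshold' (12 by default), and it can be switched off per session to test whether the plan variability disappears:

```sql
-- Check whether the genetic query optimizer is enabled and at what
-- join count it takes over from the standard planner.
SHOW geqo;             -- default: on
SHOW geqo_threshold;   -- default: 12 FROM items

-- Rule GEQO out as a source of non-deterministic plans for one session:
SET geqo = off;
```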
Marco

On 20-10-2025 at 03:33, David Rowley wrote:
> On Mon, 20 Oct 2025 at 09:37, Marco Boeringa <marco@boeringa.demon.nl> wrote:
>> I am not sure why in this nested loop, two index scans on essentially
>> the same key and table are executed. You can compare this bad plan with
> The query contains a self-join, so that's why you're seeing the index
> scanned twice in the query plan. If that's not needed, then you should
> remove it from the query. If objectid is unique for this table then I
> don't see why you need to join the table again to access the very same
> row that you're updating. Just put those function calls in the
> UPDATE's SET clause.
>
> (We do have self join elimination in v18, but I see that it's a bit
> overly strict in what it removes around looking for duplicate
> relations when one of them is the query's result relation. Likely that
> can be made better so it still looks for duplicate relations including
> the result relation, but just never considers removing that one, only
> the other duplicate(s).)
>
>> *** ACTUAL BAD PLAN AS CAPTURED BY auto_explain ***:
>> -> Index Scan using
>> landcover_grassy_small_scale_2_ply_pkey on
>> osm.landcover_grassy_small_scale_2_ply t1 (cost=0.38..2.39 rows=1
>> width=310) (actual time=5.176..462.613 rows=222396.00 loops=1)
>> Output: t1.way, t1.ctid, t1.objectid
>> Index Searches: 1
>> Buffers: shared hit=66411
> This table must have been VACUUMed or ANALYZEd either when it was
> empty or when it contained 1 row. There's no predicate here, so that
> estimate, aside from clamping to 1, comes directly from
> pg_class.reltuples. A new table or truncated table would never
> estimate 1 row as the planner always plays it safe when there are no
> statistics generated yet and assumes 10 pages worth of rows. I can't
> think of any specific reason why v18 behaves differently from v17 on
> this... Maybe you've gotten unlucky with an autovacuum timing thing
> and it's running at a slightly different time than in v17, perhaps
> because it completed the autovacuum of another table slightly quicker
> than v17 did. v18 can perform asynchronous reads for vacuum, maybe
> that could mean more vacuum_cost_page_hits and less
> vacuum_cost_page_misses when calculating vacuum_cost_limit.
>
> Or, perhaps you're doing something like performing a manual VACUUM
> after the tables have had all of their rows deleted?
>
> David
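For completeness, a minimal sketch of the self-join-free form suggested in the reply above, again with placeholder names rather than the real schema, together with the catalog query for the 'pg_class.reltuples' estimate it mentions:

```sql
-- Placeholder names only. With the self-join removed, the function
-- calls move directly into the SET clause, each evaluated once per row.
UPDATE some_table
SET area      = ST_Area(way),
    perimeter = ST_Perimeter(way);

-- Inspect the row-count estimate the planner starts from:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'some_table';
```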