Re: Potential "AIO / io workers" inter-worker locking issue in PG18? - Mailing list pgsql-bugs
| From | Marco Boeringa |
|---|---|
| Subject | Re: Potential "AIO / io workers" inter-worker locking issue in PG18? |
| Date | |
| Msg-id | eeb6b15c-1b06-4f4e-bab8-0b3fe1b5c80f@boeringa.demon.nl |
| In response to | Re: Potential "AIO / io workers" inter-worker locking issue in PG18? (David Rowley <dgrowleyml@gmail.com>) |
| Responses | Re: Potential "AIO / io workers" inter-worker locking issue in PG18?; Re: Potential "AIO / io workers" inter-worker locking issue in PG18? |
| List | pgsql-bugs |
Hi David,

To be honest I am not an SQL wizard like some of you here on the list, but part of the reason I set up the query the way it currently is, is that PostGIS function calls like ST_Area and ST_Perimeter can be very expensive depending on the complexity and size of the geometry, and I want to avoid at all costs having to recalculate them unnecessarily multiple times in the same query. Maybe I am misunderstanding how PostgreSQL processes such queries, but I need the values multiple times to calculate some other parameters. So unless PostgreSQL is smart enough to cache the result and not execute ST_Area multiple times if it is used multiple times in the same query, I thought it wise to separate out the calculation and use the SELECT's results as input for the calculation of the other parameters (a sketch of the pattern I mean follows below). Maybe that isn't actually needed, but I think I remember seeing performance gains from the current setup when I initially wrote it this way; I am not entirely sure at this point, it was a while ago. I know far too little about the internals of PostgreSQL and its exact query processing to say anything really sensible about this.

A couple of things that still strike me:

- As I wrote in a previous post, just before entering the Python multi-threaded processing that generates the jobs, I am adding a primary key column with unique objectids (GENERATED BY DEFAULT AS IDENTITY) that is subsequently used in the join. I realize this is actually an enhancement request, but I am wondering why, if PostgreSQL already needs to dig through the entire table to add a new column with unique objectid values, it doesn't automatically update the statistics of the newly added column and write out the proper number of records of the table, which must be known by the end of the addition of the column, to the 'pg_class.reltuples' field you refer to. It seems to me that would save the need for the ANALYZE here, and at least ensure that some useful statistics are available on the vital primary key column and on the relation size.

- As I wrote in a previous post, the multi-threaded Python code creates multiple jobs (dozens). What I am seeing is that only some of the jobs fail, and in some runs none at all. E.g. I can run my tool three times without issues, then on the fourth run some of the jobs get the bad plan (I only see this behavior in PG18, I never saw it in <= PG17). In all of these cases, the input data is *exactly* the same. The planning behavior therefore appears non-deterministic. As I understand it, PostgreSQL may indeed behave non-deterministically if it switches to the genetic query optimizer on complex queries with many joins, but I have the feeling that my query doesn't quite reach that level of complexity? Or am I wrong here, and do you consider it likely it went through the genetic algorithm? It would actually be desirable if EXPLAIN (especially 'auto_explain') output always showed whether the genetic algorithm was activated, so one could judge whether non-deterministic planner behavior is to be expected (a way to rule this out is sketched below).

- Lastly, did you notice the likely "good" plan I posted below the "bad" one? I generated that one by simply copying the visible query into pgAdmin and hitting EXPLAIN, so it's not the real thing as captured by 'auto_explain', but it does show some marked differences between the plans. Do you have any comments to add on the differences?
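To illustrate what I mean with separating out the calculation, here is a minimal, simplified sketch of the pattern; the table name, geometry column and derived "compactness" parameter are placeholders and not my actual schema:

```sql
-- Simplified illustration only: "some_table", "way" and the derived
-- "compactness" parameter are placeholders for the real schema.
-- ST_Area / ST_Perimeter are evaluated once per row in the subquery,
-- and the results are reused to derive further parameters.
UPDATE some_table AS t
SET area        = m.area,
    perimeter   = m.perimeter,
    compactness = 4 * pi() * m.area / (m.perimeter * m.perimeter)
FROM (
    SELECT objectid,
           ST_Area(way)      AS area,
           ST_Perimeter(way) AS perimeter
    FROM some_table
) AS m
WHERE t.objectid = m.objectid;
```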
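On the genetic optimizer question, the way I understand it one can at least check whether GEQO could be involved at all, since it only kicks in once the number of FROM items reaches 'geqo_threshold' (12 by default), and it can be switched off per session to test whether the plan variability disappears:

```sql
-- Check whether the genetic query optimizer is enabled and at what
-- join count it takes over from the standard planner.
SHOW geqo;             -- default: on
SHOW geqo_threshold;   -- default: 12 FROM items

-- Rule GEQO out as a source of non-deterministic plans for one session:
SET geqo = off;
```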
Marco

On 20-10-2025 at 03:33, David Rowley wrote:
> On Mon, 20 Oct 2025 at 09:37, Marco Boeringa <marco@boeringa.demon.nl> wrote:
>> I am not sure why in this nested loop, two index scans on essentially
>> the same key and table are executed. You can compare this bad plan with
> The query contains a self-join, so that's why you're seeing the index
> scanned twice in the query plan. If that's not needed, then you should
> remove it from the query. If objectid is unique for this table then I
> don't see why you need to join the table again to access the very same
> row that you're updating. Just put those function calls in the
> UPDATE's SET clause.
>
> (We do have self join elimination in v18, but I see that it's a bit
> overly strict in what it removes around looking for duplicate
> relations when one of them is the query's result relation. Likely that
> can be made better so it still looks for duplicate relations including
> the result relation, but just never considers removing that one, only
> the other duplicate(s).)
>
>> *** ACTUAL BAD PLAN AS CAPTURED BY auto_explain ***:
>> -> Index Scan using
>> landcover_grassy_small_scale_2_ply_pkey on
>> osm.landcover_grassy_small_scale_2_ply t1 (cost=0.38..2.39 rows=1
>> width=310) (actual time=5.176..462.613 rows=222396.00 loops=1)
>> Output: t1.way, t1.ctid, t1.objectid
>> Index Searches: 1
>> Buffers: shared hit=66411
> This table must have been VACUUMed or ANALYZEd either when it was
> empty or when it contained 1 row. There's no predicate here, so that
> estimate, aside from clamping to 1, comes directly from
> pg_class.reltuples. A new table or truncated table would never
> estimate 1 row as the planner always plays it safe when there are no
> statistics generated yet and assumes 10 pages worth of rows. I can't
> think of any specific reason why v18 behaves differently from v17 on
> this... Maybe you've gotten unlucky with an autovacuum timing thing
> and it's running at a slightly different time than in v17, perhaps
> because it completed the autovacuum of another table slightly quicker
> than v17 did. v18 can perform asynchronous reads for vacuum, maybe
> that could mean more vacuum_cost_page_hits and less
> vacuum_cost_page_misses when calculating vacuum_cost_limit.
>
> Or, perhaps you're doing something like performing a manual VACUUM
> after the tables have had all of their rows deleted?
>
> David
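For completeness, a minimal sketch of the self-join-free form suggested in the reply above, again with placeholder names rather than the real schema, together with the catalog query for the 'pg_class.reltuples' estimate it mentions:

```sql
-- Placeholder names only. With the self-join removed, the function
-- calls move directly into the SET clause, each evaluated once per row.
UPDATE some_table
SET area      = ST_Area(way),
    perimeter = ST_Perimeter(way);

-- Inspect the row-count estimate the planner starts from:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'some_table';
```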