Re: Potential "AIO / io workers" inter-worker locking issue in PG18?

From: Marco Boeringa
Subject: Re: Potential "AIO / io workers" inter-worker locking issue in PG18?
Msg-id: 53b44572-0ceb-4149-b361-07da2da91032@boeringa.demon.nl
In response to: Re: Potential "AIO / io workers" inter-worker locking issue in PG18? (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs
Hi Andres,

> Could you show perf report --no-children? That would show us which individual
> functions, rather than call-stacks, take the longest.

See the full output at the bottom of this mail!

> I don't immediately see how this could be related to AIO.

Yes, you could be right that this is not related to AIO at all, but to some other change introduced in PG18. The only
reason I initially thought of AIO was of course that it is one of the main new features of PG18, and I could imagine
the "workers" getting into some sort of inter-worker locking issue, just like threads can.
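
For what it is worth, next time a run gets stuck I could sample the wait events of the affected backends from a
separate session. Just a quick sketch of the kind of check I have in mind (standard pg_stat_activity columns,
nothing specific to my setup):

SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend';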
 

For sure, I did not see this issue in <= PG17, so some change in PG18 is causing it. Additionally, there is a small
chance it might be related to PostGIS, as that was upgraded as well (3.5.2 --> 3.6.0) during the PG upgrade, since
PG18 requires PostGIS 3.6.0 as a minimum. The query does use PostGIS functions, but none that, AFAIK, rely on e.g. a
spatial index. Functions like ST_Area just process an individual geometry, not the spatial relationship between
multiple geometries.
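
To make that distinction concrete (the second, commented-out statement is just a hypothetical illustration, those
tables do not exist in my schema):

-- Per-geometry measurement, no spatial index involved (this is what the query below does):
SELECT ST_Area(way::geography, true) FROM osm.landcover_scrubs_small_scale_2_ply LIMIT 1;

-- A spatial-relationship predicate between two tables is the kind of construct that
-- would normally use a GiST index; nothing like this appears in my query:
-- SELECT count(*) FROM table_a AS a JOIN table_b AS b ON ST_Intersects(a.way, b.way);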

As I wrote before, this is a multi-threaded Python application (actually developed in a GIS) that uses Python's
'concurrent.futures' threading framework to create jobs of records to process for each thread, significantly speeding
up the processing. The queries are in fact dynamically built by the code and are part of a much larger geoprocessing
workflow, so it is hard to run them separately and provide a query plan (although in this case I could, by rewriting
part of the query below).
 

However, I want to stress that any query plan is unlikely to yield anything. In normal circumstances, and in PG17 and
below, this code runs fine! It is only maybe 1 in 4 runs in PG18 that goes berserk and makes a processing step that
should take < 10 seconds all of a sudden take > 2 hours.
 

UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
SET area_geo = t2.area_geo,
    perim_geo = t2.perim_geo,
    compact_geo = CASE WHEN t2.area_geo > 0
                       THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi()))
                       ELSE 0 END,
    npoints_geo = t2.npoints_geo,
    comp_npoints_geo = CASE WHEN t2.npoints_geo > 0
                            THEN (CASE WHEN t2.area_geo > 0
                                       THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi()))
                                       ELSE 0 END / t2.npoints_geo)
                            ELSE 0 END,
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography, true) > 0
                            THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography, true))
                            ELSE 1 END
FROM (SELECT objectid,
             ST_Area(way::geography, true) AS area_geo,
             ST_Perimeter(way::geography, true) AS perim_geo,
             ST_NPoints(way) AS npoints_geo
      FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
WHERE (t2.objectid = t1.objectid)
  AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3)

> So somehow >60% of the CPU time is spent fetching tuples corresponding to
> index entries. That seems ... a lot.  Is it possible that you have a lot of
> dead rows in the involved tables?

Yes, that is perfectly possible. However, the particular table is only 
just over 100k records. It is true that my code is designed to process 
literally *every* record in a table. However, I specifically set 
adjusted per-table storage parameters with much more aggressive 
autovacuum settings (essentially forcing a vacuum after something like 
10k dead tuples, irrespective of the size of the table). This has worked 
really well, and I have successfully UPDATEd Facebook Daylight sized 
tables of > 1B records with the same code, without ever running into 
this particular issue, nor into transaction ID wraparound issues.
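
To illustrate what I mean, a rough sketch (the actual values differ per table; the 10k threshold below is just the
order of magnitude I mentioned):

-- Check the dead tuple situation on the table involved:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'landcover_scrubs_small_scale_2_ply';

-- The kind of per-table storage parameters I set, forcing a vacuum after a fixed
-- number of dead tuples irrespective of table size:
ALTER TABLE osm.landcover_scrubs_small_scale_2_ply
    SET (autovacuum_vacuum_scale_factor = 0.0,
         autovacuum_vacuum_threshold = 10000);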

One particular thing to note as well is that, due to careful design of 
the jobs taking page locality into account, a deliberately low table 
fill factor, and plenty of RAM, quite a few (but not all) of the steps 
in the geoprocessing workflow manage to run almost completely as 
PostgreSQL 'HOT' updates, and that for all records in the table (even 
for very large ones).
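
Again just a sketch to show what I mean (the fill factor value is illustrative, not my exact setting):

-- Deliberately low fill factor, leaving free space on each page so updates can stay HOT:
ALTER TABLE osm.landcover_scrubs_small_scale_2_ply SET (fillfactor = 70);

-- Fraction of updates that were HOT, per pg_stat_user_tables:
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'landcover_scrubs_small_scale_2_ply';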

*** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
*** sudo perf report --no-children ***

Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000
   Overhead  Command   Shared Object      Symbol
+   12,81%  postgres  postgres           [.] LWLockAttemptLock
+   10,70%  postgres  postgres           [.] heap_hot_search_buffer
+    9,27%  postgres  postgres           [.] tts_heap_getsomeattrs.lto_priv.0
+    6,77%  postgres  postgres           [.] LWLockReleaseInternal
+    5,89%  postgres  postgres           [.] hash_search_with_hash_value
+    5,79%  postgres  postgres           [.] ExecInterpExpr
+    4,14%  postgres  postgres           [.] PinBuffer
+    3,61%  postgres  postgres           [.] heapam_index_fetch_tuple.lto_priv.0
+    3,00%  postgres  postgres           [.] ReleaseAndReadBuffer
+    2,43%  postgres  postgres           [.] LWLockAcquire
+    2,38%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
+    2,36%  postgres  postgres           [.] index_getnext_tid
+    1,96%  postgres  postgres           [.] index_fetch_heap
+    1,90%  postgres  postgres           [.] _bt_readpage
+    1,85%  postgres  postgres           [.] LWLockRelease
+    1,74%  postgres  postgres           [.] heap_page_prune_opt
+    1,49%  postgres  postgres           [.] IndexNext
+    1,47%  postgres  postgres           [.] _bt_checkkeys
+    1,47%  postgres  postgres           [.] ExecScan
+    1,47%  postgres  postgres           [.] XidInMVCCSnapshot
+    1,19%  postgres  postgres           [.] StartReadBuffer
+    1,18%  postgres  postgres           [.] GetPrivateRefCountEntry.lto_priv.0
+    1,12%  postgres  postgres           [.] _bt_next
+    1,10%  postgres  postgres           [.] UnpinBufferNoOwner.lto_priv.0
+    1,09%  postgres  postgres           [.] MemoryContextReset
+    1,08%  postgres  postgres           [.] btgettuple
+    0,91%  postgres  postgres           [.] _bt_check_compare.lto_priv.0
+    0,82%  postgres  postgres           [.] ExecEvalSysVar
+    0,82%  postgres  postgres           [.] ExecStoreBufferHeapTuple
+    0,71%  postgres  postgres           [.] hash_bytes
+    0,65%  postgres  postgres           [.] tts_virtual_clear.lto_priv.0
+    0,63%  postgres  postgres           [.] GlobalVisTestFor
+    0,63%  postgres  postgres           [.] ExecNestLoop
+    0,55%  postgres  postgres           [.] HeapTupleIsSurelyDead
+    0,52%  postgres  postgres           [.] ResourceOwnerForget
      0,48%  postgres  postgres           [.] slot_getsomeattrs_int
      0,42%  postgres  postgres           [.] PredicateLockTID
      0,40%  postgres  postgres           [.] ReadBufferExtended
      0,37%  postgres  postgres           [.] _bt_saveitem
      0,31%  postgres  postgres           [.] ExecIndexScan
      0,30%  postgres  libc.so.6          [.] __memcmp_sse2
      0,28%  postgres  postgres           [.] _bt_setuppostingitems
      0,26%  postgres  postgres           [.] ReleaseBuffer
      0,23%  postgres  postgres           [.] ResourceOwnerEnlarge
      0,23%  postgres  postgres           [.] HeapCheckForSerializableConflictOut
      0,22%  postgres  postgres           [.] IncrBufferRefCount
      0,18%  postgres  postgres           [.] pgstat_count_io_op



