Hi Andres,
Looking at the "perf report --no-children" list, I noticed the high
"LWLock" related CPU activity.
Although I have already mentioned part of it, here are a few more
observations of pgAdmin and pg_locks during the stuck situation:
- pgAdmin shows 15 active session without wait events
- pg_locks shows 228 locks taken, of which 218 are "fastpath"
- Only 10 locks are not fastpath, and they refer to the "transactionid".
I probably misunderstand this, but I would have expected to see 15
transactionids, one for each database session? But maybe I have this
wrong... There are no other transactionids. There *are* also 15
"virtualxid" locks visible, all "fastpath". All transactionid and
virtualxid locks are "ExclusiveLock" type. The rest of the 228 locks are
of "AccessShareLock" and "RowExclusiveLock" type.
- My 'max_locks_per_transaction', as mentioned in this
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c4d5cb71d)
commit related to PG18 and fastpath changes, is set to the default of 64.
- If I look in the pg_stat_io view and regularly refresh it, I hardly
see any changes in the table, except a few records (at least one related
to autovacuum) now and then. This is also more or less confirmed by the
disk RAIDs for tables and indexes, that show zero activity in Windows
Task Manager (0 KB/s writes with 0 ms "average response time" (which
doesn't happen in normal operation of the multi-threaded code where I do
see almost continuous activity on the RAIDs).
- It is likely the entire table being processed is cached in RAM, as
there is plenty of it, far larger than the small Italy extract table
being processed.
Marco