On Mon, Jul 21, 2025 at 12:51:14PM +0530, Rahila Syed wrote:
> This appears to be a valid issue where the Autovacuum worker fails while
> already holding an LWLock on one of the pgStatLocal.shared_hash partitions.
> As a result, when we attempt to access this table again during proc_exit
> cleanup in dshash_find, the assert is triggered. I haven’t yet checked
> exactly where the lock is acquired within the Autovacuum worker, but as
> Dilip mentioned, reviewing where the error occurs in the Autovacuum worker
> would be helpful.

Per the comment in dsm_attach() (dsm.c), regarding the original FATAL
message "can't attach the same segment more than once" that leads to
the assertion failure afterwards:

     * If you're hitting this error, you probably want to attempt to find an
     * existing mapping via dsm_find_mapping() before calling dsm_attach() to
     * create a new one.
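
For illustration, the find-before-attach pattern that this comment
recommends looks roughly like the sketch below. This is a standalone
toy model, not code from dsm.c: toy_segment, toy_find_mapping(),
toy_attach() and toy_find_or_attach() are all made-up names, and the
fixed-size array is only an assumption standing in for the
per-process list of mapped segments.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SEGMENTS 8

/* Hypothetical stand-in for a per-process mapping of a shared segment. */
typedef struct toy_segment
{
	unsigned int handle;	/* identifies the shared segment */
	int			in_use;		/* nonzero once this slot holds a mapping */
} toy_segment;

/* Toy registry of the mappings held by this process. */
static toy_segment toy_segments[TOY_MAX_SEGMENTS];

/* Return the existing mapping for handle h, or NULL if there is none. */
static toy_segment *
toy_find_mapping(unsigned int h)
{
	for (size_t i = 0; i < TOY_MAX_SEGMENTS; i++)
	{
		if (toy_segments[i].in_use && toy_segments[i].handle == h)
			return &toy_segments[i];
	}
	return NULL;
}

/*
 * Create a new mapping for handle h.  Returns NULL if h is already mapped
 * (the toy analogue of "can't attach the same segment more than once") or
 * if the registry is full.
 */
static toy_segment *
toy_attach(unsigned int h)
{
	if (toy_find_mapping(h) != NULL)
		return NULL;

	for (size_t i = 0; i < TOY_MAX_SEGMENTS; i++)
	{
		if (!toy_segments[i].in_use)
		{
			toy_segments[i].handle = h;
			toy_segments[i].in_use = 1;
			return &toy_segments[i];
		}
	}
	return NULL;
}

/* Look for an existing mapping first, and attach only if none is found. */
static toy_segment *
toy_find_or_attach(unsigned int h)
{
	toy_segment *seg = toy_find_mapping(h);

	if (seg == NULL)
		seg = toy_attach(h);

	/* Sanity check: whatever we return must map the requested handle. */
	assert(seg == NULL || seg->handle == h);
	return seg;
}
```

In this toy model, calling toy_find_or_attach() twice with the same
handle returns the same mapping, while a second direct toy_attach()
is refused.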

One thing that we could do is to upgrade this FATAL to a PANIC, to get
an idea of the stack where the original problem happens.

The stack references a backend-level stats entry getting dropped by an
autovacuum worker as a result of the pgstat_drop_entry() done in
pgstat_shutdown_hook(), so it looks like we are reaching a new error
state in v18 that could not happen before within the DSM, as an
after-effect of the FATAL causing the autovacuum worker to stop. I
have never seen this one. We are already doing stats reports in the
pgstat_report_stat() call, with manipulations of the pgstats dshash,
while shutting down.

An objid of 5015 means that the procnum is set to that value. What is
your max_connections setting? It seems that a high number makes the
problem easier to reproduce.

Robins, is that your host with gcc experimental? Would it be possible
to re-run the test with a patched build, with the FATAL upgraded to
PANIC, and see what happens?
--
Michael