Question regarding ASSERT_NO_PARTITION_LOCKS_HELD_BY_ME in dshash_detach() - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Question regarding ASSERT_NO_PARTITION_LOCKS_HELD_BY_ME in dshash_detach()
Msg-id CABOikdMzogyfrPLQCNyZkRwX5fR_2-aQVFDeqAg2N3=FhXDfNA@mail.gmail.com
List pgsql-hackers
Hi Andres,

One of my tests hit an assertion in dshash_detach(). Once again this is with BDR, and I don't have a reproduction case with standalone PG. Also, this probably happened because of some systemd weirdness where it removes shared memory segments from underneath us, resulting in ERRORs being thrown.

However, looking at the stack trace and the code, I wonder if it's possible to hit the assertion even with stock postgres. In my case, the stack trace looked like:

```
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa5775b9535 in __GI_abort () at abort.c:79
#2  0x0000556dbce828bc in ExceptionalCondition (conditionName=0x556dbd027c88 "!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock, DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))",
    errorType=0x556dbd027c44 "FailedAssertion", fileName=0x556dbd027c10 "/opt/postgres/src/postgres/src/backend/lib/dshash.c", lineNumber=309)
    at /opt/postgres/src/postgres/src/backend/utils/error/assert.c:69
#3  0x0000556dbcae0aae in dshash_detach (hash_table=0x556dbe0294f0) at /opt/postgres/src/postgres/src/backend/lib/dshash.c:309
#4  0x0000556dbcd045bf in pgstat_detach_shmem () at /opt/postgres/src/postgres/src/backend/utils/activity/pgstat_shmem.c:240
#5  0x0000556dbccfd263 in pgstat_shutdown_hook (code=0, arg=0) at /opt/postgres/src/postgres/src/backend/utils/activity/pgstat.c:509
#6  0x0000556dbcca18b1 in shmem_exit (code=0) at /opt/postgres/src/postgres/src/backend/storage/ipc/ipc.c:239
#7  0x0000556dbcca1769 in proc_exit_prepare (code=0) at /opt/postgres/src/postgres/src/backend/storage/ipc/ipc.c:194
#8  0x0000556dbcca16ba in proc_exit (code=0) at /opt/postgres/src/postgres/src/backend/storage/ipc/ipc.c:107
#9  0x0000556dbcbfcadc in AutoVacWorkerMain (argc=0, argv=0x0) at /opt/postgres/src/postgres/src/backend/postmaster/autovacuum.c:1590
#10 0x0000556dbcbfc968 in StartAutoVacWorker () at /opt/postgres/src/postgres/src/backend/postmaster/autovacuum.c:1496
#11 0x0000556dbcc0aa50 in StartAutovacuumWorker () at /opt/postgres/src/postgres/src/backend/postmaster/postmaster.c:5534
#12 0x0000556dbcc0a56b in sigusr1_handler (postgres_signal_arg=10) at /opt/postgres/src/postgres/src/backend/postmaster/postmaster.c:5239
#13 <signal handler called>
#14 0x00007fa577687a27 in __GI___select (nfds=10, readfds=0x7fff6e69a370, writefds=0x0, exceptfds=0x0, timeout=0x7fff6e69a3f0) at ../sysdeps/unix/sysv/linux/select.c:41
#15 0x0000556dbcc05e7f in ServerLoop () at /opt/postgres/src/postgres/src/backend/postmaster/postmaster.c:1770
#16 0x0000556dbcc0581e in PostmasterMain (argc=5, argv=0x556dbe027490) at /opt/postgres/src/postgres/src/backend/postmaster/postmaster.c:1478
#17 0x0000556dbcafcaf1 in main (argc=5, argv=0x556dbe027490) at /opt/postgres/src/postgres/src/backend/main/main.c:202
```

If the autovacuum worker is not inside a transaction and throws an ERROR while holding a dshash partition lock, AFAICS it can reach proc_exit() without releasing the lock, because no transaction abort processing takes place to release it.

For example, at autovacuum.c:1694, pgstat_report_autovac() can theoretically, deep down the call chain, call `dsa_get_address()`, which calls `get_segment_by_index()`, and that function has a couple of elog(ERROR) calls.
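To make the failure mode concrete, here is a toy C model. It is not PostgreSQL code: every `toy_*` name is hypothetical, ISO `setjmp()`/`longjmp()` stand in for `sigsetjmp()` and `elog(ERROR)`, and a small array stands in for the backend's list of held LWLocks. A lock taken in an outer frame is skipped over when the error unwinds from two calls deeper, and the worker's handler exits without releasing it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Toy model only: all names are hypothetical, not PostgreSQL's API. */
#define TOY_MAX_HELD 8

static int      toy_held_locks[TOY_MAX_HELD]; /* ids of "LWLocks" held */
static int      toy_num_held = 0;
static jmp_buf *toy_exception_stack = NULL;   /* plays PG_exception_stack */

static void
toy_lwlock_acquire(int lockid)
{
    assert(toy_num_held < TOY_MAX_HELD);
    toy_held_locks[toy_num_held++] = lockid;
}

/* This toy only supports releasing the most recently acquired lock. */
static void
toy_lwlock_release(int lockid)
{
    assert(toy_num_held > 0 && toy_held_locks[toy_num_held - 1] == lockid);
    toy_num_held--;
}

/* elog(ERROR) analogue: jump straight to the innermost error handler. */
static void
toy_elog_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    longjmp(*toy_exception_stack, 1);
}

/* get_segment_by_index() analogue, failing as if the segment had vanished. */
static void
toy_get_segment_by_index(void)
{
    toy_elog_error("toy: segment lookup failed");
}

/* dsa_get_address() analogue. */
static void
toy_dsa_get_address(void)
{
    toy_get_segment_by_index();
}

/* pgstat_report_autovac() analogue: takes a dshash partition lock first. */
static void
toy_report_autovac(void)
{
    toy_lwlock_acquire(1);
    toy_dsa_get_address();   /* throws, so the release below never runs */
    toy_lwlock_release(1);
}

/*
 * Autovacuum-worker analogue. The error handler does no transaction abort
 * and goes straight to "proc_exit"; it returns how many locks the exit
 * hook (where dshash_detach() asserts none are held) would see.
 */
static int
toy_worker_main(void)
{
    jmp_buf local_sigjmp_buf;

    if (setjmp(local_sigjmp_buf) != 0)
    {
        toy_exception_stack = NULL;
        return toy_num_held;     /* exiting with the lock still held */
    }
    toy_exception_stack = &local_sigjmp_buf;

    toy_report_autovac();

    toy_exception_stack = NULL;
    return toy_num_held;
}
```

Called from a `main()`, `toy_worker_main()` returns 1: the exit path still sees the partition lock held, which is exactly the condition the `dshash_detach()` assertion checks.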

I understand that this ERROR path is unlikely to be hit in the normal course of operation, but if it is, as in my case, it results in an assertion failure. I also suspect a similar problem existed in older releases (not the assertion failure, but backends exiting with an LWLock still held), though the likelihood was probably much smaller then.

If this is a problem worth addressing, I wonder if we should explicitly release all LWLocks in the longjmp error handler, as we do for other processes.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB: https://www.enterprisedb.com
