Assertion failure on PG15 with modified test_shm_mq test - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Assertion failure on PG15 with modified test_shm_mq test
Date
Msg-id CABOikdM+cR6fDd1XkGu0paKWaa1E_s4WRG3oETw8u_U5NK1oeg@mail.gmail.com
Whole thread Raw
Responses Re: Assertion failure on PG15 with modified test_shm_mq test
Re: Assertion failure on PG15 with modified test_shm_mq test
List pgsql-hackers
Hello,

I've a slightly modified version of test_shm_mq, that I changed to include a shared fileset.  The motivation to do that came because I hit an assertion failure with PG15 while doing some development work on BDR and I suspected it to be a PG15 bug.

The stack trace looks as below:

(lldb) bt
* thread #1
  * frame #0: 0x00007ff8187b100e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007ff8187e71ff libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007ff818732d24 libsystem_c.dylib`abort + 123
    frame #3: 0x000000010fce1bab postgres`ExceptionalCondition(conditionName="pgstat_is_initialized && !pgstat_is_shutdown", errorType="FailedAssertion", fileName="pgstat.c", lineNumber=1227) at assert.c:69:2
    frame #4: 0x000000010fb06412 postgres`pgstat_assert_is_up at pgstat.c:1227:2
    frame #5: 0x000000010fb0d2c7 postgres`pgstat_get_entry_ref(kind=PGSTAT_KIND_DATABASE, dboid=0, objoid=0, create=true, created_entry=0x0000000000000000) at pgstat_shmem.c:406:2
    frame #6: 0x000000010fb07579 postgres`pgstat_prep_pending_entry(kind=PGSTAT_KIND_DATABASE, dboid=0, objoid=0, created_entry=0x0000000000000000) at pgstat.c:1068:14
    frame #7: 0x000000010fb09cce postgres`pgstat_prep_database_pending(dboid=0) at pgstat_database.c:327:14
    frame #8: 0x000000010fb09dff postgres`pgstat_report_tempfile(filesize=0) at pgstat_database.c:179:10
    frame #9: 0x000000010fa8dbe9 postgres`ReportTemporaryFileUsage(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0", size=0) at fd.c:1521:2
    frame #10: 0x000000010fa8db3c postgres`PathNameDeleteTemporaryFile(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0", error_on_failure=false) at fd.c:1945:3
    frame #11: 0x000000010fa8d3a8 postgres`unlink_if_exists_fname(fname="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0", isdir=false, elevel=15) at fd.c:3674:3
    frame #12: 0x000000010fa8d270 postgres`walkdir(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset", action=(postgres`unlink_if_exists_fname at fd.c:3663), process_symlinks=false, elevel=15) at fd.c:3573:5
    frame #13: 0x000000010fa8d0e2 postgres`PathNameDeleteTemporaryDir(dirname="base/pgsql_tmp/pgsql_tmp17312.0.fileset") at fd.c:1689:2
    frame #14: 0x000000010fa91ac1 postgres`FileSetDeleteAll(fileset=0x0000000119240870) at fileset.c:165:3
    frame #15: 0x000000010fa92b08 postgres`SharedFileSetOnDetach(segment=0x00007f93ff00a7c0, datum=4716759152) at sharedfileset.c:119:3
    frame #16: 0x000000010fa96b00 postgres`dsm_detach(seg=0x00007f93ff00a7c0) at dsm.c:801:3
    frame #17: 0x000000010fa96f51 postgres`dsm_backend_shutdown at dsm.c:738:3
    frame #18: 0x000000010fa99402 postgres`shmem_exit(code=1) at ipc.c:259:2
    frame #19: 0x000000010fa99227 postgres`proc_exit_prepare(code=1) at ipc.c:194:2
    frame #20: 0x000000010fa99133 postgres`proc_exit(code=1) at ipc.c:107:2
    frame #21: 0x000000010fce318c postgres`errfinish(filename="postgres.c", lineno=3204, funcname="ProcessInterrupts") at elog.c:661:3
    frame #22: 0x000000010fad7c51 postgres`ProcessInterrupts at postgres.c:3201:4
    frame #23: 0x000000011924d85b test_shm_mq.so`test_shm_mq_main(main_arg=1155036180) at worker.c:159:2
    frame #24: 0x000000010f9da11e postgres`StartBackgroundWorker at bgworker.c:858:2
    frame #25: 0x000000010f9e80b4 postgres`do_start_bgworker(rw=0x00007f93ef904080) at postmaster.c:5823:4
    frame #26: 0x000000010f9e2524 postgres`maybe_start_bgworkers at postmaster.c:6047:9
    frame #27: 0x000000010f9e0e63 postgres`sigusr1_handler(postgres_signal_arg=30) at postmaster.c:5204:3
    frame #28: 0x00007ff8187fcdfd libsystem_platform.dylib`_sigtramp + 29
    frame #29: 0x00007ff8187b2d5b libsystem_kernel.dylib`__select + 11
    frame #30: 0x000000010f9e268d postgres`ServerLoop at postmaster.c:1770:13
    frame #31: 0x000000010f9e0157 postgres`PostmasterMain(argc=8, argv=0x0000600002f30190) at postmaster.c:1478:11
    frame #32: 0x000000010f8bc930 postgres`main(argc=8, argv=0x0000600002f30190) at main.c:202:3
    frame #33: 0x000000011f7d651e dyld`start + 462

I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit callback. The callback is responsible for detaching from the pgstat shared memory segment. But looks like other parts of the system still expect it to be available during later stages of proc exit.

It's not clear to me if pgstat shutdown should happen later or code that gets executed later in the cycle should not try to use pgstat. It's also entirely possible that my usage of SharedFileSet is completely wrong. If that's the case, please let me know the mistake in the usage.

Patch modifying the test case is attached. In order to reproduce the problem quickly,  I added a CHECK_FOR_INTERRUPTS() in the test, but I don't see why that would be a bad coding practice.

Thanks,
Pavan



Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Remove dummyret definition
Next
From: Bharath Rupireddy
Date:
Subject: Re: add checkpoint stats of snapshot and mapping files of pg_logical dir