Thread: BF mamba failure
Hi,

Peter Smith has recently reported a BF failure [1]. AFAICS, the call
stack of failure [2] is as follows:

0x1e66644 <ExceptionalCondition+0x8c> at postgres
0x1d0143c <pgstat_release_entry_ref+0x4c0> at postgres
0x1d02534 <pgstat_get_entry_ref+0x780> at postgres
0x1cfb120 <pgstat_prep_pending_entry+0x8c> at postgres
0x1cfd590 <pgstat_report_disconnect+0x34> at postgres
0x1cfbfc0 <pgstat_shutdown_hook+0xd4> at postgres
0x1ca7b08 <shmem_exit+0x7c> at postgres
0x1ca7c74 <proc_exit_prepare+0x70> at postgres
0x1ca7d2c <proc_exit+0x18> at postgres
0x1cdf060 <PostgresMain+0x584> at postgres
0x1c203f4 <ServerLoop.isra.0+0x1e88> at postgres
0x1c2161c <PostmasterMain+0xfa4> at postgres
0x1edcf94 <main+0x254> at postgres

I couldn't correlate it to the recent commits. Any thoughts?

[1] - https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-17%2005%3A36%3A10

--
With Regards,
Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
> stack of failure [2] is as follows:

Note the assertion report a few lines further up:

TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004

regards, tom lane
Hi,
18.03.2023 07:26, Tom Lane wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
>> stack of failure [2] is as follows:
>
> Note the assertion report a few lines further up:
>
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004
This assertion failure can be reproduced easily with the attached patch:
============== running regression test queries ==============
test oldest_xmin ... ok 55 ms
test oldest_xmin ... FAILED (test process exited with exit code 1) 107 ms
test oldest_xmin ... FAILED (test process exited with exit code 1) 8 ms
============== shutting down postmaster ==============
contrib/test_decoding/output_iso/log/postmaster.log contains:
TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 561, PID: 456844
With the sleep placed above Assert(entry_ref->shared_entry->dropped) this Assert fails too.
Best regards,
Alexander
Attachment
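The reproduction above depends on an injected sleep that widens a window around the assertions. As a rough illustration of that general pattern only, here is a toy, single-threaded C sketch. It is not PostgreSQL code and not a diagnosis of this failure: `toy_entry`, `toy_release`, and `toy_concurrent_acquire` are hypothetical names, and the "concurrent" activity is simulated with a callback that runs where the sleep would be. It shows how a count that reached zero at the decrement can be non-zero again by the time it is re-read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared, reference-counted entry. */
typedef struct
{
    atomic_uint refcount;
} toy_entry;

/* Callback standing in for what another process does during the window. */
typedef void (*toy_window_hook) (toy_entry *entry);

void
toy_init(toy_entry *entry, unsigned refs)
{
    assert(refs > 0);
    atomic_init(&entry->refcount, refs);
}

/* Simulates another party taking a new reference inside the window. */
void
toy_concurrent_acquire(toy_entry *entry)
{
    atomic_fetch_add(&entry->refcount, 1);
}

/*
 * Drop one reference.  If we were the last referrer, run the hook (the
 * simulated "sleep" window) and report whether the count is still zero
 * afterwards.  Returns true when a later "refcount == 0" check would hold.
 */
bool
toy_release(toy_entry *entry, toy_window_hook hook)
{
    /* atomic_fetch_sub returns the value held before the decrement */
    if (atomic_fetch_sub(&entry->refcount, 1) == 1)
    {
        if (hook != NULL)
            hook(entry);
        return atomic_load(&entry->refcount) == 0;
    }
    return true;                /* not the last referrer, nothing to re-check */
}
```

With a single reference and no hook, `toy_release` returns true. Passing `toy_concurrent_acquire` as the hook makes it return false, which is the state in which an assertion on the re-read count would trap.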
On Sun, Mar 19, 2023 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hi,
>
> 18.03.2023 07:26, Tom Lane wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
>
> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
> stack of failure [2] is as follows:
>
> Note the assertion report a few lines further up:
>
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004
>
>
> This assertion failure can be reproduced easily with the attached patch:
> ============== running regression test queries ==============
> test oldest_xmin ... ok 55 ms
> test oldest_xmin ... FAILED (test process exited with exit code 1) 107 ms
> test oldest_xmin ... FAILED (test process exited with exit code 1) 8 ms
> ============== shutting down postmaster ==============
>
> contrib/test_decoding/output_iso/log/postmaster.log contains:
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 561, PID: 456844
>
> With the sleep placed above Assert(entry_ref->shared_entry->dropped) this Assert fails too.
>
> Best regards,
> Alexander

I used a slightly modified* patch of Alexander's [1] applied to the
latest HEAD code (but with my "toptxn" patch reverted).
--- the patch was modified in that I injected 'sleep' both above and
below the Assert(entry_ref->shared_entry->dropped).

Using this I was also able to reproduce the problem, but test failures
were rare. The make check-world seemed OK, and indeed the test_decoding
tests would also appear to PASS around 14 out of 15 times.

============== running regression test queries ==============
test oldest_xmin ... ok 342 ms
test oldest_xmin ... ok 121 ms
test oldest_xmin ... ok 283 ms
============== shutting down postmaster ==============
============== removing temporary instance ==============
=====================
 All 3 tests passed.
=====================

~~

Often (but not always), even though test_decoding reported all 3 tests
as "ok", I still observed a TRAP in the logfile
(contrib/test_decoding/output_iso/log/postmaster.log).

TRAP: failed Assert("entry_ref->shared_entry->dropped")

~~

Occasionally (about 1 in 15 test runs) the test would fail the same way
as described by Alexander [1], with the accompanying TRAP.

TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 562, PID: 32013

============== running regression test queries ==============
test oldest_xmin ... ok 331 ms
test oldest_xmin ... ok 91 ms
test oldest_xmin ... FAILED 702 ms
============== shutting down postmaster ==============
======================
 1 of 3 tests failed.
======================

~~

FWIW, the "toptxn" patch, whose push coincided with the build-farm
error I first reported [2], turns out to be an innocent party in this
TRAP. We know this because all of the above results were obtained using
HEAD code but with that "toptxn" patch reverted.

------
[1] https://www.postgresql.org/message-id/1941b7e2-be7c-9c4c-8505-c0fd05910e9a%40gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia