Re: BF mamba failure - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: BF mamba failure |
Date | |
Msg-id | CAHut+PvVrjwJm_9ZqnXJk4x9k8dN0dYrV+T5_Rd30BSneDhv1A@mail.gmail.com |
In response to | Re: BF mamba failure (Alexander Lakhin <exclusion@gmail.com>) |
Responses | Re: BF mamba failure |
List | pgsql-hackers |
On Sun, Mar 19, 2023 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hi,
>
> 18.03.2023 07:26, Tom Lane wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
>
> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
> stack of failure [2] is as follows:
>
> Note the assertion report a few lines further up:
>
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560,PID: 25004
>
>
> This assertion failure can be reproduced easily with the attached patch:
> ============== running regression test queries ==============
> test oldest_xmin ... ok 55 ms
> test oldest_xmin ... FAILED (test process exited with exit code 1) 107 ms
> test oldest_xmin ... FAILED (test process exited with exit code 1) 8 ms
> ============== shutting down postmaster ==============
>
> contrib/test_decoding/output_iso/log/postmaster.log contains:
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 561,PID: 456844
>
> With the sleep placed above Assert(entry_ref->shared_entry->dropped) this Assert fails too.
>
> Best regards,
> Alexander

I used a slightly modified* patch of Alexander's [1] applied to the
latest HEAD code (but with my "toptxn" patch reverted).

* The patch was modified in that I injected 'sleep' both above and
below the Assert(entry_ref->shared_entry->dropped).

Using this I was also able to reproduce the problem, but test failures
were rare. The make check-world run seemed OK, and the test_decoding
tests also appeared to PASS around 14 out of 15 times.

============== running regression test queries ==============
test oldest_xmin ... ok 342 ms
test oldest_xmin ... ok 121 ms
test oldest_xmin ... ok 283 ms
============== shutting down postmaster ==============
============== removing temporary instance ==============

=====================
All 3 tests passed.
=====================

~~

Often (but not always), despite test_decoding reporting all 3 tests as
"ok", I still observed a TRAP in the logfile
(contrib/test_decoding/output_iso/log/postmaster.log).

TRAP: failed Assert("entry_ref->shared_entry->dropped")

~~

Occasionally (about 1 in 15 test runs) the test would fail the same
way as described by Alexander [1], with the accompanying TRAP.

TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 562, PID: 32013

============== running regression test queries ==============
test oldest_xmin ... ok 331 ms
test oldest_xmin ... ok 91 ms
test oldest_xmin ... FAILED 702 ms
============== shutting down postmaster ==============

======================
1 of 3 tests failed.
======================

~~

FWIW, the "toptxn" patch, whose push coincided with the build-farm
error I first reported [2], turns out to be an innocent party in this
TRAP. We know this because all of the above results were obtained
running HEAD code with that "toptxn" patch reverted.

------
[1] https://www.postgresql.org/message-id/1941b7e2-be7c-9c4c-8505-c0fd05910e9a%40gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
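Since the failure above shows up in only about 1 in 15 runs, a single passing run says little. The sketch below is a rough aid for deciding how many repetitions to schedule. It assumes that runs are independent and that the observed 1-in-15 rate is the true per-run failure probability. Neither assumption is established in this thread, and the function names are made up for illustration.

```python
import math


def prob_at_least_one_failure(p_fail: float, runs: int) -> float:
    """Chance of seeing one or more failures in `runs` independent runs."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of independent runs whose chance of showing at
    least one failure reaches `confidence`."""
    if not (0.0 < p_fail < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_fail and confidence must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    p = 1 / 15  # assumed per-run failure rate, taken from "about 1 in 15"
    print(runs_needed(p, 0.95))
    print(round(prob_at_least_one_failure(p, 15), 3))
```

Under those assumptions, 15 runs catch the failure only about 64% of the time, and roughly 44 runs are needed for a 95% chance of seeing it at least once. A clean streak shorter than that is weak evidence that a candidate fix works.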