Thread: BF mamba failure

BF mamba failure

From: Amit Kapila
Hi,

Peter Smith has recently reported a BF failure [1]. AFAICS, the call
stack of failure [2] is as follows:

0x1e66644 <ExceptionalCondition+0x8c> at postgres
0x1d0143c <pgstat_release_entry_ref+0x4c0> at postgres
0x1d02534 <pgstat_get_entry_ref+0x780> at postgres
0x1cfb120 <pgstat_prep_pending_entry+0x8c> at postgres
0x1cfd590 <pgstat_report_disconnect+0x34> at postgres
0x1cfbfc0 <pgstat_shutdown_hook+0xd4> at postgres
0x1ca7b08 <shmem_exit+0x7c> at postgres
0x1ca7c74 <proc_exit_prepare+0x70> at postgres
0x1ca7d2c <proc_exit+0x18> at postgres
0x1cdf060 <PostgresMain+0x584> at postgres
0x1c203f4 <ServerLoop.isra.0+0x1e88> at postgres
0x1c2161c <PostmasterMain+0xfa4> at postgres
0x1edcf94 <main+0x254> at postgres

I couldn't correlate it to the recent commits. Any thoughts?

[1] - https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-17%2005%3A36%3A10

-- 
With Regards,
Amit Kapila.



Re: BF mamba failure

From: Tom Lane
Amit Kapila <amit.kapila16@gmail.com> writes:
> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
> stack of failure [2] is as follows:

Note the assertion report a few lines further up:

TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004

            regards, tom lane



Re: BF mamba failure

From: Alexander Lakhin
Hi,

18.03.2023 07:26, Tom Lane wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
>> stack of failure [2] is as follows:
>
> Note the assertion report a few lines further up:
>
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004

This assertion failure can be reproduced easily with the attached patch:
============== running regression test queries        ==============
test oldest_xmin                  ... ok           55 ms
test oldest_xmin                  ... FAILED (test process exited with exit code 1)      107 ms
test oldest_xmin                  ... FAILED (test process exited with exit code 1)        8 ms
============== shutting down postmaster               ==============

contrib/test_decoding/output_iso/log/postmaster.log contains:
TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 561, PID: 456844

With the sleep placed above Assert(entry_ref->shared_entry->dropped), this Assert fails too.

Best regards,
Alexander

Re: BF mamba failure

From: Peter Smith
On Sun, Mar 19, 2023 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hi,
>
> 18.03.2023 07:26, Tom Lane wrote:
>> Amit Kapila <amit.kapila16@gmail.com> writes:
>>> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
>>> stack of failure [2] is as follows:
>>
>> Note the assertion report a few lines further up:
>>
>> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004
>
>
> This assertion failure can be reproduced easily with the attached patch:
> ============== running regression test queries        ==============
> test oldest_xmin                  ... ok           55 ms
> test oldest_xmin                  ... FAILED (test process exited with exit code 1)      107 ms
> test oldest_xmin                  ... FAILED (test process exited with exit code 1)        8 ms
> ============== shutting down postmaster               ==============
>
> contrib/test_decoding/output_iso/log/postmaster.log contains:
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 561, PID: 456844
>
> With the sleep placed above Assert(entry_ref->shared_entry->dropped) this Assert fails too.
>
> Best regards,
> Alexander

I used a slightly modified* patch of Alexander's [1] applied to the
latest HEAD code (but with my "toptxn" patch reverted).
* the patch was modified in that I injected a 'sleep' both above and
below the Assert(entry_ref->shared_entry->dropped).

Using this, I was also able to reproduce the problem, but test
failures were rare: make check-world seemed OK, and the test_decoding
tests would also appear to PASS around 14 out of 15 times.

============== running regression test queries        ==============
test oldest_xmin                  ... ok          342 ms
test oldest_xmin                  ... ok          121 ms
test oldest_xmin                  ... ok          283 ms
============== shutting down postmaster               ==============
============== removing temporary instance            ==============

=====================
 All 3 tests passed.
=====================

~~

Often (but not always), despite test_decoding reporting all 3 tests
as "ok", I still observed a TRAP in the logfile
(contrib/test_decoding/output_iso/log/postmaster.log):
TRAP: failed Assert("entry_ref->shared_entry->dropped")

~~

Occasionally (about 1 in 15 test runs) the test would fail the same
way as described by Alexander [1], with the accompanying TRAP:
TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 562, PID: 32013

============== running regression test queries        ==============
test oldest_xmin                  ... ok          331 ms
test oldest_xmin                  ... ok           91 ms
test oldest_xmin                  ... FAILED      702 ms
============== shutting down postmaster               ==============

======================
 1 of 3 tests failed.
======================


~~


FWIW, the "toptxn" patch, whose push coincided with the build-farm
error I first reported [2], turns out to be an innocent party in this
TRAP. We know this because all of the above results were obtained
using HEAD code with that "toptxn" patch reverted.

------
[1] https://www.postgresql.org/message-id/1941b7e2-be7c-9c4c-8505-c0fd05910e9a%40gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: BF mamba failure

From: Alexander Lakhin
Hello hackers,

20.03.2023 09:10, Peter Smith wrote:
>
> Using this, I was also able to reproduce the problem, but test
> failures were rare: make check-world seemed OK, and the test_decoding
> tests would also appear to PASS around 14 out of 15 times.

I've stumbled upon this assertion failure again during testing following commit cd312adc5.

This time I've simplified the reproducer to the attached modification.
With this patch applied, `make -s check -C contrib/test_decoding` fails on master as below:
ok 1         - pgstat_rc_1                                14 ms
not ok 2     - pgstat_rc_2                              1351 ms


contrib/test_decoding/output_iso/log/postmaster.log contains:
TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 562, PID: 1130928

With extra logging added, I see the following events happening:
1) pgstat_rc_1.setup calls pgstat_create_replslot(), gets
   ReplicationSlotIndex(slot) = 0 and calls
   pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, 0, 0).

2) pgstat_rc_1.s0_get_changes executes pg_logical_slot_get_changes(...)
   and then calls pgstat_gc_entry_refs on shmem_exit() ->
   pgstat_shutdown_hook() ...;
   with the sleep added inside pgstat_release_entry_ref, this backend waits
   after decreasing entry_ref->shared_entry->refcount to 0.

3) pgstat_rc_1.stop removes the replication slot.

4) pgstat_rc_2.setup calls pgstat_create_replslot(), gets
   ReplicationSlotIndex(slot) = 0 and calls
   pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, 0, 0),
   which leads to the call pgstat_reinit_entry(), which increases refcount
   for the same shared_entry as in (1) and (2), and then to the call
   pgstat_acquire_entry_ref(), which increases refcount once more.

5) the backend from step (2) wakes up and reaches
   Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0),
   which fails because refcount is now 2.
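
To make the race concrete, here is an abridged sketch of the tail of
pgstat_release_entry_ref() in pgstat_shmem.c, with a pg_usleep() standing
in for the reproducer's injected sleep (an illustration, not the exact
committed code):

    if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
    {
        PgStatShared_HashEntry *shent;

        /* only dropped entries are supposed to reach a refcount of 0 */
        Assert(entry_ref->shared_entry->dropped);

        pg_usleep(1000000L);    /* <-- race window: steps (3) and (4) can
                                 * reuse the dropped entry and raise its
                                 * refcount while this backend sleeps */

        shent = dshash_find(pgStatLocal.shared_hash,
                            &entry_ref->shared_entry->key,
                            true);

        /* fires: refcount is 2 after the reinit in step (4), not 0 */
        Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
    }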

Best regards,
Alexander

Re: BF mamba failure

From: Michael Paquier
On Wed, Jun 12, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> With extra logging added, I see the following events happening:
> 1) pgstat_rc_1.setup calls pgstat_create_replslot(), gets
>   ReplicationSlotIndex(slot) = 0 and calls
>   pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, 0, 0).
>
> 2) pgstat_rc_1.s0_get_changes executes pg_logical_slot_get_changes(...)
>   and then calls pgstat_gc_entry_refs on shmem_exit() ->
>   pgstat_shutdown_hook() ...;
>   with the sleep added inside pgstat_release_entry_ref, this backend waits
>   after decreasing entry_ref->shared_entry->refcount to 0.
>
> 3) pgstat_rc_1.stop removes the replication slot.
>
> 4) pgstat_rc_2.setup calls pgstat_create_replslot(), gets
>   ReplicationSlotIndex(slot) = 0 and calls
>   pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, 0, 0),
>   which leads to the call pgstat_reinit_entry(), which increases refcount
>   for the same shared_entry as in (1) and (2), and then to the call
>   pgstat_acquire_entry_ref(), which increases refcount once more.
>
> 5) the backend from step (2) wakes up and reaches
>    Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0),
>    which fails because refcount is now 2.

Thanks for the details.

So this comes down to the point that we offer no guarantee that the
stats entry a backend sees at shutdown is the same as the one it wants
to clean up.  That's the same problem as the one Floris has reported
here, where an OID wraparound causes tables to get the same hash key.
That can happen for all stats kinds:
https://www.postgresql.org/message-id/b14ae28029f64757bb64613be2549a74@Optiver.com

I don't think that this is going to fly far unless we introduce a
concept of "generation" or "age" in the stats entries.  The idea is
simple: when a stats entry is reinitialized because of a drop&create,
increment a counter to tell that this is a new generation, and keep
track of it in *both* PgStat_EntryRef (local backend reference to the
shmem stats entry) *and* PgStatShared_HashEntry (the central one).
When releasing an entry, if we know that the shared entry we are
pointing at is not of the same generation as the local reference, it
means that the entry has been reused for something else with the same
hash key, so give up.  It should not be that invasive; still, it means
an ABI break in the two pgstats internal structures I am mentioning,
which is OK for a backpatch as this stuff is internal.  On top of
that, this problem means that we can silently and randomly lose stats,
which is not cool.
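
To sketch the idea (a rough illustration only; the field names, their
placement, and the release-path check below are placeholders, not the
final patch):

    /* PgStatShared_HashEntry (shared memory) would gain: */
    pg_atomic_uint32 generation;    /* bumped each time the entry is
                                     * reused after a drop */

    /* PgStat_EntryRef (backend-local) would gain: */
    uint32      generation;         /* value observed when the local
                                     * reference was taken */

    /* In pgstat_reinit_entry(), when a dropped entry is reused for a
     * new object with the same hash key: */
    pg_atomic_fetch_add_u32(&shhashent->generation, 1);

    /* When releasing a local reference: */
    if (pg_atomic_read_u32(&entry_ref->shared_entry->generation) !=
        entry_ref->generation)
    {
        /* The shared entry was dropped and reused for another object
         * with the same hash key; it is not ours to clean up. */
        return;
    }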

Note that Noah has been working on a better integration of injection
points with isolation tests.  We could reuse that here to have a test
case, with an injection point waiting around the pg_usleep() you have
hardcoded:
https://www.postgresql.org/message-id/20240614003549.c2.nmisch@google.com
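
For instance (a sketch only; the point name is illustrative), the
hardcoded sleep could become an injection point:

    #include "utils/injection_point.h"

    /* An isolation test could attach a "wait" callback here to hold the
     * backend inside pgstat_release_entry_ref() at exactly the right
     * moment, instead of relying on a timed sleep. */
    INJECTION_POINT("pgstat-release-entry-ref");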

I'll try to give it a go on Monday.
--
Michael


Re: BF mamba failure

From: Michael Paquier
On Fri, Jun 14, 2024 at 02:31:37PM +0900, Michael Paquier wrote:
> I don't think that this is going to fly far unless we introduce a
> concept of "generation" or "age" in the stats entries.  The idea is
> simple: when a stats entry is reinitialized because of a drop&create,
> increment a counter to tell that this is a new generation, and keep
> track of it in *both* PgStat_EntryRef (local backend reference to the
> shmem stats entry) *and* PgStatShared_HashEntry (the central one).
> When releasing an entry, if we know that the shared entry we are
> pointing at is not of the same generation as the local reference, it
> means that the entry has been reused for something else with the same
> hash key, so give up.  It should not be that invasive; still, it means
> an ABI break in the two pgstats internal structures I am mentioning,
> which is OK for a backpatch as this stuff is internal.  On top of
> that, this problem means that we can silently and randomly lose stats,
> which is not cool.
>
> I'll try to give it a go on Monday.

Here you go; the patch introduces what I've named an "age" counter
attached to the shared entry references, and copied over to the local
references.  The counter is initialized at 0 and incremented each
time an entry is reused; when attempting to drop an entry, we
cross-check the version held locally against the shared one.

While looking at the whole, I noticed today that this is close to a
concept patch sent previously, where a counter is used in the shared
entry with a cross-check done with the local reference:
https://www.postgresql.org/message-id/20230603.063418.767871221863527769.horikyota.ntt@gmail.com

The logic is different, though: we don't need to cross-check the "age"
count when retrieving the contents of the local cache; it only matters
when a backend attempts to drop an entry it thinks is OK to operate
on, but that got reused because other backends did creates and drops
with the same hash key.

This idea needs more eyes, so I am adding it to the next CF for now.
I've played with it for a few hours, running concurrent replication
slot drops/creates, without breaking it.  I have not implemented an
isolation test for this case, as it depends on where we are going with
the integration of isolation tests with injection points.
--
Michael


Re: BF mamba failure

From: Kouber Saparev
Am I correct to believe this patch fixes the "can only drop stats once" issue?

It just happened to us: one of our streaming replicas decided to shut down.

Re: BF mamba failure

From: Bertrand Drouvot
Hi,

On Thu, Oct 10, 2024 at 06:19:30AM +0000, Kouber Saparev wrote:
> Am I correct to believe this patch is fixing the "can only drop stats once" issue?
> 
> It just happened to us, one of our streaming replicas decided to shut down.

Does the error message look like this (added in [1]):

"
trying to drop stats entry already dropped: kind=%s dboid=%u objoid=%u refcount=%u",
"

by any chance? If so, would you mind sharing it?

Regards,

[1]: https://postgr.es/c/d92573adcb

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: BF mamba failure

From: Kouber Saparev
On Thu, 10.10.2024 at 17:42, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:

> Does the error message look like this (added in [1]):
>
> "
> trying to drop stats entry already dropped: kind=%s dboid=%u objoid=%u refcount=%u",
> "
>
> by any chance? If so, would you mind sharing it?
>
> Regards,
>
> [1]: https://postgr.es/c/d92573adcb

Unfortunately not; we are running 15.4 and planning to upgrade very soon. Is the mentioned patch already merged in PostgreSQL 16?

The last lines in the log are:

2024-09-22 07:33:53.913 UTC,,,223214,,64e79dc1.367ee,3255151,,2023-08-24 18:13:21 UTC,1/0,0,FATAL,XX000,"can only drop stats once",,,,,"WAL redo at 45F8F/DB943818 for Transaction/COMMIT: 2024-09-22 07:24:49.603408+00; dropped stats: 2/16420/3237787065 2/16420/3237787064 2/16420/3237787061; inval msgs: catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 relcache 3237787065 relcache 3237787064 snapshot 2608 relcache 3237787064 snapshot 2608 snapshot 2608 snapshot 2608 relcache 3237787061 snapshot 2608 snapshot 1214",,,,"","startup",,0
2024-09-22 07:33:54.501 UTC,,,223207,,64e79dc0.367e7,1423,,2023-08-24 18:13:20 UTC,,0,LOG,00000,"startup process (PID 223214) exited with exit code 1",,,,,,,,,"","postmaster",,0
2024-09-22 07:33:54.501 UTC,,,223207,,64e79dc0.367e7,1424,,2023-08-24 18:13:20 UTC,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,"","postmaster",,0
2024-09-22 07:33:55.398 UTC,,,223207,,64e79dc0.367e7,1425,,2023-08-24 18:13:20 UTC,,0,LOG,00000,"shutting down due to startup process failure",,,,,,,,,"","postmaster",,0
2024-09-22 07:33:56.697 UTC,,,223207,,64e79dc0.367e7,1426,,2023-08-24 18:13:20 UTC,,0,LOG,00000,"database system is shut down",,,,,,,,,"","postmaster",,0

--
Kouber


Re: BF mamba failure

From: Bertrand Drouvot
Hi,

On Fri, Oct 11, 2024 at 11:07:29AM +0300, Kouber Saparev wrote:
> On Thu, 10.10.2024 at 17:42, Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
> wrote:
> 
> >
> > Does the error message look like this (added in [1]):
> >
> > "
> > trying to drop stats entry already dropped: kind=%s dboid=%u objoid=%u
> > refcount=%u",
> > "
> >
> > by any chance? If so, would you mind sharing it?
> >
> > Regards,
> >
> > [1]: https://postgr.es/c/d92573adcb
> 
> 
> Unfortunately not, we are running 15.4 and planning to upgrade very soon.
> Is the patch mentioned already merged in PostgreSQL 16?

Yes, as of 16.4.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: BF mamba failure

From: Bertrand Drouvot
Hi,

On Wed, Oct 16, 2024 at 09:43:48AM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2024 at 08:18:58AM +0000, Bertrand Drouvot wrote:
> > On Fri, Oct 11, 2024 at 11:07:29AM +0300, Kouber Saparev wrote:
> >> Unfortunately not; we are running 15.4 and planning to upgrade very soon.
> >> Is the mentioned patch already merged in PostgreSQL 16?
> > 
> > Yes, as of 16.4.
> 
> Right.  I'd surely welcome more eyes on what I have posted here:
> https://www.postgresql.org/message-id/Zm-8Xo93K9yD9fy7@paquier.xyz

I applied your patch and can confirm that it fixes the issue. While
that works, I wonder whether there is another way to fix it.

Indeed, in pgstat_release_entry_ref(), we're doing:

"
if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
.
.
    shent = dshash_find(pgStatLocal.shared_hash,
                        &entry_ref->shared_entry->key,
                        true);
"

I wonder if we are not decrementing &entry_ref->shared_entry->refcount too early.

I mean, wouldn't it make sense to decrement it after the dshash_find()
call? (to ensure "proper" whole-entry locking, done in dshash_find(),
first)
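
As a rough sketch (an illustration of the question only, not a tested
patch), that reordering could look like:

    shent = dshash_find(pgStatLocal.shared_hash,
                        &entry_ref->shared_entry->key,
                        true);      /* takes the exclusive partition lock */
    if (shent &&
        pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
    {
        /* Last reference released while the entry is locked: no other
         * backend can reinitialize it concurrently, so drop it as the
         * current code does. */
        pgstat_free_entry(shent, NULL);
    }
    else if (shent)
        dshash_release_lock(pgStatLocal.shared_hash, shent);

Note that this would take the exclusive partition lock on every release,
which the current code only does when the refcount actually reaches 0.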

> I am a bit annoyed by the fact of making PgStatShared_HashEntry
> slightly larger to track the age of the entries,

Yeah, FWIW, we would be going from 32 bytes:

(gdb)  ptype /o struct PgStatShared_HashEntry
/* offset      |    size */  type = struct PgStatShared_HashEntry {
/*      0      |      16 */    PgStat_HashKey key;
/*     16      |       1 */    _Bool dropped;
/* XXX  3-byte hole      */
/*     20      |       4 */    pg_atomic_uint32 refcount;
/*     24      |       8 */    dsa_pointer body;

                               /* total size (bytes):   32 */
                             }

to 40 bytes (with the patch applied):

(gdb) ptype /o struct PgStatShared_HashEntry
/* offset      |    size */  type = struct PgStatShared_HashEntry {
/*      0      |      16 */    PgStat_HashKey key;
/*     16      |       1 */    _Bool dropped;
/* XXX  3-byte hole      */
/*     20      |       4 */    pg_atomic_uint32 refcount;
/*     24      |       4 */    pg_atomic_uint32 agecount;
/* XXX  4-byte hole      */
/*     32      |       8 */    dsa_pointer body;

                               /* total size (bytes):   40 */
                             }

Due to the padding, that's an 8-byte increase even though we're adding "only" 4 bytes.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: BF mamba failure

From: Bertrand Drouvot
Hi,

On Thu, Nov 14, 2024 at 04:55:23PM +0900, Michael Paquier wrote:
> I've been going through this patch again, and the failures that could
> be seen because of such races, like standbys failing in an
> unpredictable way, are not cool, so I am planning to apply the attached
> down to 15 now that the release has cooled down.  At the end I am not
> really stressing about this addition in the structure for the sake of
> making the stats entries safe to handle.

I don't think the addition is a concern too.

> I did not like much the name "agecount" though, used to cross-check
> how many times an entry is reused in shared memory and in the local
> copy we keep in a process, so I've renamed it to "generation".

"generation" sounds more appropriate to me too.

The approach to avoid the error sounds reasonable to me.

Just 2 comments about the patch:

=== 1

Maybe use "generation" (in quotes) instead of generation in the
comments where it's not done yet (some comments do it already).

=== 2

We could think about adding a test. That should be doable with replication slots
or custom pgstats kinds and probably injection points. But I'm not sure that's
worth it, as the code in the patch looks "simple" enough. Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com