RE: ReplicationSlotRelease() crashes when the instance is in the single user mode - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
Date
Msg-id OSCPR01MB14966D4E276EC9080EF727AA4F52AA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: ReplicationSlotRelease() crashes when the instance is in the single user mode  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
List pgsql-hackers
Dear Robert, Paul, Mutaamba,

Sorry for the late reply. I was in the business trip.

> I don't feel good about the direction from which this patch is
> attacking the problem. The original stack trace looks like this:
> 
> postgres(ExceptionalCondition+0xab)[0xb86a2a]
> postgres(ReplicationSlotRelease+0x5a)[0x8df10b]
> postgres(pg_replication_slot_advance+0x330)[0x8e46ed]
> 
> ReplicationSlotRelease() is called near the end of
> pg_replication_slot_advance. IIUC, this means that we successfully
> acquired the slot and did all the work, and then failed a sanity check
> afterward.

Exactly. The slot is trying to be released after most of tasks are done.

```
#6  0x00000000008fd20b in pg_replication_slot_advance (fcinfo=0x23b4428)
    at ../postgres/src/backend/replication/slotfuncs.c:588
588             ReplicationSlotRelease();
(gdb) list
583              * advancing potentially done.
584              */
585             ReplicationSlotsComputeRequiredXmin(false);
586             ReplicationSlotsComputeRequiredLSN();
587 
588             ReplicationSlotRelease();
```

> So, I see two possibilities: either it's not OK to acquire
> a replication slot in single-user mode, in which case this should have
> failed earlier, or it's also OK to release it, in which case this
> should not have failed at all. Interestingly, I see that
> ReplicationSlotAcquire() features a special case whose specific
> purpose is to allow it to work in single-user mode, which makes me
> rather suspect that the latter is correct.

Agreed.

> It is possible that it was
> intentional that acquiring works in single-user mode and dropping
> (other than via ReplicationSlotDropAcquired) does not, but so far I
> see nothing in the comments suggesting that.
>
> In any event, that inconsistency between how ReplicationSlotAcquire()
> works and how ReplicationSlotRelease() works seems to me to be the
> core thing that should be fixed here. If the selected fix also
> requires error checks in higher-level functions, as added by this
> patch, then well and good. But right now, the proposed patch isn't
> trying to fix the definitional inconsistency in the underlying layer
> at all, and is instead just installing guard rails to try to keep
> anyone from bumping into it. That seems like a bad direction to go --

To confirm; your point is that we should firstly fix to allow acquiring/releasing
slots in the mode, then consider additional guards, is it right? Valid point.
We've not deeply considered but ideas described in [1] could be used.

> it feels like we're blocking access to potentially-useful
> functionality because of an assertion failure that might just be
> slightly sloppy coding.

I still cannot find enough use-cases to allow manipulating slots, though.

[1]:
https://www.postgresql.org/message-id/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED


pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: GB18030-2022 Support in PostgreSQL
Next
From: Thomas Munro
Date:
Subject: Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected