RE: ReplicationSlotRelease() crashes when the instance is in the single user mode - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: ReplicationSlotRelease() crashes when the instance is in the single user mode |
Date | |
Msg-id | OSCPR01MB14966D4E276EC9080EF727AA4F52AA@OSCPR01MB14966.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: ReplicationSlotRelease() crashes when the instance is in the single user mode (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
|
List | pgsql-hackers |
Dear Robert, Paul, Mutaamba, Sorry for the late reply. I was in the business trip. > I don't feel good about the direction from which this patch is > attacking the problem. The original stack trace looks like this: > > postgres(ExceptionalCondition+0xab)[0xb86a2a] > postgres(ReplicationSlotRelease+0x5a)[0x8df10b] > postgres(pg_replication_slot_advance+0x330)[0x8e46ed] > > ReplicationSlotRelease() is called near the end of > pg_replication_slot_advance. IIUC, this means that we successfully > acquired the slot and did all the work, and then failed a sanity check > afterward. Exactly. The slot is trying to be released after most of tasks are done. ``` #6 0x00000000008fd20b in pg_replication_slot_advance (fcinfo=0x23b4428) at ../postgres/src/backend/replication/slotfuncs.c:588 588 ReplicationSlotRelease(); (gdb) list 583 * advancing potentially done. 584 */ 585 ReplicationSlotsComputeRequiredXmin(false); 586 ReplicationSlotsComputeRequiredLSN(); 587 588 ReplicationSlotRelease(); ``` > So, I see two possibilities: either it's not OK to acquire > a replication slot in single-user mode, in which case this should have > failed earlier, or it's also OK to release it, in which case this > should not have failed at all. Interestingly, I see that > ReplicationSlotAcquire() features a special case whose specific > purpose is to allow it to work in single-user mode, which makes me > rather suspect that the latter is correct. Agreed. > It is possible that it was > intentional that acquiring works in single-user mode and dropping > (other than via ReplicationSlotDropAcquired) does not, but so far I > see nothing in the comments suggesting that. > > In any event, that inconsistency between how ReplicationSlotAcquire() > works and how ReplicationSlotRelease() works seems to me to be the > core thing that should be fixed here. If the selected fix also > requires error checks in higher-level functions, as added by this > patch, then well and good. But right now, the proposed patch isn't > trying to fix the definitional inconsistency in the underlying layer > at all, and is instead just installing guard rails to try to keep > anyone from bumping into it. That seems like a bad direction to go -- To confirm; your point is that we should firstly fix to allow acquiring/releasing slots in the mode, then consider additional guards, is it right? Valid point. We've not deeply considered but ideas described in [1] could be used. > it feels like we're blocking access to potentially-useful > functionality because of an assertion failure that might just be > slightly sloppy coding. I still cannot find enough use-cases to allow manipulating slots, though. [1]: https://www.postgresql.org/message-id/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2%40OSCPR01MB14966.jpnprd01.prod.outlook.com Best regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: