Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
| From | Drouvot, Bertrand |
|---|---|
| Subject | Re: Synchronizing slots from primary to standby |
| Date | |
| Msg-id | dd9dbbaf-ca77-423a-8d62-bfc814626b47@gmail.com Whole thread Raw |
| In response to | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses |
RE: Synchronizing slots from primary to standby
|
| List | pgsql-hackers |
Hi,
On 11/10/23 8:55 AM, Amit Kapila wrote:
> On Fri, Nov 10, 2023 at 12:50 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>>
>> But even if we ERROR out instead of emitting a WARNING, the user would still
>> need to be notified/monitor such errors. I agree that then probably they will
>> come to know earlier because the slot sync mechanism would be stopped but still
>> it is not "guaranteed" (specially if there is no others "working" synced slots
>> around.)
>
>>
>> And if they do not, then there is still a risk to use this slot after a
>> failover thinking this is a "synced" slot.
>>
>
> I think this is another reason that probably giving ERROR has better
> chances for the user to notice before failover. IF knowing such errors
> user still proceeds with the failover, the onus is on her.
Agree. My concern is more when they don't know about the error.
> We can
> probably document this hazard along with the failover feature so that
> users are aware that they either need to be careful while creating
> slots on standby or consult ERROR logs. I guess we can even make it
> visible in the view also.
Yeah.
>> Giving more thoughts, what about using a dedicated/reserved naming convention for
>> synced slot like synced_<primary_slot_name> or such and then:
>>
>> - prevent user to create sync_<whatever> slots on standby
>> - sync <slot> on primary to sync_<slot> on standby
>> - during failover, rename sync_<slot> to <slot> and if <slot> exists then
>> emit a WARNING and keep sync_<slot> in place.
>>
>> That way both slots are still in place (the manually created <slot> and
>> the sync_<slot<) and one could decide what to do with them.
>>
>
> Hmm, I think after failover, users need to rename all slots or we need
> to provide a way to rename them so that they can be used by
> subscribers which sounds like much more work.
Agree that's much more work for the subscriber case. Maybe that's not worth
the extra work.
>>> Also, the current coding doesn't ensure
>>> we will always give WARNING. If we see the below code that deals with
>>> this WARNING,
>>>
>>> + /* User created slot with the same name exists, emit WARNING. */
>>> + else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE)
>>> + {
>>> + ereport(WARNING,
>>> + errmsg("not synchronizing slot %s; it is a user created slot",
>>> + remote_slot->name));
>>> + }
>>> + /* Otherwise create the slot first. */
>>> + else
>>> + {
>>> + TransactionId xmin_horizon = InvalidTransactionId;
>>> + ReplicationSlot *slot;
>>> +
>>> + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
>>> + remote_slot->two_phase, false);
>>>
>>> I think this is not a solid check to ensure that the slot existed
>>> before. Because it could be created as soon as the slot sync worker
>>> invokes ReplicationSlotCreate() here.
>>
>> Agree.
>>
>
> So, having a concrete check to give WARNING would require some more
> logic which I don't think is a good idea to handle this boundary case.
>
Yeah good point, agree to just error out in all the case then (if we discard
the sync_ reserved wording proposal, which seems to be the case as probably
not worth the extra work).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: