Hi,
On Fri, Jan 12, 2024 at 08:42:39AM +0530, Amit Kapila wrote:
> On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote:
> > > >
> > > > To close the above race, I could think of the following ways:
> > > > 1. Drop and re-create the slot.
> > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> > > > ahead of local_slot's LSN then we can update it; but as mentioned in
> > > > your previous comment, we need to update all other fields as well. If
> > > > we follow this then we probably need to have a check for catalog_xmin
> > > > as well.
> >
> > IIUC, this would be a sync slot (so not usable until promotion) that could
> > not be used anyway (invalidated), so I'll vote for drop / re-create then.
> >
>
> No, it can happen for non-sync slots as well.
Yeah, I meant that we could decide to drop/re-create only for sync slots.
>
> > > > Now, related to this the other case which needs some handling is what
> > > > if the remote_slot's restart_lsn is greater than local_slot's
> > > > restart_lsn but it is a re-created slot with the same name. In that
> > > > case, I think the other properties like 'two_phase', 'plugin' could be
> > > > different. So, is simply copying those sufficient or do we need to do
> > > > something else as well?
> > > >
> > >
> >
> > I'm not sure to follow here. If the remote slot is re-created then it would
> > be also dropped / re-created locally, or am I missing something?
> >
>
> As our slot-syncing mechanism is asynchronous (from time to time we
> check the slot information on primary), isn't it possible that the
> same name slot is dropped and recreated between slot-sync worker's
> checks?
>
Yeah, I should have thought harder ;-) So for this case, let's imagine that If we
had an easy way to detect that a remote slot has been drop/re-created then I think
we would also drop and re-create it on the standby too.
If so, I think we should then update all the fields (that we're currently updating
in the "create locally" case) when we detect that (at least) one of the following differs:
- dboid
- plugin
- two_phase
Maybe the "best" approach would be to have a way to detect that a slot has been
re-created on the primary (but that would mean rely on more than the slot name
to "identify" a slot and probably add a new member to the struct to do so).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com