Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uDbuJNsHngyHcL04oJg9JmO0=yjSn3jWhPTrQ-Gssc5sQ@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Thu, Jan 18, 2024 at 10:31 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have one question about the new code in v63-0002.
>
> ======
> src/backend/replication/logical/slotsync.c
>
> 1. ReplSlotSyncWorkerMain
>
> + Assert(SlotSyncWorker->pid == InvalidPid);
> +
> + /*
> + * Startup process signaled the slot sync worker to stop, so if meanwhile
> + * postmaster ended up starting the worker again, exit.
> + */
> + if (SlotSyncWorker->stopSignaled)
> + {
> + SpinLockRelease(&SlotSyncWorker->mutex);
> + proc_exit(0);
> + }
>
> Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in
> such a way that SlotSyncWorker->stopSignaled was already assigned
> true, but SlotSyncWorker->pid was not yet reset to InvalidPid;
>
> e.g. Is the Assert above still OK?

We are good with the Assert here. I tried below cases:

1) When slotsync worker is say killed using 'kill', it is considered
as SIGTERM; slot sync worker invokes 'slotsync_worker_onexit()' before
going down and thus sets SlotSyncWorker->pid = InvalidPid. This means
when it is restarted (considering we have put the breakpoints in such
a way that postmaster had already reached do_start_bgworker() before
promotion finished), it is able to see stopSignaled set but pid is
InvalidPid and thus we are good.

2) Another case is when we kill slot sync worker using 'kill -9' (or
say we make it crash), in such a case, postmaster signals each sibling
process to quit (including startup process) and cleans up the shared
memory used by each (including SlotSyncWorker). In such a case
promotion fails. And if slot sync worker is started again, it will
find pid as InvalidPid. So we are good.

thanks
Shveta



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Network failure may prevent promotion
Next
From: Nathan Bossart
Date:
Subject: Re: ALTER ROLE documentation improvement