Re: promotion related handling in pg_sync_replication_slots() - Mailing list pgsql-hackers

From shveta malik
Subject Re: promotion related handling in pg_sync_replication_slots()
Date
Msg-id CAJpy0uA68sK6wPNedbuN-2_ZXfcHfL1U9aUKHHtfPNcb5SqZHQ@mail.gmail.com
Whole thread Raw
In response to Re: promotion related handling in pg_sync_replication_slots()  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: promotion related handling in pg_sync_replication_slots()
List pgsql-hackers
On Fri, Apr 19, 2024 at 10:53 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Apr 18, 2024 at 05:36:05PM +0530, shveta malik wrote:
> > Please find v8 attached. Changes are:
>
> Thanks!
>
> A few comments:

Thanks for reviewing.

> 1 ===
>
> @@ -1440,7 +1461,7 @@ ReplSlotSyncWorkerMain(char *startup_data, size_t startup_data_len)
>          * slotsync_worker_onexit() but that will need the connection to be made
>          * global and we want to avoid introducing global for this purpose.
>          */
> -       before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn));
> +       before_shmem_exit(slotsync_worker_disconnect, PointerGetDatum(wrconn));
>
> The comment above this change still states "Register the failure callback once
> we have the connection", I think it has to be reworded a bit now that v8 is
> making use of slotsync_worker_disconnect().
>
> 2 ===
>
> +        * Register slotsync_worker_onexit() before we register
> +        * ReplicationSlotShmemExit() in BaseInit(), to ensure that during exit of
> +        * slot sync worker, ReplicationSlotShmemExit() is called first, followed
> +        * by slotsync_worker_onexit(). Startup process during promotion waits for
>
> Worth to mention in shmem_exit() (where it "while (--before_shmem_exit_index >= 0)"
> or before the shmem_exit() definition) that ReplSlotSyncWorkerMain() relies on
> this LIFO behavior? (not sure if there is other "strong" LIFO requirement in
> other part of the code).

I see other modules as well relying on LIFO behavior.
Please see applyparallelworker.c where
'before_shmem_exit(pa_shutdown)' is needed to be done after
'before_shmem_exit(logicalrep_worker_onexit)' (commit id 3d144c6).
Also in postinit.c, I see such comments atop
'before_shmem_exit(ShutdownPostgres, 0)'.
I feel we can skip adding this specific comment about
ReplSlotSyncWorkerMain() in ipc.c, as none of the other modules has
also not added any. I will address the rest of your comments in the
next version.

> 3 ===
>
> +        * Startup process during promotion waits for slot sync to finish and it
> +        * does that by checking the 'syncing' flag.
>
> worth to mention ShutDownSlotSync()?
>
> 4 ===
>
> I did a few tests manually (launching ShutDownSlotSync() through gdb / with and
> without sync worker and with / without pg_sync_replication_slots() running
> concurrently) and it looks like it works as designed.

Thanks for testing it.

> Having said that, the logic that is in place to take care of the corner cases
> described up-thread seems reasonable to me.

thanks
Shveta



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Next
From: Tender Wang
Date:
Subject: Re: Can't find not null constraint, but \d+ shows that