Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uCpwShc76zBen4rJsL4FOc3KxC8J8dEuW-KG=a31J6Nfg@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> I've reviewed the v91 patch. Here are random comments:

Thanks for the comments.

> ---
>  /*
>   * Checks the remote server info.
>   *
> - * We ensure that the 'primary_slot_name' exists on the remote server and the
> - * remote server is not a standby node.
> + * Check whether we are a cascading standby. For non-cascading standbys, it
> + * also ensures that the 'primary_slot_name' exists on the remote server.
>   */
>
> IIUC what the validate_remote_info() does doesn't not change by this
> patch, so the previous comment seems to be clearer to me.
>
> ---
>     if (remote_in_recovery)
> +   {
> +       /*
> +        * If we are a cascading standby, no need to check further for
> +        * 'primary_slot_name'.
> +        */
>         ereport(ERROR,
>                 errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                 errmsg("cannot synchronize replication slots from a
> standby server"));
> +   }
> +   else
> +   {
> +       bool        primary_slot_valid;
>
> -   primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> -   Assert(!isnull);
> +       primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> +       Assert(!isnull);
>
> -   if (!primary_slot_valid)
> -       ereport(ERROR,
> -               errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -               errmsg("bad configuration for slot synchronization"),
> -       /* translator: second %s is a GUC variable name */
> -               errdetail("The replication slot \"%s\" specified by
> \"%s\" does not exist on the primary server.",
> -                         PrimarySlotName, "primary_slot_name"));
> +       if (!primary_slot_valid)
> +           ereport(ERROR,
> +                   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                   errmsg("bad configuration for slot synchronization"),
> +           /* translator: second %s is a GUC variable name */
> +                   errdetail("The replication slot \"%s\" specified
> by \"%s\" does not exist on the primary server.",
> +                             PrimarySlotName, "primary_slot_name"));
> +   }
>
> I think it's a refactoring rather than changes required by the
> slotsync worker. We can do that in a separate patch but why do we need
> this change in the first place?

In v90, this refactoring was made due to the fact that
validate_remote_info() was supposed to behave differently for SQL
function and slot-sync worker. SQL-function was supposed to ERROR out
while the worker was supposed to LOG and become no-op. And thus the
change was needed. In v91, we made this functionality same i.e. both
sql function and worker will error out but missed to remove this
refactoring. Thanks for catching this, I will revert it in the next
version. To match the refactoring, I made the comment change too, will
revert that as well.

> ---
> +        ValidateSlotSyncParams(ERROR);
> +
>          /* Load the libpq-specific functions */
>          load_file("libpqwalreceiver", false);
>
> -        ValidateSlotSyncParams();
> +        (void) CheckDbnameInConninfo();
>
> Is there any reason why we move where to check the parameters?

Earlier DBname verification was done inside ValidateSlotSyncParams()
and thus it was needed to load 'libpqwalreceiver' before we call this
function. Now we have moved dbname verification in a separate call and
thus the above order got changed. ValidateSlotSyncParams() is a common
function used by SQL function and worker. Earlier slot sync worker was
checking all the GUCs after starting up and was exiting each time any
GUC was invalid. It was suggested that it would be better to check the
GUCs before starting the slot sync worker in the postmaster itself,
making the ValidateSlotSyncParams() move to postmaster (see
SlotSyncWorkerAllowed).  But it was not a good idea to load libpq in
postmaster and thus we moved libpq related verification out of
ValidateSlotSyncParams(). This resulted in the above change.  I hope
it answers your query.

thanks
Shveta



pgsql-hackers by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby