Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uDUAbY_gK-bOvoEGBA9KqqU_EDqQ0jAQGk2o1_EWSZtbw@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Nov 17, 2023 at 5:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > PFA v35.
> >
>
> Review v35-0002*
> ==============
> 1.
> As quoted in the commit message,
> >
> If a logical slot is invalidated on the primary, slot on the standby is also
> invalidated. If a logical slot on the primary is valid but is invalidated
> on the standby due to conflict (say required rows removed on the primary),
> then that slot is dropped and recreated on the standby in next sync-cycle.
> It is okay to recreate such slots as long as these are not consumable on the
> standby (which is the case currently).
> >
>
> I think this won't happen normally because of the physical slot and
> hot_standby_feedback but probably can occur in cases like if the user
> temporarily switches hot_standby_feedback from on to off. Are there
> any other reasons? I think we can mention the cases along with it as
> well at least for now. Additionally, I think this should be covered in
> code comments as well.
>
> 2.
>  #include "postgres.h"
> -
> +#include "access/genam.h"
>
> Spurious line removal.
>
> 3.
>            A password needs to be provided too, if the sender demands password
>            authentication.  It can be provided in the
>            <varname>primary_conninfo</varname> string, or in a separate
> -          <filename>~/.pgpass</filename> file on the standby server (use
> -          <literal>replication</literal> as the database name).
> -          Do not specify a database name in the
> -          <varname>primary_conninfo</varname> string.
> +          <filename>~/.pgpass</filename> file on the standby server.
> +         </para>
> +         <para>
> +          Specify <literal>dbname</literal> in
> +          <varname>primary_conninfo</varname> string to allow synchronization
> +          of slots from the primary server to the standby server.
> +          This will only be used for slot synchronization. It is ignored
> +          for streaming.
>
> Is there a reason to remove part of the earlier sentence "use
> <literal>replication</literal> as the database name"?
>
> 4.
> +       <primary><varname>enable_syncslot</varname> configuration
> parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        It enables a physical standby to synchronize logical failover slots
> +        from the primary server so that logical subscribers are not blocked
> +        after failover.
> +       </para>
> +       <para>
> +        It is enabled by default. This parameter can only be set in the
> +        <filename>postgresql.conf</filename> file or on the server
> command line.
> +       </para>
>
> I think you forgot to update the documentation for the default value
> of this variable.
>
> 5.
> + *   a) start the logical replication workers for every enabled subscription
> + *      when not in standby_mode
> + *   b) start the slot-sync worker for logical failover slots synchronization
> + *      from the primary server when in standby_mode.
>
> Either use a full stop after both lines or none of these.
>
> 6.
> +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);
>
> There shouldn't be space between * and the worker.
>
> 7.
> + if (!SlotSyncWorker->hdr.in_use)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker not initialized, "
> + "cannot attach")));
> + }
> +
> + if (SlotSyncWorker->hdr.proc)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker is "
> + "already running, cannot attach")));
> + }
>
> Using slot-sync in the error messages looks a bit odd to me. Can we
> use  "replication slot sync worker ..." in both these and other
> similar messages? I think it would be better if we don't split the
> messages into multiple lines in these cases as messages don't appear
> too long to me.
>
> 8.
> +/*
> + * Detach the worker from DSM and update 'proc' and 'in_use'.
> + * Logical replication launcher will come to know using these
> + * that the worker has shutdown.
> + */
> +void
> +slotsync_worker_detach(int code, Datum arg)
> +{
>
> I think the reference to DSM is leftover from the previous version of
> the patch. Can we change the above comments as per the new code?
>
> 9.
> +static bool
> +slotsync_worker_launch()
> {
> ...
> + /* TODO: do we really need 'generation', analyse more here */
> + worker->hdr.generation++;
>
> We should do something about this TODO. As per my understanding, we
> don't need a generation number for the slot sync worker as we have one
> such worker but I guess the patch requires it because we are using
> existing logical replication worker infrastructure. This brings the
> question of whether we really need a separate SlotSyncWorkerInfo or if
> we can use existing LogicalRepWorker and distinguish it with
> LogicalRepWorkerType? I guess you didn't use it because most of the
> fields in LogicalRepWorker will be unused for slot sync worker.
>
> 10.
> + * Can't use existing functions like 'get_database_oid' from dbcommands.c for
> + * validity purpose as they need db connection.
> + */
> +static bool
> +validate_dbname(const char *dbname)
>
> I don't know how important it is to validate the dbname before
> launching the sync slot worker because anyway after launching, it will
> give an error while initializing the connection if the dbname is
> invalid. But, if we think it is really required, did you consider
> using GetDatabaseTuple()?

I have removed  'validate_dbname' in v40. We let dbname go through
BackgroundWorkerInitializeConnection() which internally does dbname
validation. Later if 'primary_conninfo' is changed and the db name
specified in it is different, we exit the worker and let it get
restarted which will do the validation again when it does
BackgroundWorkerInitializeConnection().

>
> --
> With Regards,
> Amit Kapila.



pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: [PATCH] psql: Add tab-complete for optional view parameters
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node