Re: PGDOCS - Logical replication GUCs - added some xrefs - Mailing list pgsql-hackers

From samay sharma
Subject Re: PGDOCS - Logical replication GUCs - added some xrefs
Date
Msg-id CAJxrbywJoYWe9FL7NjLhKm4ZGRytm-63r2n05NH2bz=oJL472g@mail.gmail.com
Whole thread Raw
In response to Re: PGDOCS - Logical replication GUCs - added some xrefs  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: PGDOCS - Logical replication GUCs - added some xrefs  (Peter Smith <smithpb2250@gmail.com>)
Re: PGDOCS - Logical replication GUCs - added some xrefs  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Hi,

On Tue, Dec 6, 2022 at 11:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Dec 6, 2022 at 5:57 AM samay sharma <smilingsamay@gmail.com> wrote:
>
> Hi,
>
> On Mon, Oct 24, 2022 at 12:45 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>
>> Hi hackers.
>>
>> There is a docs Logical Replication section "31.10 Configuration
>> Settings" [1] which describes some logical replication GUCs, and
>> details on how they interact with each other and how to take that into
>> account when setting their values.
>>
>> There is another docs Server Configuration section "20.6 Replication"
>> [2] which lists the replication-related GUC parameters, and what they
>> are for.
>>
>> Currently AFAIK those two pages are unconnected, but I felt it might
>> be helpful if some of the parameters in the list [2] had xref links to
>> the additional logical replication configuration information [1]. PSA
>> a patch to do that.
>
>
> +1 on the patch. Some feedback on v5 below.
>

Thanks for your detailed review comments!

I have changed most things according to your suggestions. Please check patch v6.

Thanks for the changes. See a few points of feedback below.

> +    <para>
> +     For <emphasis>logical replication</emphasis>, <firstterm>publishers</firstterm>
> +     (servers that do <link linkend="sql-createpublication"><command>CREATE PUBLICATION</command></link>)
> +     replicate data to <firstterm>subscribers</firstterm>
> +     (servers that do <link linkend="sql-createsubscription"><command>CREATE SUBSCRIPTION</command></link>).
> +     Servers can also be publishers and subscribers at the same time. Note,
> +     the following sections refers to publishers as "senders". The parameter
> +     <literal>max_replication_slots</literal> has a different meaning for the
> +     publisher and subscriber, but all other parameters are relevant only to
> +     one side of the replication. For more details about logical replication
> +     configuration settings refer to
> +     <xref linkend="logical-replication-config"/>.
> +    </para>

The second last line seems a bit odd here. In my last round of feedback, I had meant to add the line "The parameter .... " onwards to the top of logical-replication-config.sgml.

What if we made the top of logical-replication-config.sgml like below?

Logical replication requires several configuration options to be set. Most configuration options are relevant only on one side of the replication (i.e. publisher or subscriber). However, max_replication_slots is applicable on both sides but has different meanings on each side.


> > +    <para>
> > +     For <firstterm>logical replication</firstterm> configuration settings refer
> > +     also to <xref linkend="logical-replication-config"/>.
> > +    </para>
> > +
>
> I feel the top paragraph needs to explain terminology for logical replication like it does for physical replication in addition to linking to the logical replication config page. I'm recommending this as we use terms like subscriber etc. in description of parameters without introducing them first.
>
> As an example, something like below might work.
>
> These settings control the behavior of the built-in streaming replication feature (see Section 27.2.5) and logical replication (link).
>
> For physical replication, servers will be either a primary or a standby server. Primaries can send data, while standbys are always receivers of replicated data. When cascading replication (see Section 27.2.7) is used, standby servers can also be senders, as well as receivers. Parameters are mainly for sending and standby servers, though some parameters have meaning only on the primary server. Settings may vary across the cluster without problems if that is required.
>
> For logical replication, servers will either be publishers (also called senders in the sections below) or subscribers. Publishers are ...., Subscribers are...
>

OK. I split this blurb into 2 parts – streaming and logical
replication. The streaming replication part is the same as before. The
logical replication part is new.

> > +       <para>
> > +         See <xref linkend="logical-replication-config"/> for more details
> > +         about setting <varname>max_replication_slots</varname> for logical
> > +         replication.
> > +        </para>
>
>
> The link doesn't add any new information regarding max_replication_slots other than "to reserve some for table sync" and has a good amount of unrelated info. I think it might be useful to just put a line here asking to reserve some for table sync instead of linking to the entire logical replication config section.
>

OK. I copied the tablesync note back to config.sgml definition of
'max_replication_slots' and removed the link as suggested. Frankly, I
also thought it is a bit strange that the max_replication_slots in the
“Sending Servers” section was describing this parameter for
“Subscribers”. OTOH, I did not want to split the definition in half so
instead, I’ve added another Subscriber <varlistentry> that just refers
back to this place. It looks like an improvement to me.

Hmm, I agree this is a tricky scenario. However, to me, it seems odd to mention the parameter twice as this chapter of the docs just lists each parameter and describes them. So, I'd probably remove the reference to it in the subscriber section. We should describe it's usage in different places in the logical replication part of the docs (as we do).
 

> > -   Logical replication requires several configuration options to be set.
> > +   Logical replication requires several configuration parameters to be set.
>
> May not be needed? The docs have references to both options and parameters but I don't feel strongly about it. Feel free to use what you prefer.

OK. I removed this.

>
> I think we should add an additional line to the intro here saying that parameters are mostly relevant only one of the subscriber or publisher. Maybe a better written version of "While max_replication_slots means different things on the publisher and subscriber, all other parameters are relevant only on either the publisher or the subscriber."
>

OK. Done but with slightly different wording to that.

> > +  <sect2 id="logical-replication-config-notes">
> > +   <title>Notes</title>
>
> I don't think we need this sub-section. If I understand correctly, these parameters are effective only on the subscriber side. So, any reason to not include them in that section?

OK. I moved these notes into the "Subscribers" section as suggested,
and removed "Notes".

>
> > +
> > +   <para>
> > +    Logical replication workers are also affected by
> > +    <link linkend="guc-wal-receiver-timeout"><varname>wal_receiver_timeout</varname></link>,
> > +    <link linkend="guc-wal-receiver-status-interval"><varname>wal_receiver_status_interval</varname></link> and
> > +    <link linkend="guc-wal-retrieve-retry-interval"><varname>wal_receiver_retry_interval</varname></link>.
> > +   </para>
> > +
>
> I like moving this; it makes more sense here. Should we remove it from config.sgml? It seems a bit out of place there as we generally talk only about individual parameters there and this line is general logical replication subscriber advise which is more suited to logical-replication.sgml

OK. I agree, it looked repetitive since the link to the
logical-replication page is nearby this information anyway, so I’ve
removed it from the config.sgml as you suggested.

>
> > +   <para>
> > +    Configuration parameter
> > +    <link linkend="guc-max-worker-processes"><varname>max_worker_processes</varname></link>
> > +    may need to be adjusted to accommodate for replication workers, at least (
> > +    <link linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> > +    + <literal>1</literal>). Some extensions and parallel queries also take
> > +    worker slots from <varname>max_worker_processes</varname>.
> > +   </para>
> > +
> > +  </sect2>
>
> I think we should move this to the subscriber section as said above. It's useful to know this and people might skip over the notes.

OK. Done.

> +   <para>
> +    <link linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> +    must be set to at least the number of subscriptions (for apply workers), plus
> +    some reserve for the table synchronization workers. Configuration parameter
> +    <link linkend="guc-max-worker-processes"><varname>max_worker_processes</varname></link>
> +    may need to be adjusted to accommodate for replication workers, at least (
> +    <link linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> +    + <literal>1</literal>). Note, some extensions and parallel queries also
> +    take worker slots from <varname>max_worker_processes</varname>.
> +   </para>

Maybe do max_worker_processes in a new line like the rest.

Regards,
Samay
Microsoft
 

------
Kind Regards,
Peter Smith.
Fujitsu Australia

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [PATCH] pg_dump: lock tables in batches
Next
From: Tom Lane
Date:
Subject: Re: Error-safe user functions