Re: Logical Replication of sequences - Mailing list pgsql-hackers
| From | vignesh C |
|---|---|
| Subject | Re: Logical Replication of sequences |
| Date | |
| Msg-id | CALDaNm0kGfCS5OY7x=JzzAMbOcwh+ykpGtV8JOfN4gG131gAUw@mail.gmail.com Whole thread Raw |
| In response to | Re: Logical Replication of sequences (Peter Smith <smithpb2250@gmail.com>) |
| List | pgsql-hackers |
On Fri, 31 Oct 2025 at 11:26, Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Vignesh, > > For later.... here are some review comments for the documentation > patch v20251030-0002 > > ====== > doc/src/sgml/config.sgml > > wal_retrieve_retry_interval: > > 1. > <para> > - In logical replication, this parameter also limits how often a failing > - replication apply worker or table synchronization worker will be > - respawned. > + In logical replication, this parameter also limits how quickly a > + failing replication apply worker, table synchronization worker, or > + sequence synchronization worker will be respawned. > </para> > > I think you can simplify that. > > SUGGESTION > In logical replication, this parameter also limits how quickly a > failing replication apply worker, or table/sequence synchronization > worker will be respawned. Modified > ~~~ > > max_logical_replication_workers: > > 2. > <para> > Specifies maximum number of logical replication workers. This includes > - leader apply workers, parallel apply workers, and table synchronization > - workers. > + leader apply workers, parallel apply workers, table synchronization > + workers and a sequence synchronization worker. > </para> > > I think you can simplify that. > > SUGGESTION > This includes leader apply workers, parallel apply workers, and > table/sequence synchronization workers. Modified > ~~~ > > max_sync_workers_per_subscription: > > 3. > <para> > Maximum number of synchronization workers per subscription. This > parameter controls the amount of parallelism of the initial data copy > - during the subscription initialization or when new tables are added. > + during the subscription initialization or when new tables or sequences > + are added. > </para> > > But, there is no parallelism at all for sequence copies, because there > is only one sequencesync worker (as the following docs paragraph > says), so maybe we do not need this docs change. > > NOTE -- see the comment #12 below, and maybe use wording like that. Modified similarly > ====== > doc/src/sgml/logical-replication.sgml > > Section 29.1 Publication: > > 4. > Publications may currently only contain tables or sequences. Objects must be > added explicitly, except when a publication is created using > <literal>FOR TABLES IN SCHEMA</literal>, <literal>FOR ALL TABLES</literal>, > - or <literal>FOR ALL SEQUENCES</literal>. > + or <literal>FOR ALL SEQUENCES</literal>. Unlike tables, the state of > + sequences can be synchronized at any time. For more information, see > + <xref linkend="logical-replication-sequences"/>. > > Not sure about the wording "the state of". Maybe it can be simplified? > > SUGGESTION > Unlike tables, sequences can be synchronized at any time. Modified > ~~~ > > 5. > + <listitem> > + <para> > + use <link linkend="sql-altersubscription-params-refresh-sequences"> > + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link> > + to re-synchronize all sequences. > + </para> > + </listitem> > > AFAIK it's not going to get any newly added sequences so it is not > really "all sequences" so this seems misleading. I thought it should > be like below. > > SUGGESTION > use ALTER SUBSCRIPTION ... REFRESH SEQUENCES to re-synchronize all > sequences currently known to the subscription. Modified > ~~~ > > Section 29.7.1. Sequence Definition Mismatches: > > 6. > + <para> > + During sequence synchronization, the sequence definitions of the publisher > + and the subscriber are compared. An error is logged listing all differing > + sequences before the process exits. The apply worker detects this failure > + and repeatedly respawns the sequence synchronization worker to retry until > + all differences are resolved. See also > + <link linkend="guc-wal-retrieve-retry-interval"><varname>wal_retrieve_retry_interval</varname></link>. > + </para> > > It seems a bit misleading. e.g. AFAIK the "The apply worker detects > this failure" is not true. IIUC, the apply worker simply finds some > sequences that still have INIT state, so really it has no knowledge of > failure at all, right? > > Consider rewording this part. > > SUGGESTION > The sequence synchronization worker validates that sequence > definitions match between publisher and subscriber. If mismatches > exist, the worker logs an error identifying them and exits. The apply > worker continues respawning the sequence synchronization worker until > synchronization succeeds. Modified > ~~~ > > Section 29.7.2. Refreshing Stale Sequences: > > 7. > + <para> > + Subscriber side sequence values may frequently become out of sync due to > + updates on the publisher. > + </para> > + <para> > + To verify, compare the sequence values between the publisher and > + subscriber, and if necessary, execute > + <link linkend="sql-altersubscription-params-refresh-sequences"> > + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>. > + </para> > > I didn't see why the wording "To verify" was needed here. Below is a > slightly simpler alternative for these paragraphs. > > SUGGESTION > Subscriber sequence values drift out of sync as the publisher advances > them. Compare values between publisher and subscriber, then run ALTER > SUBSCRIPTION ... REFRESH SEQUENCES to resynchronize if necessary. Modified > ~~~ > > Section 29.7.3. Examples. > > 8. GENERAL. Prompts in examples > > I think using prompts like "test_pub#" in the examples is frowned upon > because it makes cutting directly from the examples more difficult. > Similarly, the result of the commands is not shown. > > See other PG18 logical replication examples for why the current style > is.... e.g. more like this: > /* pub # */ CREATE TABLE t1(a int, b int, c text, PRIMARY KEY(a,c)); > /* pub # */ CREATE TABLE t2(d int, e int, f int, PRIMARY KEY(d)); Modified > ~~~ > > 9. > + Re-synchronize all the sequences on the subscriber using > + <link linkend="sql-altersubscription-params-refresh-sequences"> > + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>. > > SUGGESTION > Re-synchronize all sequences known to the subscriber using... Modified > ~~~ > > Section 29.9. Restrictions # > > 10. > + then this should typically not be a problem. If, however, some kind of > + switchover or failover to the subscriber database is intended, then the > + sequences would need to be updated to the latest values, either by > + executing <link linkend="sql-altersubscription-params-refresh-sequences"> > + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link> > + or by copying the current data from the publisher (perhaps using > + <command>pg_dump</command>) or by determining a sufficiently high value > + from the tables themselves. > > IIUC the "ALTER SUBSCRIPTION ... REFRESH SEQUENCES" is only going to > resync the sequences that the subscriber already knew about. So, if > you really wanted to get the latest of everything won't the user need > to execute double-commands just in case there are some new sequences > at the publisher? > > e.g. > First, ALTER SUBSCRIPTION REFRESH PUBLICATION > Then, ALTER SUBSCRIPTION REFRESH SEQUENCES I felt we don't support DDL, so new one's created should be copied using pg_dump. I felt existing is ok. > ~~~ > > Section 29.13.2. Subscribers # > > 11. > - workers), plus some reserve for the table synchronization workers and > - parallel apply workers. > + workers), plus some reserve for the parallel apply workers, table > synchronization workers, and a sequence > + synchronization worker. > > I think this can be worded similar to the config.sgml > > SUGGESTION > ... plus some reserve for the parallel apply workers, and > table/sequence synchronization workers. Modified > ~~ > > 12. > <para> > <link linkend="guc-max-sync-workers-per-subscription"><varname>max_sync_workers_per_subscription</varname></link> > - controls the amount of parallelism of the initial data copy during the > - subscription initialization or when new tables are added. > + controls how many tables can be synchronized in parallel during > + subscription initialization or when new tables are added. One additional > + worker is also needed for sequence synchronization. > </para> > > Oh, perhaps this is the wording that should have been used in > config.sgml (review comment #3) to avoid implying about sequencesync > workers helping with parallelism. Used the sequence synchronization doc similar to here in #3 > ====== > doc/src/sgml/monitoring.sgml > > 13. > <para> > Type of the subscription worker process. Possible types are > - <literal>apply</literal>, <literal>parallel apply</literal>, and > - <literal>table synchronization</literal>. > + <literal>apply</literal>, <literal>parallel apply</literal>, > + <literal>table synchronization</literal>, and > + <literal>sequence synchronization</literal>. > </para></entry> > > This docs fragment probably belongs with the pg_stats patch, not here. This is ok here as we display this for running process, the other stats patch is mainly for errors. > ====== > doc/src/sgml/ref/create_subscription.sgml > > 14. > (see <xref linkend="sql-createtype"/> for more about send/receive > - functions). > + functions). This parameter is not applicable for sequences. > </para> > > In many places for this page the patch says "This parameter is not > applicable for sequences." > > IMO that is ambiguous. It is not clear if the parameter is silently > ignored, if it will give an error, or what? > > Maybe you can clarify by saying "is ignored" or "has no effect", > instead of "is not applicable" Modified The attached v20251102 version patch has the changes for the same. Regards, Vignesh
Attachment
pgsql-hackers by date: