Re: Logical Replication of sequences - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Logical Replication of sequences
Date
Msg-id CAHut+PvD4jUsgSkabE9UtKcpvXbLFUxktaWnz8KAKAry1tj2-Q@mail.gmail.com
Whole thread Raw
In response to Re: Logical Replication of sequences  (vignesh C <vignesh21@gmail.com>)
Responses Re: Logical Replication of sequences
List pgsql-hackers
Here are some review comments for the patch v20240805_2-0003.

======
doc/src/sgml/catalogs.sgml

nitpick - removed the word "either"

======
doc/src/sgml/ref/alter_subscription.sgml

I felt the discussions about "how to handle warnings" are a bit scattered:
e.g.1 - ALTER SUBSCRIPTION REFRESH PUBLICATION copy data referred to
CREATE SUBSCRIPTION copy data.
e.g.2 - ALTER SUBSCRIPTION REFRESH explains what to do, but now the
explanation is in 2 places.
e.g.3 - CREATE SUBSCRIPTION copy data explains what to do (again), but
IMO it belongs better in the common "Notes" part

FYI, I've moved all the information to one place (in the CREATE
SUBSCRIPTION "Notes") and others refer to this central place. See the
attached nitpicks diff.

REFRESH PUBLICATION copy_data
nitpick - now refers to  CREATE SUBSCRIPTION "Notes". I also moved it
to be nearer to the other sequence stuff.

REFRESH PUBLICATION SEQUENCES:
nitpick - now refers to CREATE SUBSCRIPTION "Notes".

======
doc/src/sgml/ref/create_subscription.sgml

REFRESH PUBLICATION copy_data
nitpick - now refers to CREATE SUBSCRIPTION "Notes"

Notes:
nitpick - the explanation of, and what to do about sequence WARNINGS,
is moved to here

======
src/backend/commands/sequence.c

pg_sequence_state:
nitpick - I just moved the comment in pg_sequence_state() to below the
NOTE, which talks about "page LSN".

======
src/backend/catalog/pg_subscription.c

1. HasSubscriptionRelations

Should function 'HasSubscriptionRelations' be renamed to
'HasSubscriptionTables'?

~~~

GetSubscriptionRelations:
nitpick - tweak some "skip" comments.

======
src/backend/commands/subscriptioncmds.c

2. CreateSubscription

  tables = fetch_table_list(wrconn, publications);
- foreach(lc, tables)
+ foreach_ptr(RangeVar, rv, tables)
+ {
+ Oid relid;
+
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
+
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
+
+ AddSubscriptionRelState(subid, relid, table_state,
+ InvalidXLogRecPtr, true);
+ }
+
+ /* Add the sequences in init state */
+ sequences = fetch_sequence_list(wrconn, publications);
+ foreach_ptr(RangeVar, rv, sequences)

These 2 loops (first for tables and then for sequences) seem to be
executing the same code. If you wanted you could combine the lists
up-front, and then have one code loop instead of 2. It would mean less
code. OTOH, maybe the current code is more readable? I am not sure
what is best, so just bringing this to your attention.

~~~

AlterSubscription_refresh:
nitpick = typo /indicating tha/indicating that/

~~~

3. fetch_sequence_list

+ appendStringInfoString(&cmd, "SELECT DISTINCT n.nspname, c.relname,
s.seqtypid, s.seqmin, s.seqmax, s.seqstart, s.seqincrement,
s.seqcycle"
+    " FROM pg_publication p, LATERAL
pg_get_publication_sequences(p.pubname::text) gps(relid),"
+    " pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace JOIN
pg_sequence s ON c.oid = s.seqrelid"
+    " WHERE c.oid = gps.relid AND p.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');

Please wrap this better to make the SQL more readable.

~~

4.
+ if (seqform->seqtypid != seqtypid || seqform->seqmin != seqmin ||
+ seqform->seqmax != seqmax || seqform->seqstart != seqstart ||
+ seqform->seqincrement != seqincrement ||
+ seqform->seqcycle != seqcycle)
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Sequence option in remote and local is not same for \"%s.%s\"",
+    get_namespace_name(get_rel_namespace(relid)), get_rel_name(relid)),
+ errhint("Alter/Re-create the sequence using the same options as in remote."));

4a.
Are these really known as "options"? Or should they be called
"sequence parameters", or something else, like "sequence attributes"?

4a.
Is there a way to give more helpful information by identifying what
was different in the log? OTOH, maybe it would become too messy if
there were multiple differences...

======
src/backend/replication/logical/launcher.c

5. logicalrep_sync_worker_count

- if (isTablesyncWorker(w) && w->subid == subid)
+ if ((isTableSyncWorker(w) || isSequenceSyncWorker(w)) &&
+ w->subid == subid)

You could micro-optimize this -- it may be more efficient to write the
condition the other way around.

SUGGESTION
if (w->subid == subid && (isTableSyncWorker(w) || isSequenceSyncWorker(w)))

======
.../replication/logical/sequencesync.c

File header comment:
nitpick - there seems a large cut/paste mistake (the first 2
paragraphs are almost the same).
nitpick - reworded with the help of Chat-GPT for slightly better
clarity. Also fixed a couple of typos.
nitpick - it mentioned MAX_SEQUENCES_SYNC_PER_BATCH several times so I
changed the wording of one of them

~~~

fetch_remote_sequence_data:
nitpick - all other params have the same name as sequence members, so
change the parameter name /lsn/page_lsn/

~

copy_sequence:
nitpick - rename var /seq_lsn/seq_page_lsn/

======
src/backend/replication/logical/tablesync.c

6. process_syncing_sequences_for_apply

+ * If a sequencesync worker is running already, there is no need to start a new
+ * one; the existing sequencesync worker will synchronize all the sequences. If
+ * there are still any sequences to be synced after the sequencesync worker
+ * exited, then a new sequencesync worker can be started in the next iteration.
+ * To prevent starting the sequencesync worker at a high frequency after a
+ * failure, we store its last failure time. We start the sync worker for the
+ * same relation after waiting at least wal_retrieve_retry_interval.

Why is it talking about "We start the sync worker for the same
relation ...". The sequencesync_failuretime is per sync worker, not
per relation. And, I don't see any 'same relation' check in the code.

======
src/include/catalog/pg_subscription_rel.h

GetSubscriptionRelations:
nitpick - changed parameter name /all_relations/all_states/

======
src/test/subscription/t/034_sequences.pl

nitpick - add some ########## comments to highlight the main test
parts to make it easier to read.
nitpick - fix typo /syned/synced/

7. More test cases?
IIUC you can also get a sequence mismatch warning during "ALTER ...
REFRESH PUBLICATION", and "CREATE SUBSCRIPTION". So, should those be
tested also?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Restart pg_usleep when interrupted
Next
From: Amul Sul
Date:
Subject: Re: Support specify tablespace for each merged/split partition