Re: Logical Replication of sequences - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Logical Replication of sequences
Date
Msg-id CAHut+PuPbOCOCbeCnTPM_YM3ueF2w_j1ZYBJa9aJzjEjug7w=g@mail.gmail.com
Whole thread Raw
In response to Re: Logical Replication of sequences  (vignesh C <vignesh21@gmail.com>)
Responses Re: Logical Replication of sequences
List pgsql-hackers
Hi Vignesh,

Some review comments for v20251030-0001

======
Commit message

1.
3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES
    - (A new command introduced in PG19 by a prior patch.)
    - All sequences in pg_subscription_rel are reset to DATASYNC state.
~

DATASYNC? Just above, it says the only possible states for sequences
are INIT and READY.

======
.../replication/logical/sequencesync.c

2.
+ * A single sequencesync worker is responsible for synchronizing all sequences
+ * in INIT state in pg_subscription_rel. It begins by retrieving the list of
+ * sequences flagged for synchronization. These sequences are then processed
+ * in batches, allowing multiple entries to be synchronized within a single
+ * transaction. The worker fetches the current sequence values and page LSNs
+ * from the remote publisher, updates the corresponding sequences on the local
+ * subscriber, and finally marks each sequence as READY upon successful
+ * synchronization.
+ *

Those first 2 sentences seem repetitive because AFAIK "in INIT state"
and "flagged for synchronization" are exactly the same thing.

SUGGESTION
A single sequencesync worker is responsible for synchronizing all
sequences. It begins by retrieving the list of sequences that are
flagged for needing synchronization (e.g. those with INIT state).

~~~

get_and_validate_seq_info:

3.
+/*
+ * get_and_validate_seq_info
+ *
+ * Extracts remote sequence information from the tuple slot received from the
+ * publisher and validates it against the corresponding local sequence
+ * definition.
+ */

Missing comma.

/publisher and validates/publisher, and validates/

~~~

4.
Now that this function is in 2 parts, I think each part should be
clearly identified with comments, something like:

PART 1:
/*
 * 1. Extract sequence information from the tuple slot received from the
 * publisher
 */

PART 2:
/*
 * 2. Compare the remote sequence definition to the local sequence definition,
 * and report any discrepancies.
 */

~~~

5.
+ seqinfo_local = (LogicalRepSequenceInfo *) list_nth(seqinfos, seqidx);
+ seqinfo_local->found_on_pub = true;
+ *seqinfo = seqinfo_local;

Is this separate `seqinfo_local` variable needed? It seems always
unconditionally assigned to the parameter, so you might as well just
do without the extra variable. Maybe just rename the parameter as
`seqinfo_local`?

~~~

6.
+ if (!*sequence_rel) /* Sequence was concurrently dropped? */
+ return COPYSEQ_SKIPPED;
+
+ tup = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seqinfo_local->localrelid));
+ if (!HeapTupleIsValid(tup)) /* Sequence was concurrently dropped? */
+ return COPYSEQ_SKIPPED;

Nit. IMO the code was easier to read in the previous patch version,
when it was commented above the code like:

/* Sequence was concurrently dropped */
if (!*sequence_rel)
  return COPYSEQ_SKIPPED;

/* Sequence was concurrently dropped */
tup = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seqinfo->localrelid));
if (!HeapTupleIsValid(tup))
  return COPYSEQ_SKIPPED;
~~~

copy_sequence:

7.
+static CopySeqResult
+copy_sequence(LogicalRepSequenceInfo *seqinfo, int64 last_value,
+   bool is_called, XLogRecPtr page_lsn, Oid seqowner)
+{
+ UserContext ucxt;
+ AclResult aclresult;
+ bool run_as_owner = MySubscription->runasowner;
+ Oid seqoid = seqinfo->localrelid;
+
+ /*
+ * Make sure that the sequence is copied as sequence owner, unless the
+ * user has opted out of that behaviour.
+ */
+ if (!run_as_owner)
+ SwitchToUntrustedUser(seqowner, &ucxt);

The code/comment seems contradictory because IMO it is vague what
runasowner member means -- which owner? AFAIK `run_as_owner` really
means "run as *subscription* owner". Ideally, the
MySubscription->runasowner member might be renamed to 'runassubowner'
but maybe that is a bigger change than you want to make for this
patch.

So, maybe just the comment can be rewritten for more clarity.

SUGGESTION:
If the user did not opt to run as the owner of the subscription
('run_as_owner'), then copy the sequence as the owner of the sequence.

(Also, make the similar comment change for the equivalent place in tablesync.c).

~~~

copy_sequences:

8.
+ ereport(LOG,
+ errmsg("logical replication sequence synchronization for
subscription \"%s\" - total unsynchronized: %d",
+    MySubscription->name, list_length(seqinfos)));
+
+ while (cur_batch_base_index < list_length(seqinfos))

Would it be tidier to declare int n_seqinfos = list_length(seqinfos);
instead of using list_length() multiple times in this function.

~~~

9.
+ * We deliberately avoid acquiring a local lock on the sequence before
+ * querying the publisher to prevent potential distributed deadlocks
+ * in bi-directional replication setups. For instance, a concurrent
+ * ALTER SEQUENCE on one node might block this worker, while the
+ * worker's own local lock simultaneously blocks a similar operation
+ * on the other nod resulting in a circular wait that spans both nodes
+ * and remains undetected.

typo: /nod/node/

~~~

10.
+ int64 last_value;
+ bool is_called;
+ XLogRecPtr page_lsn;
+ CopySeqResult sync_status;
+ LogicalRepSequenceInfo *seqinfo;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (ConfigReloadPending)
+ {
+ ConfigReloadPending = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ }
+
+ sync_status = get_and_validate_seq_info(slot, &sequence_rel,
+ &seqinfo, &page_lsn,
+ &last_value, &is_called);
+ if (sync_status == COPYSEQ_SUCCESS)
+ sync_status = copy_sequence(seqinfo, last_value, is_called,
+ page_lsn,
+ sequence_rel->rd_rel->relowner);

Why does LogicalRepSequenceInfo only have to be metadata? Can't those
`page_lsn`, `last_value`, and `is_called` also be made members of
LogicalRepSequenceInfo, so you don't have to declare them and pass
them all in and out as parameters here?

======
src/test/subscription/t/036_sequences.pl

11.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ DROP SEQUENCE regress_s4;
+));
+
+# Verify that an error is logged for the missing sequence ('regress_s4').
+$node_subscriber->wait_for_log(
+ qr/ERROR: ( [A-Z0-9]+:)? logical replication sequence
synchronization failed for subscription "regress_seq_sub"\n.*DETAIL:.*
Missing sequence\(s\) on publisher: \("public.regress_s4"\)/,
+ $log_offset);

I felt that psql DROP code belongs below the comment too.
Alternatively, add another comment for that DROP, like:
# Drop the sequence ('regress_s4') in preparation for the next test

~~~

12.
There is still a yet-to-be-implemented test combination as previously
reported [1, comment #3], right?

======
[1] https://www.postgresql.org/message-id/CAHut%2BPu%2BwTfWwCUUUL%2BcqsHFZ-ptY6CWe8FYM3by901NvLArCQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Philip Alger
Date:
Subject: Re: [PATCH] Add pg_get_type_ddl() to retrieve the CREATE TYPE statement
Next
From: Shinya Kato
Date:
Subject: Re: Add wal_fpi_bytes_[un]compressed to pg_stat_wal