Re: Logical Replication of sequences - Mailing list pgsql-hackers
| From | Peter Smith | 
|---|---|
| Subject | Re: Logical Replication of sequences | 
| Date | |
| Msg-id | CAHut+PuPbOCOCbeCnTPM_YM3ueF2w_j1ZYBJa9aJzjEjug7w=g@mail.gmail.com Whole thread Raw | 
| In response to | Re: Logical Replication of sequences (vignesh C <vignesh21@gmail.com>) | 
| Responses | Re: Logical Replication of sequences | 
| List | pgsql-hackers | 
Hi Vignesh,
Some review comments for v20251030-0001
======
Commit message
1.
3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES
    - (A new command introduced in PG19 by a prior patch.)
    - All sequences in pg_subscription_rel are reset to DATASYNC state.
~
DATASYNC? Just above, it says the only possible states for sequences
are INIT and READY.
======
.../replication/logical/sequencesync.c
2.
+ * A single sequencesync worker is responsible for synchronizing all sequences
+ * in INIT state in pg_subscription_rel. It begins by retrieving the list of
+ * sequences flagged for synchronization. These sequences are then processed
+ * in batches, allowing multiple entries to be synchronized within a single
+ * transaction. The worker fetches the current sequence values and page LSNs
+ * from the remote publisher, updates the corresponding sequences on the local
+ * subscriber, and finally marks each sequence as READY upon successful
+ * synchronization.
+ *
Those first 2 sentences seem repetitive because AFAIK "in INIT state"
and "flagged for synchronization" are exactly the same thing.
SUGGESTION
A single sequencesync worker is responsible for synchronizing all
sequences. It begins by retrieving the list of sequences that are
flagged for needing synchronization (e.g. those with INIT state).
~~~
get_and_validate_seq_info:
3.
+/*
+ * get_and_validate_seq_info
+ *
+ * Extracts remote sequence information from the tuple slot received from the
+ * publisher and validates it against the corresponding local sequence
+ * definition.
+ */
Missing comma.
/publisher and validates/publisher, and validates/
~~~
4.
Now that this function is in 2 parts, I think each part should be
clearly identified with comments, something like:
PART 1:
/*
 * 1. Extract sequence information from the tuple slot received from the
 * publisher
 */
PART 2:
/*
 * 2. Compare the remote sequence definition to the local sequence definition,
 * and report any discrepancies.
 */
~~~
5.
+ seqinfo_local = (LogicalRepSequenceInfo *) list_nth(seqinfos, seqidx);
+ seqinfo_local->found_on_pub = true;
+ *seqinfo = seqinfo_local;
Is this separate `seqinfo_local` variable needed? It seems always
unconditionally assigned to the parameter, so you might as well just
do without the extra variable. Maybe just rename the parameter as
`seqinfo_local`?
~~~
6.
+ if (!*sequence_rel) /* Sequence was concurrently dropped? */
+ return COPYSEQ_SKIPPED;
+
+ tup = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seqinfo_local->localrelid));
+ if (!HeapTupleIsValid(tup)) /* Sequence was concurrently dropped? */
+ return COPYSEQ_SKIPPED;
Nit. IMO the code was easier to read in the previous patch version,
when it was commented above the code like:
/* Sequence was concurrently dropped */
if (!*sequence_rel)
  return COPYSEQ_SKIPPED;
/* Sequence was concurrently dropped */
tup = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seqinfo->localrelid));
if (!HeapTupleIsValid(tup))
  return COPYSEQ_SKIPPED;
~~~
copy_sequence:
7.
+static CopySeqResult
+copy_sequence(LogicalRepSequenceInfo *seqinfo, int64 last_value,
+   bool is_called, XLogRecPtr page_lsn, Oid seqowner)
+{
+ UserContext ucxt;
+ AclResult aclresult;
+ bool run_as_owner = MySubscription->runasowner;
+ Oid seqoid = seqinfo->localrelid;
+
+ /*
+ * Make sure that the sequence is copied as sequence owner, unless the
+ * user has opted out of that behaviour.
+ */
+ if (!run_as_owner)
+ SwitchToUntrustedUser(seqowner, &ucxt);
The code/comment seems contradictory because IMO it is vague what
runasowner member means -- which owner? AFAIK `run_as_owner` really
means "run as *subscription* owner". Ideally, the
MySubscription->runasowner member might be renamed to 'runassubowner'
but maybe that is a bigger change than you want to make for this
patch.
So, maybe just the comment can be rewritten for more clarity.
SUGGESTION:
If the user did not opt to run as the owner of the subscription
('run_as_owner'), then copy the sequence as the owner of the sequence.
(Also, make the similar comment change for the equivalent place in tablesync.c).
~~~
copy_sequences:
8.
+ ereport(LOG,
+ errmsg("logical replication sequence synchronization for
subscription \"%s\" - total unsynchronized: %d",
+    MySubscription->name, list_length(seqinfos)));
+
+ while (cur_batch_base_index < list_length(seqinfos))
Would it be tidier to declare int n_seqinfos = list_length(seqinfos);
instead of using list_length() multiple times in this function.
~~~
9.
+ * We deliberately avoid acquiring a local lock on the sequence before
+ * querying the publisher to prevent potential distributed deadlocks
+ * in bi-directional replication setups. For instance, a concurrent
+ * ALTER SEQUENCE on one node might block this worker, while the
+ * worker's own local lock simultaneously blocks a similar operation
+ * on the other nod resulting in a circular wait that spans both nodes
+ * and remains undetected.
typo: /nod/node/
~~~
10.
+ int64 last_value;
+ bool is_called;
+ XLogRecPtr page_lsn;
+ CopySeqResult sync_status;
+ LogicalRepSequenceInfo *seqinfo;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (ConfigReloadPending)
+ {
+ ConfigReloadPending = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ }
+
+ sync_status = get_and_validate_seq_info(slot, &sequence_rel,
+ &seqinfo, &page_lsn,
+ &last_value, &is_called);
+ if (sync_status == COPYSEQ_SUCCESS)
+ sync_status = copy_sequence(seqinfo, last_value, is_called,
+ page_lsn,
+ sequence_rel->rd_rel->relowner);
Why does LogicalRepSequenceInfo only have to be metadata? Can't those
`page_lsn`, `last_value`, and `is_called` also be made members of
LogicalRepSequenceInfo, so you don't have to declare them and pass
them all in and out as parameters here?
======
src/test/subscription/t/036_sequences.pl
11.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ DROP SEQUENCE regress_s4;
+));
+
+# Verify that an error is logged for the missing sequence ('regress_s4').
+$node_subscriber->wait_for_log(
+ qr/ERROR: ( [A-Z0-9]+:)? logical replication sequence
synchronization failed for subscription "regress_seq_sub"\n.*DETAIL:.*
Missing sequence\(s\) on publisher: \("public.regress_s4"\)/,
+ $log_offset);
I felt that psql DROP code belongs below the comment too.
Alternatively, add another comment for that DROP, like:
# Drop the sequence ('regress_s4') in preparation for the next test
~~~
12.
There is still a yet-to-be-implemented test combination as previously
reported [1, comment #3], right?
======
[1] https://www.postgresql.org/message-id/CAHut%2BPu%2BwTfWwCUUUL%2BcqsHFZ-ptY6CWe8FYM3by901NvLArCQ%40mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
		
	pgsql-hackers by date: