Re: [PATCH] Support automatic sequence replication - Mailing list pgsql-hackers

From Chao Li
Subject Re: [PATCH] Support automatic sequence replication
Date
Msg-id 02EDB3D2-4E5A-4EDE-BADF-3DF62D707831@gmail.com
Whole thread Raw
In response to RE: [PATCH] Support automatic sequence replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers

> On Mar 5, 2026, at 19:34, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, March 5, 2026 6:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Mar 5, 2026 at 9:35 AM shveta malik <shveta.malik@gmail.com>
>> wrote:
>>>
>>>
>>> 3)
>>> + /* Sleep for the configured interval */
>>> + (void) WaitLatch(MyLatch,
>>> + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, sleep_ms,
>>> + WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
>>>
>>> I don't think this wait-event is appropriate. Unlike tablesync, we are
>>> not waiting for any state change here. Shall we add a new one for our
>>> case? How about WAIT_EVENT_LOGICAL_SEQSYNC_MAIN? Thoughts?
>>>
>>
>> +1 for a new wait event. Few other minor comments:
>
> Added.
>
>>
>> 1.
>> + * Check if the subscription includes sequences and start a
>> + sequencesync
>> + * worker if one is not already running. The active sequencesync worker
>> + will
>> + * handle all pending sequence synchronization. If any sequences remain
>> + * unsynchronized after it exits, a new worker can be started in the
>> + next
>> + * iteration.
>>  *
>> - * Start a sequencesync worker if one is not already running. The active
>> - * sequencesync worker will handle all pending sequence synchronization. If
>> any
>> - * sequences remain unsynchronized after it exits, a new worker can be
>> started
>> - * in the next iteration.
>>
>> Why did this comment change? The earlier one sounds okay to me.
>
> I think either version is fine, so reverted this change now.
>
>>
>> 2.
>>  break;
>> +
>>  case COPYSEQ_INSUFFICIENT_PERM:
>>
>> Why does this patch add additional new lines? We use both styles (existing
>> and what this patch does) in the code but it seems unnecessary to change for
>> this patch.
>
> Removed.
>
>>
>> 3.
>> - ProcessSequencesForSync();
>> +
>> + /* Check if sequence worker needs to be started */
>> + MaybeLaunchSequenceSyncWorker();
>>
>> No need for an additional line and a comment here.
>
> Removed.
>
> Here is the V11 patch which addressed all above comments and [1][2].
>
> [1] https://www.postgresql.org/message-id/CAJpy0uAfu-VPqCknLLvJ%2BPUx_cyoR-b70xUNT6Pyv8N-odKizQ%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAJpy0uBeAdz6-3P26Eryeq3TyjA-GiKY3z0hFMxzZD%3DAYGqQ3Q%40mail.gmail.com
>
> Best Regards,
> Hou zj
>
<v11-0002-Synchronize-sequences-directly-in-REFRESH-SEQUEN.patch><v11-0001-Support-automatic-sequence-replication.patch>

Hi Zhijie,

Thanks for your clarification in the other email. I have reviewed v11 and here comes my comments:

1 - 0001
```
 static CopySeqResult
-copy_sequence(LogicalRepSequenceInfo *seqinfo, Oid seqowner)
+validate_seqsync_state(LogicalRepSequenceInfo *seqinfo, Relation sequence_rel)
 {
-    UserContext ucxt;
     AclResult    aclresult;
+    Oid            seqoid = seqinfo->localrelid;
+
+    /* Perform drift check if it's not the initial sync */
+    if (seqinfo->relstate == SUBREL_STATE_READY)
+    {
+        int64        local_last_value;
+        bool        local_is_called;
+
+        /*
+         * Skip synchronization if the current user does not have sufficient
+         * privileges to read the sequence data.
+         */
+        if (!GetSequence(sequence_rel, &local_last_value, &local_is_called))
+            return COPYSEQ_INSUFFICIENT_PERM;
+
+        /*
+         * Skip synchronization if the sequence has not drifted from the
+         * publisher's value.
+         */
+        if (local_last_value == seqinfo->last_value &&
+            local_is_called == seqinfo->is_called)
+            return COPYSEQ_NO_DRIFT;
+    }
+
+    /* Verify that the current user can update the sequence */
+    aclresult = pg_class_aclcheck(seqoid, GetUserId(), ACL_UPDATE);
+
+    if (aclresult != ACLCHECK_OK)
+        return COPYSEQ_INSUFFICIENT_PERM;
+
+    return COPYSEQ_ALLOWED;
+}
```

I think we can move pg_class_aclcheck() to before the drift check, because it’s a local check, should be cheaper than a
remotecheck. If the local check fails, we don’t need to touch remote. 

2 - 0001
```
+static CopySeqResult
+copy_sequence(LogicalRepSequenceInfo *seqinfo, Relation sequence_rel)
+{
+    UserContext ucxt;
     bool        run_as_owner = MySubscription->runasowner;
     Oid            seqoid = seqinfo->localrelid;
+    CopySeqResult result;
+    XLogRecPtr    local_page_lsn;
+
+    (void) GetSubscriptionRelState(MySubscription->oid,
+                                   RelationGetRelid(sequence_rel),
+                                   &local_page_lsn);
```

I think GetSubscriptionRelState is called to get local_page_lsn. But local_page_lsn is only used very late in this
function,and there is an early return branch, so can we move this call to right before where local_page_lsn is used? 

3 - 0001
```
     /*
      * Record the remote sequence's LSN in pg_subscription_rel and mark the
-     * sequence as READY.
+     * sequence as READY if updating a sequence that is in INIT state.
      */
-    UpdateSubscriptionRelState(MySubscription->oid, seqoid, SUBREL_STATE_READY,
-                               seqinfo->page_lsn, false);
+    if (seqinfo->relstate == SUBREL_STATE_INIT ||
+        seqinfo->page_lsn != local_page_lsn)
+        UpdateSubscriptionRelState(MySubscription->oid, seqoid, SUBREL_STATE_READY,
+                                   seqinfo->page_lsn, false);
```

In the comment, I think you don’t have to add “if updating .. that is in INIT state”, but if you do, then you should
alsomention the lsn condition. 

4 - 0001
```
+            else
+            {
+                /*
+                 * Double the sleep time, but not beyond the maximum allowable
+                 * value.
+                 */
+                sleep_ms = Min(sleep_ms * 2, SEQSYNC_MAX_SLEEP_MS);
+            }
```

Double the sleep time when no drift is an optimization. But looks like the doubling happens only when all sequences
haveno drift. Say, there are 1000 sequences, and only one is hot, then the it will still fetch the 1000 sequences from
remoteevery 2 seconds, making the optimization much less efficient. I think the worker can wake up every 2 seconds, but
nextfetch time should be per sequence. 

5 - 0001
```
+ * If the state of sequence is SUBREL_STATE_READY, only synchronize sequences
```

“The state of sequence is SUBREL_STATE_READY” is inaccurate, I think it should be “the state of sequence sync is
SUBREL_STATE_READY”.

6 - 0001

start_sequence_sync runs an infinite loop to periodically sync sequences. I don’t it has an auto reconnect mechanism.
Whensomething wrong happens, the sync worker will exit, how can the worker  

7 - 0001

Say a major version upgrade use case that uses logical replication. Before shutdown the publication side (old version),
ifthere are 1000 sequences, how can a user decide if all sequences have been synced? From this perspective, would it
makesense to log a INFO message when all sequences have no drift? If next round still no drift, then don’t repeat the
message.In other words, when the states between (all no drift) and (any drift) switch, log a INFO message. 

8 - 0001
```
+bool
+GetSequence(Relation seqrel, int64 *last_value, bool *is_called)
```

This function name feels too general. How about ReadSequenceState or GetSequenceLastValueAndIsCalled?

9 - 0001
```
+                    elog(ERROR, "unrecognized Sequence replication result: %d", (int) sync_status);
```
Nit: The “S” seems not have to be capital.

10 - 0002
```
+                    ? errdetail("The local last_value %lld is ahead of the one on publisher",
+                                (long long int) local_last_value)
+                    : errdetail("The local last_value %lld is behind of the one on publisher",
+                                (long long int) local_last_value));
```

Per https://www.postgresql.org/docs/18/error-style-guide.html, detail message should ends with a period:

Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences.
Puttwo spaces after the period if another sentence follows (for English text; might be inappropriate in other
languages).

11 - 0002 - same code block as 10

If we give up the sync, does that mean it’s a safe situation for upgrade? If yes, does it make sense to add a hint to
saysomething like “Now it’s safe for upgrade”, or whatever else guide to users. 

12 - 0002
```
+void
+AlterSubSyncSequences(WalReceiverConn *conn, Oid subid, char *subname,
+                      bool runasowner)
+{
+    /*
+     * Init the SequenceSyncContext which we clean up after the sequence
+     * synchronization.
+     */
+    SequenceSyncContext = AllocSetContextCreate(CurrentMemoryContext,
+                                                "SequenceSyncContext",
+                                                ALLOCSET_DEFAULT_SIZES);
+
+    PG_TRY();
+    {
+        MemoryContext    oldctx;
+
+        oldctx = MemoryContextSwitchTo(SequenceSyncContext);
+
+        LogicalRepSyncSequences(conn, subid, subname, runasowner);
```

I think we can take the return value of LogicalRepSyncSequences and log an INFO to tell user the current status.

13 - 0002
```
+# Verify there is no logical replication apply worker running
+$result = $node_subscriber->safe_psql(
+    'postgres',
+    "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'logical replication apply worker'");
+
+is($result, '0', 'no logical replication worker is running’);
```

Looks like a mismatch. The test checks “apply worker” but the error message mentions “logical replication worker”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it
Next
From: shveta malik
Date:
Subject: Re: [PATCH] Add max_logical_replication_slots GUC