Re: Logical Replication of sequences - Mailing list pgsql-hackers

From shveta malik
Subject Re: Logical Replication of sequences
Date
Msg-id CAJpy0uDOcjk-k2y1FVeBS0RQeCGDKbE14f7qES6TU+RUVW7eXA@mail.gmail.com
Whole thread Raw
In response to Re: Logical Replication of sequences  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Logical Replication of sequences
List pgsql-hackers
On Fri, Jul 4, 2025 at 3:53 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 1 Jul 2025 at 15:20, shveta malik <shveta.malik@gmail.com> wrote:
> > On Mon, Jun 30, 2025 at 3:21 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > Please find the attached v20250630 patch set addressing above comments
> > > and other comments in [1],[2],[3] and [4].
> >
> > Thanks for the patches. I am still in process of reviewing it but
> > please find few comments:
> >
> > 1)
> > + if (pset.sversion >= 180000)
> > + appendPQExpBuffer(&buf,
> > +   ",\n  puballsequences AS \"%s\"",
> > +   gettext_noop("All sequences"));
> >
> > The server version check throughout the patch can be modified to 19000
> > as a new branch is created now.
>
> Modified
>
> > 2)
> > + bool all_pub; /* Special publication for all tables,
> > + * sequecnes */
> >
> > a) Typo: sequecnes --> sequences
>
> Modified
>
> > b) It is not clear from the comment that when will it be true? Will it
> > be set when either of all-tables or all-sequences is given or does it
> > need both?
>
> Updated the comments
>
> > 3)
> > postgres=# create publication pub1 for all sequences WITH ( PUBLISH='delete');
> > CREATE PUBLICATION
> > postgres=# create publication pub2 for all tables, sequences WITH
> > (PUBLISH='update');
> > CREATE PUBLICATION
> >
> > For the first command, 'WITH ( publication_parameter..' is useless.
> > For the second command, it is applicable only for 'all tables'.
> >
> > a) I am not sure if we even allow WITH in the first command?
> > b) In the second command, even if we allow it, there should be some
> > sort of NOTICE informing that it is applicable only to 'TABLES'.
> >
> > Thoughts?
> >
> > Also we allowed altering publication_parameter for all-sequences publication:
> >
> > postgres=# alter publication pub1 set (publish='insert,update');
> > ALTER PUBLICATION
> >
> > c) Should this be restricted as well? Thoughts?
>
> We have documented that the parameter of with clause is not applicable
> for sequences. I feel that all the above statements are ok with the
> documentation mentioned.
>
> Regarding the comments from [1].
> > 5)
> > In LogicalRepSyncSequences, why are we allocating it in a permanent
> > memory context?
> >
> > + /* Allocate the tracking info in a permanent memory context. */
> > + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > + foreach_ptr(SubscriptionRelState, seq_state, sequences)
> > + {
> > + SubscriptionRelState *rstate = palloc(sizeof(SubscriptionRelState));
> > +
> > + memcpy(rstate, seq_state, sizeof(SubscriptionRelState));
> > + sequences_not_synced = lappend(sequences_not_synced, rstate);
> > + }
> > + MemoryContextSwitchTo(oldctx);
> >
> > Same for 'seq_info' allocation.
>
> When we are in between a transaction we will be using
> TopTransactionContext. We can palloc() in TopTransactionContext and
> safely use that memory throughout the transaction. But we cannot
> cannot access memory allocated in TopTransactionContext after
> CommitTransaction() finishes, because TopTransactionContext is
> explicitly reset (or deleted) at the end of the transaction.
> This is the reason we have to use CacheMemoryContext here.
>

Okay. I see. Thanks for the details. Can we add this info in comments,
something like:
Allocate the tracking information in a permanent memory context to
ensure it remains accessible across multiple transactions during the
sequence copy process. The memory will be released once the copy is
finished.

> The rest of the comments are fixed.

Thank you for the patches. I am not done with review yet, but please
find the comments so far:


1)
LogicalRepSyncSequences()

+ /* Allocate the sequences information in a permanent memory context. */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+
+ /* Get the sequences that should be synchronized. */
+ subsequences = GetSubscriptionRelations(subid, false, true, true);

Here too we are using CacheMemoryContext? 'subsequences' is only used
in given function and not in copy_sequeneces. I guess, we start
multiple transactions in given function and thus make it essential to
allocate subsequences in CacheMemoryContext. But why are we doing
StartTransactionCommand twice? Is this intentional? Can we do it once
before GetSubscriptionRelations() and commit it once after for-loop is
over? IIUC, then we will not need CacheMemoryContext here.

2)
logicalrep_seqsyncworker_set_failuretime()

a) This function is extern'ed in worker_internal.h. But I do not see
its usage outside launcher.c. Is it a mistake?

b) Also, do we really need logicalrep_seqsyncworker_set_failuretime?
Is it better to move its logic instead in
logicalrep_seqsyncworker_set_failuretime()?

3)
SyncFetchRelationStates:
Earlier the name was FetchTableStates. If we really want to  use the
'Sync' keyword, we can name it FetchRelationSyncStates, as we are
fetching sync-status only. Thoughts?

4)
ProcessSyncingSequencesForApply():
+
+ if (rstate->state != SUBREL_STATE_INIT)
+ continue;

Why do we expect that rstate->state is not INIT at this time when we
have fetched only INIT sequences in sequence_states_not_ready. Shall
this be assert? If there is a valid scenario where it can be READY
here, please add comments.

5)
+ if (!MyLogicalRepWorker->sequencesync_failure_time ||
+ TimestampDifferenceExceeds(MyLogicalRepWorker->sequencesync_failure_time,
+    now, wal_retrieve_retry_interval))
+ {
+ MyLogicalRepWorker->sequencesync_failure_time = 0;
+
+ logicalrep_worker_launch(WORKERTYPE_SEQUENCESYNC,
+ MyLogicalRepWorker->dbid,
+ MySubscription->oid,
+ MySubscription->name,
+ MyLogicalRepWorker->userid,
+ InvalidOid,
+ DSM_HANDLE_INVALID);
+ break;
+ }

We set sequencesync_failure_time to 0, but if logicalrep_worker_launch
is not able to launch the worker due to some reason, next time it will
not even wait for 'wal_retrieve_retry_interval time' to attempt
restarting it again. Is that intentional?

In other workflows such as while launching table-sync or apply worker,
 this scenario does not arise. This is because we maintain start_time
there which can never be 0 instead of failure time and before
attempting to start the workers, we set start_time to current time.
The seq-sync failure-time OTOH is only set to non-null in
logicalrep_seqsyncworker_failure() and it is not necessary that we
will hit that function as the  logicalrep_worker_launch() may fail
before that itself. Do you think we shall maintain start-time instead
of failure-time for seq-sync worker as well? Or is there any other way
to handle it?

thanks
Shveta



pgsql-hackers by date:

Previous
From: Nazir Bilal Yavuz
Date:
Subject: Re: meson vs. llvm bitcode files
Next
From: Yura Sokolov
Date:
Subject: Re: Add 64-bit XIDs into PostgreSQL 15