Re: Logical Replication of sequences - Mailing list pgsql-hackers

From vignesh C
Subject Re: Logical Replication of sequences
Date
Msg-id CALDaNm3oVgF=0sN6kaTDOkF-fJhE8+HjkXoa9isfZEfQBtifnQ@mail.gmail.com
Whole thread Raw
In response to Re: Logical Replication of sequences  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Mon, 7 Jul 2025 at 14:37, shveta malik <shveta.malik@gmail.com> wrote:
>
> > When we are in between a transaction we will be using
> > TopTransactionContext. We can palloc() in TopTransactionContext and
> > safely use that memory throughout the transaction. But we cannot
> > cannot access memory allocated in TopTransactionContext after
> > CommitTransaction() finishes, because TopTransactionContext is
> > explicitly reset (or deleted) at the end of the transaction.
> > This is the reason we have to use CacheMemoryContext here.
> >
>
> Okay. I see. Thanks for the details. Can we add this info in comments,
> something like:
> Allocate the tracking information in a permanent memory context to
> ensure it remains accessible across multiple transactions during the
> sequence copy process. The memory will be released once the copy is
> finished.

I felt the existing is ok as it is mentioned similarly in
FetchRelationStates too. I don't want to keep it different in
different places.

> 1)
> LogicalRepSyncSequences()
>
> + /* Allocate the sequences information in a permanent memory context. */
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
> + /* Get the sequences that should be synchronized. */
> + subsequences = GetSubscriptionRelations(subid, false, true, true);
>
> Here too we are using CacheMemoryContext? 'subsequences' is only used
> in given function and not in copy_sequeneces. I guess, we start
> multiple transactions in given function and thus make it essential to
> allocate subsequences in CacheMemoryContext. But why are we doing
> StartTransactionCommand twice? Is this intentional? Can we do it once
> before GetSubscriptionRelations() and commit it once after for-loop is
> over? IIUC, then we will not need CacheMemoryContext here.

Modified

> 2)
> logicalrep_seqsyncworker_set_failuretime()
>
> a) This function is extern'ed in worker_internal.h. But I do not see
> its usage outside launcher.c. Is it a mistake?

This is removed now as part of the next comment fix

> b) Also, do we really need logicalrep_seqsyncworker_set_failuretime?
> Is it better to move its logic instead in
> logicalrep_seqsyncworker_set_failuretime()?

Modified

> 3)
> SyncFetchRelationStates:
> Earlier the name was FetchTableStates. If we really want to  use the
> 'Sync' keyword, we can name it FetchRelationSyncStates, as we are
> fetching sync-status only. Thoughts?

Instead of FetchRelationSyncStates, I preferred FetchRelationStates,
and changed it to FetchRelationStates.

> 4)
> ProcessSyncingSequencesForApply():
> +
> + if (rstate->state != SUBREL_STATE_INIT)
> + continue;
>
> Why do we expect that rstate->state is not INIT at this time when we
> have fetched only INIT sequences in sequence_states_not_ready. Shall
> this be assert? If there is a valid scenario where it can be READY
> here, please add comments.

Modified to Assert

> 5)
> + if (!MyLogicalRepWorker->sequencesync_failure_time ||
> + TimestampDifferenceExceeds(MyLogicalRepWorker->sequencesync_failure_time,
> +    now, wal_retrieve_retry_interval))
> + {
> + MyLogicalRepWorker->sequencesync_failure_time = 0;
> +
> + logicalrep_worker_launch(WORKERTYPE_SEQUENCESYNC,
> + MyLogicalRepWorker->dbid,
> + MySubscription->oid,
> + MySubscription->name,
> + MyLogicalRepWorker->userid,
> + InvalidOid,
> + DSM_HANDLE_INVALID);
> + break;
> + }
>
> We set sequencesync_failure_time to 0, but if logicalrep_worker_launch
> is not able to launch the worker due to some reason, next time it will
> not even wait for 'wal_retrieve_retry_interval time' to attempt
> restarting it again. Is that intentional?
>
> In other workflows such as while launching table-sync or apply worker,
>  this scenario does not arise. This is because we maintain start_time
> there which can never be 0 instead of failure time and before
> attempting to start the workers, we set start_time to current time.
> The seq-sync failure-time OTOH is only set to non-null in
> logicalrep_seqsyncworker_failure() and it is not necessary that we
> will hit that function as the  logicalrep_worker_launch() may fail
> before that itself. Do you think we shall maintain start-time instead
> of failure-time for seq-sync worker as well? Or is there any other way
> to handle it?

I preferred the suggestion from [1]. Modified it accordingly.

The attached v20250709 version patch has the changes for the same.

[1] -
https://www.postgresql.org/message-id/CAJpy0uA6ugJN%2BtBwv38mG%2B6vf-uMuQoqxaZCX-mg1qBWX%2B%3DBkw%40mail.gmail.com

Regards,
Vignesh

Attachment

pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: Adding basic NUMA awareness
Next
From: Fujii Masao
Date:
Subject: Re: Tab completion for large objects