Re: Logical Replication of sequences - Mailing list pgsql-hackers

From vignesh C
Subject Re: Logical Replication of sequences
Date
Msg-id CALDaNm2z1R8nCaX4TL9W_hRH2sWX8LcP8++i4BAbhgUNEY8k1w@mail.gmail.com
Whole thread Raw
In response to Re: Logical Replication of sequences  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Logical Replication of sequences
List pgsql-hackers
On Wed, 12 Jun 2024 at 10:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 11, 2024 at 4:06 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Amit and I engaged in an offline discussion regarding the design and
> > contemplated that it could be like below:
>
> If I understand correctly, does this require the sequences to already
> exist on the subscribing node before creating the subscription, or
> will it also copy any non-existing sequences?

Sequences must exist in the subscriber; we'll synchronize only their
values. Any sequences that are not present in the subscriber will
trigger an error.

> > 1) CREATE PUBLICATION syntax enhancement:
> > CREATE PUBLICATION ... FOR ALL SEQUENCES;
> > The addition of a new column titled "all sequences" in the
> > pg_publication system table will signify whether the publication is
> > designated as all sequences publication or not.
> >
> > 2)  CREATE SUBSCRIPTION -- no syntax change.
> > Upon creation of a subscription, the following additional steps will
> > be managed by the subscriber:
> > i) The subscriber will retrieve the list of sequences associated with
> > the subscription's publications.
> > ii) For each sequence: a) Retrieve the sequence value from the
> > publisher by invoking the pg_sequence_state function. b) Set the
> > sequence with the value obtained from the publisher. iv) Once the
> > subscription creation is completed, all sequence values will become
> > visible at the subscriber's end.
> >
> > An alternative design approach could involve retrieving the sequence
> > list from the publisher during subscription creation and inserting the
> > sequences with an "init" state into the pg_subscription_rel system
> > table. These tasks could be executed by a single sequence sync worker,
> > which would:
> > i) Retrieve the list of sequences in the "init" state from the
> > pg_subscription_rel system table.
> > ii) Initiate a transaction.
> > iii) For each sequence: a) Obtain the sequence value from the
> > publisher by utilizing the pg_sequence_state function. b) Update the
> > sequence with the value obtained from the publisher.
> > iv) Commit the transaction.
> >
> > The benefit with the second approach is that if there are large number
> > of sequences, the sequence sync can be enhanced to happen in parallel
> > and also if there are any locks held on the sequences in the
> > publisher, the sequence worker can wait to acquire the lock instead of
> > blocking the whole create subscription command which will delay the
> > initial copy of the tables too.
>
> Yeah w.r.t. this point second approach seems better.

ok

> > 3) Refreshing the sequence can be achieved through the existing
> > command: ALTER SUBSCRIPTION ... REFRESH PUBLICATION(no syntax change
> > here).
> > The subscriber identifies stale sequences, meaning sequences present
> > in pg_subscription_rel but absent from the publication, and removes
> > them from the pg_subscription_rel system table. The subscriber also
> > checks for newly added sequences in the publisher and synchronizes
> > their values from the publisher using the steps outlined in the
> > subscription creation process. It's worth noting that previously
> > synchronized sequences won't be synchronized again; the sequence sync
> > will occur solely for the newly added sequences.
>
> > 4) Introducing a new command for refreshing all sequences: ALTER
> > SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES.
> > The subscriber will remove stale sequences and add newly added
> > sequences from the publisher. Following this, it will re-synchronize
> > the sequence values for all sequences in the updated list from the
> > publisher, following the steps outlined in the subscription creation
> > process.
>
> Okay, this answers my first question: we will remove the sequences
> that are removed from the publisher and add the new sequences. I don't
> see any problem with this, but doesn't it seem like we are effectively
> doing DDL replication only for sequences without having a
> comprehensive plan for overall DDL replication?

What I intended to convey is that we'll eliminate the sequences from
pg_subscription_rel. We won't facilitate the DDL replication of
sequences; instead, we anticipate users to create the sequences
themselves.

> > 5) Incorporate the pg_sequence_state function to fetch the sequence
> > value from the publisher, along with the page LSN. Incorporate
> > SetSequence function, which will procure a new relfilenode for the
> > sequence and set the new relfilenode with the specified value. This
> > will facilitate rollback in case of any failures.
>
> I do not understand this point, you mean whenever we are fetching the
> sequence value from the publisher we need to create a new relfilenode
> on the subscriber?  Why not just update the catalog tuple is
> sufficient?  Or this is for handling the ALTER SEQUENCE case?

Sequences operate distinctively from tables. Alterations to sequences
reflect instantly in another session, even before committing the
transaction. To ensure the synchronization of sequence value and state
updates in pg_subscription_rel, we assign it a new relfilenode. This
strategy ensures that any potential errors allow for the rollback of
both the sequence state in pg_subscription_rel and the sequence values
simultaneously.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: RFC: adding pytest as a supported test framework
Next
From: Bertrand Drouvot
Date:
Subject: Re: Track the amount of time waiting due to cost_delay