Re: Logical Replication of sequences - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Logical Replication of sequences
Date
Msg-id CAFiTN-vg6JBdu93T74o8F3n_hHdaj+-YdD4jRbOCvybF_RwwRg@mail.gmail.com
Whole thread Raw
In response to Re: Logical Replication of sequences  (vignesh C <vignesh21@gmail.com>)
Responses Re: Logical Replication of sequences
List pgsql-hackers
On Tue, Jun 11, 2024 at 4:06 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Amit and I engaged in an offline discussion regarding the design and
> contemplated that it could be like below:

If I understand correctly, does this require the sequences to already
exist on the subscribing node before creating the subscription, or
will it also copy any non-existing sequences?

> 1) CREATE PUBLICATION syntax enhancement:
> CREATE PUBLICATION ... FOR ALL SEQUENCES;
> The addition of a new column titled "all sequences" in the
> pg_publication system table will signify whether the publication is
> designated as all sequences publication or not.
>
> 2)  CREATE SUBSCRIPTION -- no syntax change.
> Upon creation of a subscription, the following additional steps will
> be managed by the subscriber:
> i) The subscriber will retrieve the list of sequences associated with
> the subscription's publications.
> ii) For each sequence: a) Retrieve the sequence value from the
> publisher by invoking the pg_sequence_state function. b) Set the
> sequence with the value obtained from the publisher. iv) Once the
> subscription creation is completed, all sequence values will become
> visible at the subscriber's end.
>
> An alternative design approach could involve retrieving the sequence
> list from the publisher during subscription creation and inserting the
> sequences with an "init" state into the pg_subscription_rel system
> table. These tasks could be executed by a single sequence sync worker,
> which would:
> i) Retrieve the list of sequences in the "init" state from the
> pg_subscription_rel system table.
> ii) Initiate a transaction.
> iii) For each sequence: a) Obtain the sequence value from the
> publisher by utilizing the pg_sequence_state function. b) Update the
> sequence with the value obtained from the publisher.
> iv) Commit the transaction.
>
> The benefit with the second approach is that if there are large number
> of sequences, the sequence sync can be enhanced to happen in parallel
> and also if there are any locks held on the sequences in the
> publisher, the sequence worker can wait to acquire the lock instead of
> blocking the whole create subscription command which will delay the
> initial copy of the tables too.

Yeah w.r.t. this point second approach seems better.

> 3) Refreshing the sequence can be achieved through the existing
> command: ALTER SUBSCRIPTION ... REFRESH PUBLICATION(no syntax change
> here).
> The subscriber identifies stale sequences, meaning sequences present
> in pg_subscription_rel but absent from the publication, and removes
> them from the pg_subscription_rel system table. The subscriber also
> checks for newly added sequences in the publisher and synchronizes
> their values from the publisher using the steps outlined in the
> subscription creation process. It's worth noting that previously
> synchronized sequences won't be synchronized again; the sequence sync
> will occur solely for the newly added sequences.

> 4) Introducing a new command for refreshing all sequences: ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES.
> The subscriber will remove stale sequences and add newly added
> sequences from the publisher. Following this, it will re-synchronize
> the sequence values for all sequences in the updated list from the
> publisher, following the steps outlined in the subscription creation
> process.

Okay, this answers my first question: we will remove the sequences
that are removed from the publisher and add the new sequences. I don't
see any problem with this, but doesn't it seem like we are effectively
doing DDL replication only for sequences without having a
comprehensive plan for overall DDL replication?

> 5) Incorporate the pg_sequence_state function to fetch the sequence
> value from the publisher, along with the page LSN. Incorporate
> SetSequence function, which will procure a new relfilenode for the
> sequence and set the new relfilenode with the specified value. This
> will facilitate rollback in case of any failures.

I do not understand this point, you mean whenever we are fetching the
sequence value from the publisher we need to create a new relfilenode
on the subscriber?  Why not just update the catalog tuple is
sufficient?  Or this is for handling the ALTER SEQUENCE case?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Logical Replication of sequences
Next
From: "Fujii.Yuki@df.MitsubishiElectric.co.jp"
Date:
Subject: RE: Partial aggregates pushdown