Re: Logical Replication of sequences - Mailing list pgsql-hackers
| From | shveta malik |
|---|---|
| Subject | Re: Logical Replication of sequences |
| Date | |
| Msg-id | CAJpy0uB7JcrDntEm+nvQ8k74dMzYcDEQZBS70mOcY0N3UcczFg@mail.gmail.com Whole thread Raw |
| In response to | RE: Logical Replication of sequences ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
| List | pgsql-hackers |
On Mon, Oct 27, 2025 at 8:23 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, October 24, 2025 11:22 PM vignesh C <vignesh21@gmail.com> wrote: > > > > On Thu, 23 Oct 2025 at 16:47, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 23, 2025 at 11:45 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > The attached patch has the changes for the same. > > > > > > > > > > I have pushed 0001 and the following are comments on 0002. > > > > The attached v20251024 version patch has the changes for the same. > > The comments from [1] have also been addressed in this version. > > Thanks for updating the patch. > > I was reviewing 0003 and have some thoughts for simplifying the codes related to > sequence state invalidations and hash tables: > > 1. I'm considering whether we could lock sequences at the start and maintain > these locks until the copy process finishes, allowing us to remove > invalidation codes. > > I understand that the current process is: > > 1. start a transaction to fetch namespace/seqname for all the sequences in > the pg_subscription_rel > 2. start multiple transation and handle a batch of in each transaction > > So if there are sequence is altered between step 1 and 2, then we need to > skip the renamed or dropped sequences in step 2 and invalidates the hash > entry which looks inelegant. > > To improve this, my proposal is to postpone the namespace/seqname fetch logic > until the second step. Initially, we would fetch just the sequence OIDs. > Then, in step 2, we would fetch the namespace/seqname after locking the > sequence. This approach ensures that any concurrent RENAME operations between > steps are irrelevant, as we will use the latest sequence names to query the > publisher, preventing any RENAME during step 2. This logic is also consistent > with tablesync process where we lock the table first and get nspname/relname > after that. > > 2. We currently use a hash table to map remote sequence information to local > sequence data. I'm exploring the possibility of using a List instead. By > passing the sequence's index in the List to the query: > > The idea is to pass the index of the sequence in the List to the query like: > > "FROM ( VALUES %s ) AS s (schname, seqname, seqidx)" > > Upon receiving the results, we can directly map remote sequences to local > ones using:: "list_nth(seqinfos, seqidx);" > > Here is a patch atop of 0003 that implements above ideas. Please take a > look at this and see if it makes the code look better. > I like the overall idea here. With this approach, since we first fetch the relids and then retrieve the sequence names later (after taking the exclusive lock), our remote query will always use the latest names, whether they’re the original or altered ones, it doesn’t matter. The sequence name itself can’t change during this step, so we’re safe on that front. As a result, the race conditions I mentioned in [1] and [2] are no longer applicable. I’ll still go through the patch in more detail to verify and review it. [1]: https://www.postgresql.org/message-id/CAJpy0uC-Jx2L6tOTnDQ_Zwz99X3HQDik6tG%3D%2B1a71SxZFiy12w%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAJpy0uAQ43WjvuBi9F_hOJwsa1veGCJJs0ogH1o_o9AAv0jTfg%40mail.gmail.com thanks Shveta
pgsql-hackers by date: