Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: long-standing data loss bug in initial sync of logical replication
Date
Msg-id CAA4eK1+HOzhB35PVnmgL-OAtfLnQAsKM1shJODZi3sacTeae9A@mail.gmail.com
Whole thread Raw
In response to Re: long-standing data loss bug in initial sync of logical replication  (vignesh C <vignesh21@gmail.com>)
Responses Re: long-standing data loss bug in initial sync of logical replication
List pgsql-hackers
On Mon, Jul 1, 2024 at 10:51 AM vignesh C <vignesh21@gmail.com> wrote:
>
>
> This issue is present in all supported versions. I was able to
> reproduce it using the steps recommended by Andres and Tomas's
> scripts. I also conducted a small test through TAP tests to verify the
> problem. Attached is the alternate_lock_HEAD.patch, which includes the
> lock modification(Tomas's change) and the TAP test.
>

@@ -1568,7 +1568,7 @@ OpenTableList(List *tables)
  /* Allow query cancel in case this takes a long time */
  CHECK_FOR_INTERRUPTS();

- rel = table_openrv(t->relation, ShareUpdateExclusiveLock);
+ rel = table_openrv(t->relation, ShareRowExclusiveLock);

The comment just above this code ("Open, share-lock, and check all the
explicitly-specified relations") needs modification. It would be
better to explain the reason of why we would need SRE lock here.

> To reproduce the issue in the HEAD version, we cannot use the same
> test as in the alternate_lock_HEAD patch because the behavior changes
> slightly after the fix to wait for the lock until the open transaction
> completes.
>

But won't the test that reproduces the problem in HEAD be successful
after the code change? If so, can't we use the same test instead of
slight modification to verify the lock mode?

>  The attached issue_reproduce_testcase_head.patch can be
> used to reproduce the issue through TAP test in HEAD.
> The changes made in the HEAD version do not directly apply to older
> branches. For PG14, PG13, and PG12 branches, you can use the
> alternate_lock_PG14.patch.
>

Why didn't you include the test in the back branches? If it is due to
background psql stuff, then won't commit
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=187b8991f70fc3d2a13dc709edd408a8df0be055)
can address it?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: 回复: An implementation of multi-key sort
Next
From: Amit Langote
Date:
Subject: Re: Address the -Wuse-after-free warning in ATExecAttachPartition()