Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: long-standing data loss bug in initial sync of logical replication
Date
Msg-id CAA4eK1+W_9zQtaXX4fsrHDrLOZyoFheeLo6ExQX-U6EPGTAqEA@mail.gmail.com
Whole thread Raw
In response to Re: long-standing data loss bug in initial sync of logical replication  (Andres Freund <andres@anarazel.de>)
Responses Re: long-standing data loss bug in initial sync of logical replication
List pgsql-hackers
On Sun, Nov 19, 2023 at 7:48 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-11-19 02:15:33 +0100, Tomas Vondra wrote:
> >
> > If understand correctly, with the current code (which only gets
> > ShareUpdateExclusiveLock), we may end up in a situation like this
> > (sessions A and B):
> >
> >   A: starts "ALTER PUBLICATION p ADD TABLE t" and gets the SUE lock
> >   A: writes the invalidation message(s) into WAL
> >   B: inserts into table "t"
> >   B: commit
> >   A: commit
>
> I don't think this the problematic sequence - at least it's not what I had
> reproed in
> https://postgr.es/m/20231118025445.crhaeeuvoe2g5dv6%40awork3.anarazel.de
>
> Adding line numbers:
>
> 1) S1: CREATE TABLE d(data text not null);
> 2) S1: INSERT INTO d VALUES('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> 4) S1: ALTER PUBLICATION pb ADD TABLE d;
> 5) S2: COMMIT
> 6) S2: INSERT INTO d VALUES('d3');
> 7) S1: INSERT INTO d VALUES('d4');
> 8) RL: <nothing>
>
> The problem with the sequence is that the insert from 3) is decoded *after* 4)
> and that to decode the insert (which happened before the ALTER) the catalog
> snapshot and cache state is from *before* the ALTER TABLE. Because the
> transaction started in 3) doesn't actually modify any catalogs, no
> invalidations are executed after decoding it. The result is that the cache
> looks like it did at 3), not like after 4). Undesirable timetravel...
>
> It's worth noting that here the cache state is briefly correct, after 4), it's
> just that after 5) it stays the old state.
>
> If 4) instead uses a SRE lock, then S1 will be blocked until S2 commits, and
> everything is fine.
>

I agree, your analysis looks right to me.

>
>
> > > I'm not sure there are any cases where using SRE instead of AE would cause
> > > problems for logical decoding, but it seems very hard to prove. I'd be very
> > > surprised if just using SRE would not lead to corrupted cache contents in some
> > > situations. The cases where a lower lock level is ok are ones where we just
> > > don't care that the cache is coherent in that moment.
>
> > Are you saying it might break cases that are not corrupted now? How
> > could obtaining a stronger lock have such effect?
>
> No, I mean that I don't know if using SRE instead of AE would have negative
> consequences for logical decoding. I.e. whether, from a logical decoding POV,
> it'd suffice to increase the lock level to just SRE instead of AE.
>
> Since I don't see how it'd be correct otherwise, it's kind of a moot question.
>

We lost track of this thread and the bug is still open. IIUC, the
conclusion is to use SRE in OpenTableList() to fix the reported issue.
Andres, Tomas, please let me know if my understanding is wrong,
otherwise, let's proceed and fix this issue.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Track the amount of time waiting due to cost_delay
Next
From: Matthias van de Meent
Date:
Subject: Re: Improve EXPLAIN output for multicolumn B-Tree Index