Home > mailing lists

Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: long-standing data loss bug in initial sync of logical replication
Date	November 19, 2023 01:15:33
Msg-id	5cc4bbc8-b2ad-3e6e-9878-cef8ddde8ffe@enterprisedb.com Whole thread Raw
In response to	Re: long-standing data loss bug in initial sync of logical replication (Andres Freund <andres@anarazel.de>)
Responses	Re: long-standing data loss bug in initial sync of logical replication
List	pgsql-hackers

Tree view

On 11/18/23 22:05, Andres Freund wrote:
> Hi,
> 
> On 2023-11-18 21:45:35 +0100, Tomas Vondra wrote:
>> On 11/18/23 19:12, Andres Freund wrote:
>>>> If we increase the locks from ShareUpdateExclusive to ShareRowExclusive,
>>>> we're making it conflict with RowExclusive. Which is just DML, and I
>>>> think we need to do that.
>>>
>>> From what I can tell it needs to to be an AccessExlusiveLock. Completely
>>> independent of logical decoding. The way the cache stays coherent is catalog
>>> modifications conflicting with anything that builds cache entries. We have a
>>> few cases where we do use lower level locks, but for those we have explicit
>>> analysis for why that's ok (see e.g. reloptions.c) or we block until nobody
>>> could have an old view of the catalog (various CONCURRENTLY) operations.
>>>
>>
>> Yeah, I got too focused on the issue I triggered, which seems to be
>> fixed by using SRE (still don't understand why ...). But you're probably
>> right there may be other cases where SRE would not be sufficient, I
>> certainly can't prove it'd be safe.
> 
> I think it makes sense here: SRE prevents the problematic "scheduling" in your
> test - with SRE no DML started before ALTER PUB ... ADD can commit after.
> 

If understand correctly, with the current code (which only gets
ShareUpdateExclusiveLock), we may end up in a situation like this
(sessions A and B):

  A: starts "ALTER PUBLICATION p ADD TABLE t" and gets the SUE lock
  A: writes the invalidation message(s) into WAL
  B: inserts into table "t"
  B: commit
  A: commit

With the stronger SRE lock, the commits would have to happen in the
opposite order, because as you say it prevents the bad ordering.

But why would this matter for logical decoding? We accumulate the the
invalidations and execute them at transaction commit, or did I miss
something?

So what I think should happen is we get to apply B first, which won't
see the table as part of the publication. It might even build the cache
entries (syscache+relsync), reflecting that. But then we get to execute
A, along with all the invalidations, and that should invalidate them.

I'm clearly missing something, because the SRE does change the behavior,
so there has to be a difference (and by my reasoning it shouldn't be).

Or maybe it's the other way around? Won't B get the invalidation, but
use a historical snapshot that doesn't yet see the table in publication?

> I'm not sure there are any cases where using SRE instead of AE would cause
> problems for logical decoding, but it seems very hard to prove. I'd be very
> surprised if just using SRE would not lead to corrupted cache contents in some
> situations. The cases where a lower lock level is ok are ones where we just
> don't care that the cache is coherent in that moment.
> 

Are you saying it might break cases that are not corrupted now? How
could obtaining a stronger lock have such effect?

> In a way, the logical decoding cache-invalidation situation is a lot more
> atomic than the "normal" situation. During normal operation locking is
> strictly required to prevent incoherent states when building a cache entry
> after a transaction committed, but before the sinval entries have been
> queued. But in the logical decoding case that window doesn't exist.
> 

Because we apply the invalidations at commit time, so it happens as a
single operation that can't interleave with other sessions?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Noah Misch
Date: 19 November 2023, 00:32:36
Subject: Re: reindexing an invalid index should not use ERRCODE_INDEX_CORRUPTED

From: Alexander Korotkov
Date: 19 November 2023, 01:17:29
Subject: Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

Previous

Next