Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: long-standing data loss bug in initial sync of logical replication |
Date | |
Msg-id | 5cc4bbc8-b2ad-3e6e-9878-cef8ddde8ffe@enterprisedb.com Whole thread Raw |
In response to | Re: long-standing data loss bug in initial sync of logical replication (Andres Freund <andres@anarazel.de>) |
Responses |
Re: long-standing data loss bug in initial sync of logical replication
|
List | pgsql-hackers |
On 11/18/23 22:05, Andres Freund wrote: > Hi, > > On 2023-11-18 21:45:35 +0100, Tomas Vondra wrote: >> On 11/18/23 19:12, Andres Freund wrote: >>>> If we increase the locks from ShareUpdateExclusive to ShareRowExclusive, >>>> we're making it conflict with RowExclusive. Which is just DML, and I >>>> think we need to do that. >>> >>> From what I can tell it needs to to be an AccessExlusiveLock. Completely >>> independent of logical decoding. The way the cache stays coherent is catalog >>> modifications conflicting with anything that builds cache entries. We have a >>> few cases where we do use lower level locks, but for those we have explicit >>> analysis for why that's ok (see e.g. reloptions.c) or we block until nobody >>> could have an old view of the catalog (various CONCURRENTLY) operations. >>> >> >> Yeah, I got too focused on the issue I triggered, which seems to be >> fixed by using SRE (still don't understand why ...). But you're probably >> right there may be other cases where SRE would not be sufficient, I >> certainly can't prove it'd be safe. > > I think it makes sense here: SRE prevents the problematic "scheduling" in your > test - with SRE no DML started before ALTER PUB ... ADD can commit after. > If understand correctly, with the current code (which only gets ShareUpdateExclusiveLock), we may end up in a situation like this (sessions A and B): A: starts "ALTER PUBLICATION p ADD TABLE t" and gets the SUE lock A: writes the invalidation message(s) into WAL B: inserts into table "t" B: commit A: commit With the stronger SRE lock, the commits would have to happen in the opposite order, because as you say it prevents the bad ordering. But why would this matter for logical decoding? We accumulate the the invalidations and execute them at transaction commit, or did I miss something? So what I think should happen is we get to apply B first, which won't see the table as part of the publication. It might even build the cache entries (syscache+relsync), reflecting that. But then we get to execute A, along with all the invalidations, and that should invalidate them. I'm clearly missing something, because the SRE does change the behavior, so there has to be a difference (and by my reasoning it shouldn't be). Or maybe it's the other way around? Won't B get the invalidation, but use a historical snapshot that doesn't yet see the table in publication? > I'm not sure there are any cases where using SRE instead of AE would cause > problems for logical decoding, but it seems very hard to prove. I'd be very > surprised if just using SRE would not lead to corrupted cache contents in some > situations. The cases where a lower lock level is ok are ones where we just > don't care that the cache is coherent in that moment. > Are you saying it might break cases that are not corrupted now? How could obtaining a stronger lock have such effect? > In a way, the logical decoding cache-invalidation situation is a lot more > atomic than the "normal" situation. During normal operation locking is > strictly required to prevent incoherent states when building a cache entry > after a transaction committed, but before the sinval entries have been > queued. But in the logical decoding case that window doesn't exist. > Because we apply the invalidations at commit time, so it happens as a single operation that can't interleave with other sessions? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: