Re: Build-farm - intermittent error in 031_column_list.pl - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Build-farm - intermittent error in 031_column_list.pl |
Date | |
Msg-id | CAA4eK1K40xhObN1MWO7=rzrJmo+oQ048O8drZ86-F7artVvwQQ@mail.gmail.com Whole thread Raw |
In response to | Re: Build-farm - intermittent error in 031_column_list.pl (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
RE: Build-farm - intermittent error in 031_column_list.pl
|
List | pgsql-hackers |
On Wed, May 25, 2022 at 6:54 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 5/25/22 13:26, Amit Kapila wrote: > > On Wed, May 25, 2022 at 8:16 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > >> > >> It does "fix" the case of [1]. But AFAIS > >> RelationSyncEntry.replicate_valid is only used to inhibit repeated > >> loading in get_rel_sync_entry and the function doesn't seem to be > >> assumed to return a invalid entry. (Since the flag is not checked > >> nowhere else.) > >> > >> For example pgoutput_change does not check for the flag of the entry > >> returned from the function before uses it, which is not seemingly > >> safe. (I didn't check further, though) > >> > >> Don't we need to explicitly avoid using invalid entries outside the > >> function? > >> > > > > We decide that based on pubactions in the callers, so even if entry is > > valid, it won't do anything. Actually, we don't need to avoid setting > > replication_valid flag as some of the publications for the table may > > be already present. We can check if the publications_valid flag is set > > while trying to validate the entry. Now, even if we don't find any > > publications the replicate_valid flag will be set but none of the > > actions will be set, so it won't do anything in the caller. Is this > > better than the previous approach? > > > > For the record, I'm not convinced this is the right way to fix the > issue, as it may easily mask the real problem. > > We do silently ignore missing objects in various places, but only when > either requested or when it's obvious it's expected and safe to ignore. > But I'm not sure that applies here, in a clean way. > > Imagine you have a subscriber using two publications p1 and p2, and > someone comes around and drops p1 by mistake. With the proposed patch, > the subscription will notice this, but it'll continue sending data > ignoring the missing publication. Yes, it will continue working, but > it's quite possible this breaks the subscriber and it's be better to > fail and stop replicating. > Ideally, shouldn't we disallow drop of publication in such cases where it is part of some subscription? I know it will be tricky because some subscriptions could be disabled. > The other aspect I dislike is that we just stop caching publication > info, forcing us to reload it for every replicated change/row. So even > if dropping the publication happens not to "break" the subscriber (i.e. > the data makes sense), this may easily cause performance issues, lag in > the replication, and so on. And the users will have no idea why and/or > how to fix it, because we just do this silently. > Yeah, this is true that if there are missing publications, it needs to reload all the publications info again unless we build a mechanism to change the existing cached entry by loading only required info. The other thing we could do here is to LOG the info for missing publications to make users aware of the fact. I think we can also introduce a new option while defining/altering subscription to indicate whether to continue on missing publication or not, that way by default we will stop replication as we are doing now but users will have a way to move replication. BTW, what are the other options we have to fix the cases where replication is broken (or the user has no clue on how to proceed) as we are discussing the case here or the OP reported yet another case on pgsql-bugs [1]? [1] - https://www.postgresql.org/message-id/CANWRaJyyD%3D9c1E2HdF-Tqfe7%2BvuCQnAkXd6%2BEFwxC0wM%3D313AA%40mail.gmail.com -- With Regards, Amit Kapila.
pgsql-hackers by date: