Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Minimal logical decoding on standbys
Date
Msg-id CA+Tgmoa1cZSo1tUTzxS+RyniMuNtkpxUeEhnD2+-mvJodVtdSQ@mail.gmail.com
Whole thread Raw
In response to Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Tue, Dec 13, 2022 at 5:49 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
> > I think the real problem here is that
> > RelationIsAccessibleInLogicalDecoding is returning *the wrong answer*
> > when the relation is a user-catalog table. It does so because it
> > relies on RelationIsUsedAsCatalogTable, and that macro relies on
> > checking whether the reloptions include user_catalog_table.
>
> [ confusion ]

Sorry, I meant: RelationIsAccessibleInLogicalDecoding is returning
*the wrong answer* when the relation is an *INDEX*.

> > But I have a feeling that the reloptions
> > code is not very well-structured to allow reloptions to be stored any
> > place but in pg_class.reloptions, so this may be difficult to
> > implement.
>
> Why don't remove this "property" from reloptions? (would probably need much more changes that the current approach
andprobably take care of upgrade scenario too). 
> I did not look in details but logged/unlogged is also propagated to the indexes, so maybe we could use the same
approachhere. But is it worth the probably added complexity (as compare to the current approach)? 

I feel like changing the user-facing syntax is probably not a great
idea, as it inflicts upgrade pain that I don't see how we can really
fix.

> > It would be very helpful if there were some place to refer to that
> > explained the design decisions here, like why the feature we're trying
> > to get requires this infrastructure around indexes to be added. It
> > could be in the commit messages, an email message, a README, or
> > whatever, but right now I don't see it anywhere in here.
>
> Like adding something around those lines in the commit message?
>
> "
> On a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed.
> This is done thanks to the catalog_xmin associated with the logical replication slot.
>
> With logical decoding on standby, in the following cases:
>
> - hot_standby_feedback is off
> - hot_standby_feedback is on but there is no a physical slot between the primary and the standby. Then,
hot_standby_feedbackwill work, but only while the connection is alive (for example a node restart would break it) 
>
> Then the primary may delete system catalog rows that could be needed by the logical decoding on the standby. Then,
it’smandatory to identify those rows and invalidate the slots that may need them if any. 
> "

This is very helpful, yes. I think perhaps we need to work some of
this into the code comments someplace, but getting it into the commit
message would be a good first step.

What I infer from the above is that the overall design looks like this:

- We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing replication conflicts much as hot standby does.
- Our chosen strategy for dealing with this type of replication slot
is to invalidate logical slots for which needed data has been removed.
- To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access.
- We can't rely on the standby's relcache entries for this purpose in
any way, because the WAL record that causes the problem might be
replayed before the standby even reaches consistency. (Is this true? I
think so.)
- Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.

Does that sound right?

It seems kind of unfortunate to have to add payload to a whole bevy of
record types for this feature. I think it's worth it, both because the
feature is extremely important, and also because there aren't any
record types that fall into this category that are going to be emitted
so frequently as to make it a performance problem. But it's certainly
more complicated than one might wish.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: Ordering behavior for aggregates
Next
From: Ronan Dunklau
Date:
Subject: Re: Ordering behavior for aggregates