Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: bogus: logical replication rows/cols combinations
Date
Msg-id CAA4eK1JC8v8ysNS9L1et0_yyZJOEE1wB+GrvOcar_f5dVadvBg@mail.gmail.com
Whole thread Raw
In response to Re: bogus: logical replication rows/cols combinations  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses RE: bogus: logical replication rows/cols combinations
List pgsql-hackers
On Wed, May 11, 2022 at 12:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/9/22 05:45, Amit Kapila wrote:
> > On Sun, May 8, 2022 at 11:41 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> On 5/7/22 07:36, Amit Kapila wrote:
> >>> On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >>>>
> >>>> On 2022-May-02, Amit Kapila wrote:
> >>>>
> >>>>
> >>>>> I think it is possible to expose a list of publications for each
> >>>>> walsender as it is stored in each walsenders
> >>>>> LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
> >>>>> can have one such LogicalDecodingContext and we can probably share it
> >>>>> via shared memory?
> >>>>
> >>>> I guess we need to create a DSM each time a walsender opens a
> >>>> connection, at START_REPLICATION time.  Then ALTER PUBLICATION needs to
> >>>> connect to all DSMs of all running walsenders and see if they are
> >>>> reading from it.  Is that what you have in mind?  Alternatively, we
> >>>> could have one DSM per publication with a PID array of all walsenders
> >>>> that are sending it (each walsender needs to add its PID as it starts).
> >>>> The latter might be better.
> >>>>
> >>>
> >>> While thinking about using DSM here, I came across one of your commits
> >>> f2f9fcb303 which seems to indicate that it is not a good idea to rely
> >>> on it but I think you have changed dynamic shared memory to fixed
> >>> shared memory usage because that was more suitable rather than DSM is
> >>> not portable. Because I see a commit bcbd940806 where we have removed
> >>> the 'none' option of dynamic_shared_memory_type. So, I think it should
> >>> be okay to use DSM in this context. What do you think?
> >>>
> >>
> >> Why would any of this be needed?
> >>
> >> ALTER PUBLICATION will invalidate the RelationSyncEntry entries in all
> >> walsenders, no? So AFAICS it should be enough to enforce the limitations
> >> in get_rel_sync_entry,
> >>
> >
> > Yes, that should be sufficient to enforce limitations in
> > get_rel_sync_entry() but it will lead to the following behavior:
> > a. The Alter Publication command will be successful but later in the
> > logs, the error will be logged and the user needs to check it and take
> > appropriate action. Till that time the walsender will be in an error
> > loop which means it will restart and again lead to the same error till
> > the user takes some action.
> > b. As we use historic snapshots, so even after the user takes action
> > say by changing publication, it won't be reflected. So, the option for
> > the user would be to drop their subscription.
> >
> > Am, I missing something? If not, then are we okay with such behavior?
> > If yes, then I think it would be much easier implementation-wise and
> > probably advisable at this point. We can document it so that users are
> > careful and can take necessary action if they get into such a
> > situation. Any way we can improve this in future as you also suggested
> > earlier.
> >
> >> which is necessary anyway because the subscriber
> >> may not be connected when ALTER PUBLICATION gets executed.
> >>
> >
> > If we are not okay with the resultant behavior of detecting this in
> > get_rel_sync_entry(), then we can solve this in some other way as
> > Alvaro has indicated in one of his responses which is to detect that
> > at start replication time probably in the subscriber-side.
> >
>
> IMO that behavior is acceptable.
>

Fair enough, then we should go with a simpler approach to detect it in
pgoutput.c (get_rel_sync_entry).

> We have to do that check anyway, and
> the subscription may start failing after ALTER PUBLICATION for a number
> of other reasons anyway so the user needs/should check the logs.
>

I think ALTER PUBLICATION won't ever lead to failure in walsender.
Sure, users can do something due to which subscriber-side failures can
happen due to constraint failures. Do you have some specific cases in
mind?

> And if needed, we can improve this and start doing the proactive-checks
> during ALTER PUBLICATION too.
>

Agreed.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Column Filtering in Logical Replication
Next
From: Bruce Momjian
Date:
Subject: Re: First draft of the PG 15 release notes