Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: bogus: logical replication rows/cols combinations
Date
Msg-id CAA4eK1JJ_cJZFanVFDC-1ipKC9aounno5JcWx1ceH035FZZ1MA@mail.gmail.com
Whole thread Raw
In response to Re: bogus: logical replication rows/cols combinations  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: bogus: logical replication rows/cols combinations
List pgsql-hackers
On Sun, May 8, 2022 at 11:41 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/7/22 07:36, Amit Kapila wrote:
> > On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >>
> >> On 2022-May-02, Amit Kapila wrote:
> >>
> >>
> >>> I think it is possible to expose a list of publications for each
> >>> walsender as it is stored in each walsenders
> >>> LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
> >>> can have one such LogicalDecodingContext and we can probably share it
> >>> via shared memory?
> >>
> >> I guess we need to create a DSM each time a walsender opens a
> >> connection, at START_REPLICATION time.  Then ALTER PUBLICATION needs to
> >> connect to all DSMs of all running walsenders and see if they are
> >> reading from it.  Is that what you have in mind?  Alternatively, we
> >> could have one DSM per publication with a PID array of all walsenders
> >> that are sending it (each walsender needs to add its PID as it starts).
> >> The latter might be better.
> >>
> >
> > While thinking about using DSM here, I came across one of your commits
> > f2f9fcb303 which seems to indicate that it is not a good idea to rely
> > on it but I think you have changed dynamic shared memory to fixed
> > shared memory usage because that was more suitable rather than DSM is
> > not portable. Because I see a commit bcbd940806 where we have removed
> > the 'none' option of dynamic_shared_memory_type. So, I think it should
> > be okay to use DSM in this context. What do you think?
> >
>
> Why would any of this be needed?
>
> ALTER PUBLICATION will invalidate the RelationSyncEntry entries in all
> walsenders, no? So AFAICS it should be enough to enforce the limitations
> in get_rel_sync_entry,
>

Yes, that should be sufficient to enforce limitations in
get_rel_sync_entry() but it will lead to the following behavior:
a. The Alter Publication command will be successful but later in the
logs, the error will be logged and the user needs to check it and take
appropriate action. Till that time the walsender will be in an error
loop which means it will restart and again lead to the same error till
the user takes some action.
b. As we use historic snapshots, so even after the user takes action
say by changing publication, it won't be reflected. So, the option for
the user would be to drop their subscription.

Am, I missing something? If not, then are we okay with such behavior?
If yes, then I think it would be much easier implementation-wise and
probably advisable at this point. We can document it so that users are
careful and can take necessary action if they get into such a
situation. Any way we can improve this in future as you also suggested
earlier.

> which is necessary anyway because the subscriber
> may not be connected when ALTER PUBLICATION gets executed.
>

If we are not okay with the resultant behavior of detecting this in
get_rel_sync_entry(), then we can solve this in some other way as
Alvaro has indicated in one of his responses which is to detect that
at start replication time probably in the subscriber-side.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: Query generates infinite loop
Next
From: Tom Lane
Date:
Subject: Re: Query generates infinite loop