Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Date
Msg-id CAD21AoAeG2+RsUYD9+mEwr8-rrt8R1bqpe56T2D=euO-Qs-GAg@mail.gmail.com
Whole thread Raw
In response to RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: Time delayed LR (WAS Re: logical replication restrictions)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Mar 1, 2023 at 12:51 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,
>
> > Few comments:
>
> Thank you for reviewing! PSA new version.
> Note that the starting point of delay for 2PC was not changed,
> I think it has been under discussion.
>
> > 1.
> > + /*
> > + * If we've requested to shut down, exit the process.
> > + *
> > + * Note that WalSndDone() cannot be used here because the delaying
> > + * changes will be sent in the function.
> > + */
> > + if (got_STOPPING)
> > + {
> > + QueryCompletion qc;
> > +
> > + /* Inform the standby that XLOG streaming is done */
> > + SetQueryCompletion(&qc, CMDTAG_COPY, 0);
> > + EndCommand(&qc, DestRemote, false);
> > + pq_flush();
> >
> > Do we really need to do anything except for breaking the loop and let
> > the exit handling happen in the main loop when 'got_STOPPING' is set?
> > AFAICS, this is what we are doing in some other palces (See
> > WalSndWaitForWal). Won't that work? It seems that will help us sending
> > all the pending WAL.
>
> If we exit the loop after got_STOPPING is set, as you said, the walsender will
> send delaying changes and then exit. The behavior is same as the case that WalSndDone()
> is called. But I think it is not suitable for the motivation of the feature.
> If users notice the miss operation like TRUNCATE, they must shut down the publisher
> once and then recovery from back up or old subscriber. If the walsender sends all
> pending changes, miss operations will be also propagated to subscriber and data
> cannot be protected. So currently I want to keep the style.
> FYI - In case of physical replication, received WALs are not applied when the
> secondary is shutted down.
>
> > 2.
> > + /* Try to flush pending output to the client */
> > + if (pq_flush_if_writable() != 0)
> > + WalSndShutdown();
> >
> > Is there a reason to try flushing here?
>
> IIUC if pq_flush_if_writable() returns non-zero (EOF), it means that there is a
> trouble and walsender fails to send messages to subscriber.
>
> In Linux, the stuck trace from pq_flush_if_writable() will finally reach the send() system call.
> And according to man page[1], it will be triggered by some unexpected state or the connection is closed.
>
> Based on above, I think the returned value should be confirmed.
>
> > Apart from the above, I have made a few changes in the comments in the
> > attached diff patch. If you agree with those then please include them
> > in the next version.
>
> Thanks! I checked and I thought all of them should be included.
>
> Moreover, I used grammar checker and slightly reworded the commit message.

Thinking of side effects of this feature (no matter where we delay
applying the changes), on the publisher, vacuum cannot collect garbage
and WAL cannot be recycled. Is that okay in the first place? The point
is that the subscription setting affects the publisher. That is,
min_send_delay is specified on the subscriber but the symptoms that
could ultimately lead to a server crash appear on the publisher, which
sounds dangerous to me.

Imagine a service or system like where there is a publication server
and it's somewhat exposed so that a user (or a subsystem) arbitrarily
can create a subscriber to replicate a subset of the data. A malicious
user can have the publisher crash by creating a subscription with,
say, min_send_delay = 20d. max_slot_wal_keep_size helps this situation
but it's -1 by default.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Next
From: Stephen Frost
Date:
Subject: Re: Auth extensions, with an LDAP/SCRAM example [was: Proposal: Support custom authentication methods using hooks]