RE: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Exit walsender before confirming remote flush in logical replication
Date
Msg-id TYAPR01MB58661F81B38AC7A43F44A81DF5D79@TYAPR01MB5866.jpnprd01.prod.outlook.com
In response to Re: Exit walsender before confirming remote flush in logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Exit walsender before confirming remote flush in logical replication
List pgsql-hackers
Dear Amit, Sawada-san,

> > IIUC there is no difference between smart shutdown and fast shutdown
> > in logical replication walsender, but reading the doc[1], it seems to
> > me that in the smart shutdown mode, the server stops existing sessions
> > normally. For example, If the client is psql that gets stuck for some
> > reason and the network buffer gets full, the smart shutdown waits for
> > a backend process to send all results to the client. I think the
> > logical replication walsender should follow this behavior for
> > consistency. One idea is to distinguish smart shutdown and fast
> > shutdown also in logical replication walsender so that we disconnect
> > even without the done message in fast shutdown mode, but I'm not sure
> > it's worthwhile.
> >
> 
> The main problem we want to solve here is to avoid shutdown failing in
> case walreceiver/applyworker is busy waiting for some lock or for some
> other reason as shown in the email [1]. I haven't tested it but if
> such a problem doesn't exist in smart shutdown mode then probably we
> can allow walsender to wait till all the data is sent.

Based on this idea, I made a PoC patch that introduces smart shutdown handling to walsenders.
Please see the attached 0002 patch. 0001 is unchanged from v5.
When logical walsenders receive a shutdown request while their send buffer is full due to
the delay, they will:

* wait until all pending data has been sent to the subscriber, if we are in smart shutdown mode
* exit immediately, if we are in fast shutdown mode

Note that in both cases, the walsender does not wait for the remote flush of WAL.

To implement this, I added a new attribute to WalSndCtlData that indicates the
shutdown status. It is normally zero, but the postmaster changes it when
it receives a shutdown request.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

