RE: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Exit walsender before confirming remote flush in logical replication
Date
Msg-id TYAPR01MB5866CCD2C21790FEBE944034F5E99@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Exit walsender before confirming remote flush in logical replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses RE: Exit walsender before confirming remote flush in logical replication
List pgsql-hackers
Dear Horiguchi-san,

> Thus how about before entering an apply_delay, logrep worker sending a
> kind of crafted feedback, which reports commit_data.end_lsn as
> flushpos?  A little tweak is needed in send_feedback() but seems to
> work..

Thanks for replying! I tested your saying but it could not work well...

I made PoC based on the latest time-delayed patches [1] for non-streaming case.
Apply workers that are delaying applications send begin_data.final_lsn as recvpos and flushpos in send_feedback().

Followings were contents of the feedback message I got, and we could see that recv and flush were overwritten.

```
DEBUG:  sending feedback (force 1) to recv 0/1553638, write 0/1553550, flush 0/1553638
CONTEXT:  processing remote data for replication origin "pg_16390" during message type "BEGIN" in transaction 730,
finishedat 0/1553638 
```

In terms of walsender, however, sentPtr seemed to be slightly larger than flushed position on subscriber.

```
(gdb) p MyWalSnd->sentPtr
$2 = 22361760
(gdb) p MyWalSnd->flush
$3 = 22361656
(gdb) p *MyWalSnd
$4 = {pid = 28807, state = WALSNDSTATE_STREAMING, sentPtr = 22361760, needreload = false, write = 22361656,
  flush = 22361656, apply = 22361424, writeLag = 20020343, flushLag = 20020343, applyLag = 20020343,
  sync_standby_priority = 0, mutex = 0 '\000', latch = 0x7ff0350cbb94, replyTime = 725113263592095}
```

Therefore I could not shut down the publisher node when applications were delaying.
Do you have any opinions about them?

```
$ pg_ctl stop -D data_pub/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
```

[1]:
https://www.postgresql.org/message-id/TYCPR01MB83730A3E21E921335F6EFA38EDE89@TYCPR01MB8373.jpnprd01.prod.outlook.com 

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)
Next
From: Alvaro Herrera
Date:
Subject: Re: daitch_mokotoff module