RE: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Exit walsender before confirming remote flush in logical replication
Date
Msg-id TYAPR01MB5866027B7A1FA4D07E6DC813F5C19@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Exit walsender before confirming remote flush in logical replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Dear Horiguchi-san, Amit,

> At Fri, 13 Jan 2023 16:41:08 +0530, Amit Kapila <amit.kapila16@gmail.com>
> wrote in
> > Okay, but what happens in the case of physical replication when
> > synchronous_commit = remote_apply? In that case, won't it ensure that
> > apply has also happened? If so, then shouldn't the time delay feature
> > also cause a similar problem for physical replication as well?
>
> As written in another mail, WalSndDone doesn't honor
> synchronous_commit. In other words, AFAIS walsender finishes not
> waiting remote_apply.  The unapplied recods will be applied at the
> next startup.
>
> I didn't confirmed that behavior for myself, though..

If Amit wanted to say about the case that sending data is pending in physical
replication, the walsender cannot stop. But this is not related with the
synchronous_commit: it is caused because it must sweep all pending data before
shutting down. We can reproduce the situation with:

1. build streaming replication system
2. kill -STOP $walreceiver
3. insert data to primary server
4. try to stop the primary server

If what you said was not related with pending data, walsender can be stopped even
if the synchronous_commit = remote_apply. As Horiguchi-san said, such a condition
is not written in WalSndDone() [1]. I think the parameter synchronous_commit does
not affect walsender process so well. It just define when backend returns the
result to client.

I could check by following steps:

1. built streaming replication system. PSA the script to follow that.

Primary config.

```
synchronous_commit = 'remote_apply'
synchronous_standby_names = 'secondary'
```

Secondary config.

```
recovery_min_apply_delay = 1d
primary_conninfo = 'user=postgres port=$port_N1 application_name=secondary'
hot_standby = on
```

2. inserted data to primary. This waited the remote apply

psql -U postgres -p $port_primary -c "INSERT INTO tbl SELECT generate_series(1, 5000)"

3. Stopped the primary server from another terminal. It could be done.
The terminal on step2 said like:

```
WARNING:  canceling the wait for synchronous replication and terminating connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
```

[1]: https://github.com/postgres/postgres/blob/master/src/backend/replication/walsender.c#L3121

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Refactor recordExtObjInitPriv()
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Exit walsender before confirming remote flush in logical replication