Re: Synchronous commit behavior during network outage - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: Synchronous commit behavior during network outage
Msg-id 4B0CD464-74FA-4030-B8CC-30881D97A799@yandex-team.ru
In response to Re: Synchronous commit behavior during network outage  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Synchronous commit behavior during network outage
List pgsql-hackers

> On 2 July 2021, at 10:59, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
>>> My patch also covers the backend termination case. Is there a
>>> reason
>>> you left that case out?
>>
>> Yes, backend termination is used by HA tool before rewinding the
>> node.
>
> Can't you just disable sync rep first (using ALTER SYSTEM SET
> synchronous_standby_names=''), which will unstick the backend, and then
> terminate it?
If the failover happens because the node is unresponsive, we cannot just turn off sync rep. We need some spare
connections for that (the number of stuck backends will skyrocket during a network partition). We need available
descriptors and some memory to fork a new backend. We will need to re-read the config. And, after all, we need time
to try. In some failures we may lack some of these.

Partial degradation is already a hard task. Without the ability to easily terminate a running Postgres, the HA tool
will often resort to SIGKILL.
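
To make the escalation above concrete, here is a minimal sketch of how an HA tool might choose its next step. It is
only an illustration under stated assumptions: `NodeState`, its fields, `choose_action`, and the `min_seconds`
threshold are hypothetical names, not taken from any real HA tool or from either patch.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DISABLE_SYNC_REP = "disable sync rep, then terminate"
    TERMINATE_BACKENDS = "terminate backends"
    SIGKILL = "SIGKILL"


@dataclass
class NodeState:
    # Hypothetical probes an HA tool might collect about the old primary.
    free_connection_slots: int          # spare connections left for the tool
    can_fork_backend: bool              # enough descriptors and memory to fork
    can_reload_config: bool             # the server can still re-read its config
    seconds_left: float                 # time budget before failover must proceed
    termination_unsticks_backend: bool  # does termination end the sync rep wait?


def choose_action(state: NodeState, min_seconds: float = 1.0) -> Action:
    """Pick the gentlest step that can still work on a degraded node."""
    # Disabling sync rep needs every one of these resources at once.
    can_disable = (
        state.free_connection_slots > 0
        and state.can_fork_backend
        and state.can_reload_config
        and state.seconds_left >= min_seconds
    )
    if can_disable:
        return Action.DISABLE_SYNC_REP
    # Otherwise fall back to plain termination, if it still unsticks backends.
    if state.termination_unsticks_backend:
        return Action.TERMINATE_BACKENDS
    # Last resort when nothing gentler is available.
    return Action.SIGKILL
```

With no spare connection slots, the sketch skips the ALTER SYSTEM path; if termination cannot unstick the backend
either, only SIGKILL remains.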

>
> If you don't handle the termination case, then there's still a chance
> for the transaction to become visible to other clients before its
> replicated.
Termination is an admin command; admins know what they are doing.
Cancellation is part of the user protocol.

BTW, can we have two GUCs, so that HA tool developers can decide on their own which guarantees they provide?

>
>> There is one more caveat we need to fix: we should prevent instant
>> recovery from happening.
>
> That can already be done with the restart_after_crash GUC.

Oh, I didn't know about it; we will use it. Thanks!


Best regards, Andrey Borodin.

