Re: Avoiding data loss with synchronous replication - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: Avoiding data loss with synchronous replication
Date
Msg-id D46D857F-5465-4688-BD6C-280942D28C39@yandex-team.ru
In response to Re: Avoiding data loss with synchronous replication  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: Avoiding data loss with synchronous replication  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

> On 23 July 2021, at 22:54, Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 7/23/21, 4:33 AM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote:
>> Thanks for your interest in the topic. I think in the thread [0] we almost agreed on the general design.
>> The only remaining question is whether we want to treat pg_ctl stop and kill SIGTERM differently from pg_terminate_backend().
>
> I didn't get the idea that there was a tremendous amount of support
> for the approach to block canceling waits for synchronous replication.
> FWIW this was my initial approach as well, but I've been trying to
> think of alternatives.
>
> If we can gather support for some variation of the block-cancels
> approach, I think that would be preferred over my proposal from a
> complexity standpoint.

Let's clearly enumerate the problems with blocking.
It has been mentioned that the backend is not responsive when cancelation is blocked. On the contrary, it is very
responsive: every cancel request gets an immediate, explicit answer.

postgres=# alter system set synchronous_standby_names to 'bogus';
ALTER SYSTEM
postgres=# alter system set synchronous_commit_cancelation TO off ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
2021-07-24 15:35:03.054 +05 [10452] LOG:  received SIGHUP, reloading configuration files
 pg_reload_conf
----------------
 t
(1 row)
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(0);
INSERT 0 1
postgres=*# commit ;
^CCancel request sent
WARNING:  canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL:  The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.
^CCancel request sent
WARNING:  canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL:  The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.

It clearly tells what is wrong. If that is still not enough, let's add a hint about synchronous_standby_names.

Are there any other problems with blocking cancels?


> Robert's idea to provide a way to understand
> the intent of the cancellation/termination request [0] could improve
> matters.  Perhaps adding an argument to pg_cancel/terminate_backend()
> and using different signals to indicate that we want to cancel the
> wait would be something that folks could get on board with.

The semantics of cancelation assume that the query can be interrupted correctly. That is no longer possible once we have
committed locally: there cannot be any correct cancelation at that point. And I don't think it is worth adding an incorrect one.
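To make this concrete, here is a toy Python model (not PostgreSQL code) of a backend whose COMMIT record is already flushed locally and that is waiting for a synchronous standby. The class, its fields and the LSN values are hypothetical. allow_cancel stands in for the synchronous_commit_cancelation setting from the patch used in the session above. Neither branch can undo the local commit. Cancelation can only choose between returning early without the standby's confirmation and continuing to wait.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class CommitWait:
    """Toy model of a backend waiting for a synchronous standby to confirm
    a COMMIT record that is already flushed to local WAL.
    allow_cancel is a hypothetical stand-in for synchronous_commit_cancelation."""
    allow_cancel: bool
    local_lsn: int               # LSN of the locally flushed COMMIT record
    standby_flush_lsn: int = 0   # highest LSN the standby has confirmed
    messages: List[str] = field(default_factory=list)

    def on_cancel(self) -> bool:
        """Handle a cancel request; return True if the wait ends."""
        if self.allow_cancel:
            # The commit is already local; returning now only hides the fact
            # that the standby may not have it yet.
            self.messages.append("WARNING: wait canceled, committed locally, maybe not replicated")
            return True
        # Cancel is refused: report it and keep waiting for the standby.
        self.messages.append("WARNING: cancelation is not allowed, still waiting")
        return False

    def on_standby_ack(self, lsn: int) -> bool:
        """Record a standby confirmation; return True once our commit is covered."""
        self.standby_flush_lsn = max(self.standby_flush_lsn, lsn)
        return self.standby_flush_lsn >= self.local_lsn


def run(wait: CommitWait, events: Iterable[Tuple[str, Optional[int]]]) -> str:
    """Feed cancel/ack events to the waiting backend and report how the wait ended."""
    for kind, value in events:
        if kind == "cancel" and wait.on_cancel():
            return "returned_to_client"
        if kind == "ack" and value is not None and wait.on_standby_ack(value):
            return "replicated"
    return "still_waiting"


if __name__ == "__main__":
    events = [("cancel", None), ("cancel", None), ("ack", 100)]
    for allow in (False, True):
        w = CommitWait(allow_cancel=allow, local_lsn=100)
        print(allow, run(w, events), w.standby_flush_lsn, w.messages)
```

With allow_cancel=False the two cancel requests only produce warnings, and the wait ends when the standby confirms LSN 100. With allow_cancel=True the client gets control back while standby_flush_lsn is still behind local_lsn. That is exactly the window in which a failover could lose a transaction the client already treats as committed.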


Interestingly, converting the transaction to 2PC is a neat idea when the backend is terminated. It provides stronger guarantees
that the transaction will commit correctly even after a restart. But we may run short of max_prepared_xacts slots...
Anyway, backend termination bothers me a lot less than cancelation: drivers do not terminate queries on their own, but
they do cancel queries by default.


Thanks!

Best regards, Andrey Borodin.

