Thread: Synchronous commit behavior during network outage

Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
Hello all,
I would like to know your opinion on the following behaviour I see with a PostgreSQL setup using synchronous replication.

This behaviour happens in a special use case. In this use case, there are 2 synchronous replicas with the following config (truncated):

- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply
With this setup, run the following steps ("LAN down" means the LAN between master and replica is down):
-----------------
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been applied to replica.
INSERT 0 1
Now turn off the LAN on the server with the standby:
postgres=# insert into a values (2); --LAN down, waiting for a confirmation from sync replica. In this situation cancel it (press CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
INSERT 0 1
There will be a warning in the server log that the commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
 
postgres=# insert into a values (2); --LAN down, waiting for a confirmation from sync replica. In this situation cancel it (press CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
INSERT 0 1
postgres=# insert into a values (2); --LAN down, waiting for sync replica, second attempt, cancel it as well (CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
INSERT 0 1
postgres=# update a set n=3 where n=2; --LAN down, waiting for sync replica, cancel it (CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
UPDATE 2
postgres=# update a set n=3 where n=2; -- run the same update; because data from the previous attempt was committed on the master, it succeeds but changes no rows
UPDATE 0
postgres=# select * from a;
 n
---
 1
 3
 3
(3 rows)
postgres=#
------------------------

Now there is only the value 1 in the sync replica's table (no other values); the data is not in sync. This is expected: after the LAN is restored, the data will be in sync again. But if the primary node fails and we fail over to the replica before the LAN is back up, or the primary's storage is destroyed before the data has synced to the replica, we will lose data even though the client received a successful commit (with a warning).
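
As a side note, one rough way to see this divergence from the primary is a check like the following (a sketch only, assuming PostgreSQL 13 column names; with the LAN down the sync standby's row may simply disappear from pg_stat_replication, which is itself the signal that something is wrong):

-- On the primary: what has the synchronous standby confirmed so far?
SELECT application_name,
       sync_state,
       pg_current_wal_lsn() AS primary_lsn,
       flush_lsn,
       replay_lsn
FROM pg_stat_replication;
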
From the synchronous_commit=remote_write level and "higher", I would expect that when the remote action (whether write, flush or apply) has not happened, I would not receive a confirmation of the commit at all (not even with a warning). In other words: no confirmation from the sync replica, no commit on the primary, and if someone performs the steps above, the transaction never reports a confirmation.

This can cause issues if the application receives a confirmation of success and performs some follow-up steps, e.g. it creates a user account and sends a request to the mail system to create a mailbox, or creates a VPN account. If the scenario above happens, a VPN account can exist that has no presence in the central database, which can be a security issue.

I hope I explained it sufficiently. :-)

Do you think it would be possible to implement a mechanism that would solve this use case?

Thank you
Ondrej

Re: Synchronous commit behavior during network outage

From
Aleksander Alekseev
Date:
Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql. So this might not be such a critical issue after
all. BTW what version of PostgreSQL are you using?





--
Best regards,
Aleksander Alekseev



Re: Synchronous commit behavior during network outage

From
Tomas Vondra
Date:

On 4/20/21 6:23 PM, Aleksander Alekseev wrote:
> Hi Ondřej,
> 
> Thanks for the report. It seems to be a clear violation of what is
> promised in the docs. Although it's unlikely that someone implemented
> an application which deals with important data and "pressed Ctrl+C" as
> it's done in psql. So this might not be such a critical issue after
> all. BTW what version of PostgreSQL are you using?
> 

Which part of the docs does this contradict?

With Ctrl+C the application *did not* receive confirmation - the commit
was interrupted before fully completing. In a way, it's about the same
situation as if a regular commit was interrupted randomly. It might have
happened before the commit log got updated, or maybe right after it,
which determines the outcome.

What I find a bit strange is that this inserts 1, 2, 2, 2 locally, and
yet we end up with just two rows with 2 (before the update). I don't see
why a network outage should have such a consequence.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
Hello Aleksander,

Thank you for the reaction. This was tested on version 13.2.

There are also other possible situations with the same setup and a similar 
issue:

-----------------
When the backend process on the server is killed...

On postgresql1:
tecmint=# select * from a; --> LAN on sync replica is OK
  id
----
   1
(1 row)

tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and 
insert is waiting. During this time kill the background process on the 
PostgreSQL server for this session
WARNING:  canceling the wait for synchronous replication and terminating 
connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.
server closed the connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
tecmint=# select * from a;
  id
----
   1
   2
(2 rows)

tecmint=# ---> LAN on sync replica is still DOWN

The postgres session is restored after the backend process was killed. 
When you run a select on the master, it still looks OK, but the data is 
still not replicated to the sync replica. If we lost the master now, we 
would lose this data as well.

**************
Another case
**************

Kill the client process.

tecmint=# select * from a;
  id
----
   1
   2
   3
(3 rows)
tecmint=#                --> Disconnect the sync replica now. LAN on 
replica is DOWN
tecmint=# insert into a values (4); --> Kill the client process
Terminated
xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d tecmint
Password for user postgres:
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

tecmint=# select * from a;
  id
----
   1
   2
   3
(3 rows)

tecmint=# --> Number 4 is not there. Now switch the LAN on sync replica ON.

----------

Result from sync replica after the LAN is again UP:
tecmint=# select * from a;
  id
----
   1
   2
   3
   4
(4 rows)


In this situation, try to insert the number 4 again to the table.

tecmint=# select * from a;
  id
----
   1
   2
   3
(3 rows)

tecmint=# insert into a values (4);
ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(4) already exists.
tecmint=#

This is really strange... The application can be confused: it is not 
possible to insert a record that is apparently not there, yet some systems 
which use the sync node as a read replica may have already read that record 
from the sync replica database and taken steps based on it, which can cause 
issues that are hard to track down.

Even if we say that it would be hard to send Ctrl+C to the database from 
the client, the two situations I described here can happen in the real 
world.

What do you think?

Thank you and regards
Ondrej




Re: Synchronous commit behavior during network outage

From
Maksim Milyutin
Date:
Hi!


This is a known issue with synchronous replication [1]. Without patching 
the source code, you might inject a dummy modification into an otherwise 
unmodifying operation to overcome the negative sides of such a partial commit.
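
One possible reading of that workaround, as a rough sketch only (the sync_guard table here is hypothetical, created solely so that an otherwise read-only transaction has to wait for the synchronous standby at COMMIT):

-- one-time setup of the hypothetical helper table:
--   CREATE TABLE sync_guard (id int PRIMARY KEY, bumped timestamptz);
--   INSERT INTO sync_guard VALUES (1, now());
BEGIN;
UPDATE sync_guard SET bumped = now() WHERE id = 1;  -- dummy modification
SELECT * FROM a;  -- the read the application actually cares about
COMMIT;  -- now has to wait for the synchronous standby, per synchronous_commit

The client then does not act on what it has read until the standby has confirmed WAL at least up to that commit.
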


On 20.04.2021 19:23, Aleksander Alekseev wrote:
> Although it's unlikely that someone implemented
> an application which deals with important data and "pressed Ctrl+C" as
> it's done in psql.


Some client libraries have a feature to cancel a session, with an effect 
similar to "Ctrl+C" in psql, after a client-specified deadline expires [2]. 
Hence, this case might occur quite often when an application interacts 
with the database.


> On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka <ondrej.zizka@stratox.cz> wrote:
>
>  From the synchronous_commit=remote_write level and "higher", I would expect, that when the remote application
> (doesn't matter if flush, write or apply) would not be applied I would not receive a confirmation about the commit
> (even with a warning). Something like, if there is no commit from sync replica, there is no commit on primary and
> if someone performs the steps above, the whole transaction will not send a confirmation.


The warning has to be taken into account here, and the commit must not 
be treated as *successful*.


1. 
https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru

2. 
https://www.postgresql.org/message-id/CANtu0ogbu%2By6Py963p-zKJ535b8zm5AOq7zkX7wW-tryPYi1DA%40mail.gmail.com


-- 
Regards,
Maksim Milyutin




Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
Hello Maksim,

I know your post [1]. That thread is the reason we performed more tests 
(see my other email in this thread). We are trying to implement an RPO=0 
solution using PostgreSQL. Knowing this... would it be possible to build 
an RPO=0 solution with PostgreSQL?

Ondrej




Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
I am sorry, I forgot to mention that in the second situation I added a 
primary key to the table.

Ondrej





Re: Synchronous commit behavior during network outage

From
Maksim Milyutin
Date:
On 20.04.2021 19:38, Tomas Vondra wrote:
>
> On 4/20/21 6:23 PM, Aleksander Alekseev wrote:
>> Hi Ondřej,
>>
>> Thanks for the report. It seems to be a clear violation of what is
>> promised in the docs. Although it's unlikely that someone implemented
>> an application which deals with important data and "pressed Ctrl+C" as
>> it's done in psql. So this might not be such a critical issue after
>> all. BTW what version of PostgreSQL are you using?
>>
> Which part of the docs does this contradict?


I think Aleksander refers to the following phrase in the docs:

"The guarantee we offer is that the application will not receive 
explicit acknowledgment of the successful commit of a transaction until 
the WAL data is known to be safely received by all the synchronous 
standbys." [1]

And IMO the confusion here is around the notion of `successful commit`. 
Does the warning attached to the received commit message make it not 
*successful*? I think we have to explicitly mention the cancellation and 
session-termination cases in the docs to avoid ambiguity in understanding 
the phrase above.


1. 
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA

-- 
Regards,
Maksim Milyutin




Re: Synchronous commit behavior during network outage

From
SATYANARAYANA NARLAPURAM
Date:
One idea here is to make the backend ignore query cancellation/backend termination while waiting for the synchronous commit ACK. This way the client never reads data that was never flushed remotely. The problem with this approach is that your backends get stuck until your commit log record is flushed on the remote side. Also, the client can see data not flushed remotely if the server crashes and comes back online. You can prevent the latter case by performing a SyncRepWaitForLSN before opening up the connections to the non-superusers. I have a working prototype of this logic; if there is enough interest I can post the patch.







Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
Hello Satyanarayana,

This can be an option for us in our case. But there also needs to be a process for detecting these "stuck commits" and for invalidating/removing them, because in reality, if the app/user does not see the change in the database, it/he/she will try to insert/delete it again. If the commits just get stuck without any management, a queue builds up in which there can be two similar inserts/deletes, which can again cause issues (like the primary key violation I mentioned before).

So the process in this case would be:

- DBA receives information that write operations are stuck (DBA, in coordination with the infrastructure team, disconnects all clients and prevents new ones from connecting).
- DBA recognizes that there is an issue in the communication between the primary and the sync replica (which caused the problem with propagating commits).
- DBA sees that there are some commits in the "stuck" state.
- DBA removes these stuck commits. Note: because the client never received a confirmation of a successful commit, the changes the client tried to perform can't be considered successful.
- DBA and infrastructure team restore the communication between the server nodes so that commits can propagate from the primary node to the sync replica again.
- DBA and infrastructure team allow new connections to the database.

This approach would require external monitoring and alerting, but I would say that this is an acceptable solution. Would your patch be able to support that?

Thank you
Ondrej




Re: Synchronous commit behavior during network outage

From
Aleksander Alekseev
Date:
Hi Tomas,

> > Thanks for the report. It seems to be a clear violation of what is
> > promised in the docs. Although it's unlikely that someone implemented
> > an application which deals with important data and "pressed Ctrl+C" as
> > it's done in psql. So this might not be such a critical issue after
> > all. BTW what version of PostgreSQL are you using?
> >
>
> Which part of the docs does this contradict?

The documentation for synchronous_commit = remote_apply explicitly states [1]:

"""
When set to remote_apply, commits will wait until replies from the
current synchronous standby(s) indicate they have received the commit
record of the transaction and applied it, so that it has become
visible to queries on the standby(s), and also written to durable
storage on the standbys.
"""

Here commit on the master happened before receiving replies from the standby(s).

[1]: https://www.postgresql.org/docs/13/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

-- 
Best regards,
Aleksander Alekseev



Re: Synchronous commit behavior during network outage

From
Laurenz Albe
Date:
On Tue, 2021-04-20 at 18:49 +0100, Ondřej Žižka wrote:
> tecmint=# select * from a; --> LAN on sync replica is OK
>   id
> ----
>    1
> (1 row)
> 
> tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and 
> insert is waiting. During this time kill the background process on the 
> PostgreSQL server for this session
> WARNING:  canceling the wait for synchronous replication and terminating 
> connection due to administrator command
> DETAIL:  The transaction has already committed locally, but might not 
> have been replicated to the standby.
> server closed the connection unexpectedly
>      This probably means the server terminated abnormally
>      before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> 
> tecmint=# select * from a;
>   id
> ----
>    1
>    2
> (2 rows)

It is well known that synchronous replication is subject to that problem,
since it doesn't use the two-phase commit protocol.

What surprises me is that this is a warning.
In my opinion it should be an error.

Yours,
Laurenz Albe




Re: Synchronous commit behavior during network outage

From
Pavel Stehule
Date:


On Wed, 21 Apr 2021 at 9:51, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

It is well known that synchronous replication is subject to that problem,
since it doesn't use the two-phase commit protocol.

What surprises me is that this is a warning.
In my opinion it should be an error.

yes, an error makes more sense

Regards

Pavel


Yours,
Laurenz Albe



Re: Synchronous commit behavior during network outage

From
SATYANARAYANA NARLAPURAM
Date:
This can be an option for us in our case. But there also needs to be a process how to detect these "stuck commits" and how to invalidate/remove them, because in reality, if the app/user would not see the change in the database, it/he/she will try to insert/delete it again. If it just stuck without management, it will create a queue which can cause, that in the queue there will be 2 similar inserts/deletes which can again cause issues (like with the primary key I mentioned before).

 This shouldn't be a problem, as the previous transaction is still holding the locks and the new transaction is blocked behind it. Outside of sync replication, this can happen today too with glitches/timeouts/retries between the client and the server. Am I missing something?


So the process should be in this case:

- DBA receives information, that write operations stuck (DBA in coordination with the infrastructure team disconnects all clients and prevent new ones to create a new connection).
You can monitor pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait types to detect this (see the query sketch at the end of this message).
 
- DBA will recognize, that there is an issue in communication between the primary and the sync replica (caused the issue with the propagation of commits)
- DBA will see that there are some commits that are in the "stuck state"
- DBA removes these stuck commits. Note: Because the client never received a confirmation about the successful commit -> changes in the DB client tried to perform can't be considered as successful.

You should consider these as in-doubt transactions, and the client should retry. Again, this can happen in a normal server-crash case too: for example, a transaction committed on the server and the server crashed before sending the acknowledgement. The client should know how to handle these cases.

- DBA and infrastructure team restore the communication between server nodes to be able to propagate commits from the primary node to sync replica.
- DBA and infrastructure team allows new connections to the database

This approach would require external monitoring and alerting, but I would say, that this is an acceptable solution. Would your patch be able to perform that?
My patch handles ignoring the cancel events. I ended up keeping the other logic (blocking superuser connections in the client_authentication_hook).

There is a third problem that I didn't talk about in this thread where the async clients (including logical decoding and replication clients) can get ahead of the new primary and there is no easier way to undo those changes. For this problem, we need to implement some protocol in the WAL sender where it sends the log to the consumer only up to the flush LSN of the standby/quorum replicas. This is something I am working on right now.
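
As a sketch of the monitoring mentioned above (assuming PostgreSQL 13, where a backend waiting for the synchronous standby is reported in pg_stat_activity with wait_event_type = 'IPC' and wait_event = 'SyncRep'; SYNC_REP_WAIT_FLUSH appears to be the name of an internal wait mode rather than the value shown in the view):

-- sessions currently stuck waiting for the synchronous standby
SELECT pid, usename, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event = 'SyncRep';
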

Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:

Hello,

> You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait types to detect this.

I tried this and I see wait_event_type Client or IPC and wait_event ClientRead or SyncRep. In which situation would I see the SYNC_REP_WAIT_FLUSH value?

> You should consider these as in doubt transactions and the client should retry. Again, this can happen in a normal server crash case too. For example, a transaction committed on the server and before sending the acknowledgement crashed.  *The client should know how to handle these cases.*

I have only light knowledge of in-doubt transactions and need to study them more, but in the real world the client is mostly 'stupid' and expects only COMMIT or ROLLBACK, nothing in between.

> There is a third problem that I didn't talk about in this thread where the async clients (including logical decoding and replication clients) can get ahead of the new primary and there is no easier way to undo those changes. For this problem, we need to implement some protocol in the WAL sender where it sends the log to the consumer only up to the flush LSN of the standby/quorum replicas. This is something I am working on right now.

We set up an architecture with 4 nodes and Patroni as the cluster manager. Two nodes are sync, and each sync node has 1 async. In case something like this happens (e.g. the network to the sync replica fails and the user presses CTRL+C), the async replica receives the transaction and applies it. If the outage lasts longer than some threshold (30s by default), the management software checks the LSNs and creates a new sync replica from the async replica.

Ondrej


Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:
Hi Ondrej!

> On 19 Apr 2021, at 22:19, Ondřej Žižka <ondrej.zizka@stratox.cz> wrote:
>
> Do you think, that would be possible to implement a process that would solve this use case?
> Thank you
> Ondrej
>

Feel free to review the patch fixing this at [0]. It's classified as "Server Features", but I'm sure it's a bug fix.

Yandex.Cloud PG has been running with this patch for more than half a year, because we cannot afford losing data in HA clusters.

It's a somewhat incomplete solution, because a PG restart or crash recovery will make the waiting transactions visible. But we protect against this on the HA tool's side.

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/33/2402/


Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
Hello Andrey,

I went through the thread for your patch and it seems to me an acceptable solution...

 > The only case patch does not handle is sudden backend crash - Postgres will recover without a restart.

We also use an HA tool (Patroni). If the whole machine fails, it will find a new master and it should be OK. We use a 4-node setup (2 sync replicas and 1 async from every sync replica). If there is an issue just with the sync replica (the async operating normally) and the master fails completely in this situation, it will be handled by Patroni (the async replica becomes the new sync), but if it is just the backend process, the master will not fail over and the changes will still be visible...

If the sync replica outage is temporary, it will resolve itself when the node re-establishes its replication slot connection... If the outage is "long", Patroni will remove the "old" sync replica from the cluster and the async replica reading from the master becomes the new sync. So yes... in a 2-node setup this can be an issue, but in a 4-node setup this seems to me like a solution.
The only problematic situation I can imagine is when the client connections use a different network than the replication network and the replication network is down completely, but the client network is up. In that case the master can become an "isolated island", and if it fails, we can lose the changed data.
Is this situation also covered by your model: "transaction effects should not be observable on the primary until the requirements of synchronous_commit are satisfied"?

Do you agree with my thoughts?

Maybe it would be possible to implement this in PostgreSQL with a note in the documentation that a multi-node (>=3 nodes) cluster is necessary.

Regards
Ondrej



Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:
Thanks for reviewing Ondřej!

> On 26 Apr 2021, at 22:01, Ondřej Žižka <ondrej.zizka@stratox.cz> wrote:
>
> Hello Andrey,
>
> I went through the thread for your patch and it seems to me an acceptable solution...
>
> > The only case patch does not handle is sudden backend crash - Postgres will recover without a restart.
>
> We also use an HA tool (Patroni). If the whole machine fails, it will find a new master and it should be OK. We use a 4-node setup (2 sync replicas and 1 async from every sync replica). If there is an issue just with the sync replica (the async operating normally) and the master fails completely in this situation, it will be handled by Patroni (the async replica becomes the new sync), but if it is just the backend process, the master will not fail over and the changes will still be visible...
>
> If the sync replica outage is temporary, it will resolve itself when the node re-establishes its replication slot connection... If the outage is "long", Patroni will remove the "old" sync replica from the cluster and the async replica reading from the master becomes the new sync. So yes... in a 2-node setup this can be an issue, but in a 4-node setup this seems to me like a solution.
> The only problematic situation I can imagine is when the client connections use a different network than the replication network and the replication network is down completely, but the client network is up. In that case the master can become an "isolated island", and if it fails, we can lose the changed data.
It is, in fact, a very common type of network partition.

> Is this situation also covered by your model: "transaction effects should not be observable on the primary until the requirements of synchronous_commit are satisfied"?
Yes. If synchronous_commit_cancelation = off, no backend crash occurs, and the HA tool does not start the PostgreSQL service when in doubt that another primary may exist.

> Do you agree with my thoughts?
I could not understand your reasoning about 2 and 4 nodes. Can you please clarify a bit how a 4-node setup can help prevent visibility of committed-locally-but-canceled transactions?

I do not think we can classify network partitions as "temporary" and "long". Due to the distributed nature of the system, network partitions are eternal and momentary, simultaneously. And if node A can access node B and node C, this neither implies that B can access C, nor that B can access A.

> Maybe it would be possible to implement this in PostgreSQL with a note in the documentation that a multi-node (>=3 nodes) cluster is necessary.
PostgreSQL does not provide any fault detection and automatic failover. Documenting anything wrt failover is the responsibility of the HA tool.

Thanks!

Best regards, Andrey Borodin.





Re: Synchronous commit behavior during network outage

From
Ondřej Žižka
Date:
On 06/05/2021 06:09, Andrey Borodin wrote:
> I could not understand your reasoning about 2 and 4 nodes. Can you please clarify a bit how a 4-node setup can help prevent visibility of committed-locally-but-canceled transactions?

Hello Andrey,

The initial request (for us) was to have a geo cluster with 2 locations where it would be possible to have 2 sync replicas even in case of a failure of one location. This means 2 nodes in every location (4 together). If one location fails completely (broken network connection), Patroni will choose the working location (a 5-node etcd spread across 3 locations ensures this).

In the initial state, there is 1 sync replica in each location and one async replica in each location, using the sync replica in its location as its source.
Let's have the following initial situation (a sketch of the corresponding primary settings follows the list):
1) Nodes pg11 and pg12 are in one location; nodes pg21 and pg22 are in another location.
2) Nodes pg11 and pg21 are in synchronous replication.
3) Node pg12 is an async replica of pg11.
4) Node pg22 is an async replica of pg21.
5) Master is pg11.
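
For illustration only (Patroni manages these settings itself, so this is just a sketch of the intended effect on the primary pg11):

    -- Sketch: pg21 is the synchronous standby of pg11; pg12 and pg22 replicate
    -- asynchronously (configured via primary_conninfo on those nodes, not shown).
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (pg21)';
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();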

When the committed-locally-but-canceled situation happens and there is a problem only with node pg21 (not with the network between the nodes), the async replica pg12 will receive the locally committed transaction from pg11 shortly after the local commit, even if the cancellation happens. So the commit will be present on both pg11 and pg12. If pg11 fails, the transaction already exists on pg12 and this node will be selected as the new leader (latest LSN).

There is a window between the time the transaction is committed locally and the time it has been sent to the async replica during which we can lose data, but I expect this to be milliseconds (maybe less).

It will not prevent visibility, but it will ensure that the data is not lost; in that case the data can be visible on the leader even if it is not present on the sync replica, because persistence is ensured by the async replica.

I hope I explained it understandably.

Regards
Ondrej




Re: Synchronous commit behavior during network outage

From
Jeff Davis
Date:
On Tue, 2021-04-20 at 14:19 -0700, SATYANARAYANA NARLAPURAM wrote:
> One idea here is to make the backend ignore query
> cancellation/backend termination while waiting for the synchronous
> commit ACK. This way client never reads the data that was never
> flushed remotely. The problem with this approach is that your
> backends get stuck until your commit log record is flushed on the
> remote side. Also, the client can see the data not flushed remotely
> if the server crashes and comes back online. You can prevent the
> latter case by making a SyncRepWaitForLSN before opening up the
> connections to the non-superusers. I have a working prototype of this
> logic, if there is enough interest I can post the patch.

I didn't see a patch here yet, so I wrote a simple one for
consideration (attached).

The problem exists for both cancellation and termination requests. The
patch adds a GUC that makes SyncRepWaitForLSN keep waiting. It does not
ignore the requests; for instance, a termination request will still be
honored when it's done waiting for sync rep.

The idea of this GUC is not to wait forever (obviously), but to allow
the administrator (or an automated network agent) to be in control of
the logic:

If the primary is non-responsive, the administrator can decide to fail
over, knowing that all visible transactions on the primary are durable
on the standby (because any transaction that didn't make it to the
standby also didn't release locks yet). If the standby is non-
responsive, the administrator can intervene with something like:

    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();

which will disable sync rep, allowing the primary to complete the query
and continue on without the standby; but in that case the admin must be
sure not to fail over until there's a new standby fully caught-up.
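
Whether a new standby is fully caught-up can be checked with a query along
these lines (illustrative; run on the primary against pg_stat_replication):

    -- A standby is caught up when its replayed position matches the
    -- primary's current WAL position.
    SELECT application_name,
           pg_current_wal_lsn() AS primary_lsn,
           replay_lsn,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
    FROM pg_stat_replication;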

The patch may be somewhat controversial, so I'll wait for feedback
before documenting it properly.

Regards,
    Jeff Davis


Attachment

Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:

> On 29 Jun 2021, at 03:56, Jeff Davis <pgsql@j-davis.com> wrote:
>
> The patch may be somewhat controversial, so I'll wait for feedback
> before documenting it properly.

The patch seems similar to [0]. But I like your wording :)
I'd be happy if we go with any version of these ideas.

Best regards, Andrey Borodin.


[0] https://commitfest.postgresql.org/33/2402/




Re: Synchronous commit behavior during network outage

From
Jeff Davis
Date:
On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
> > On 29 Jun 2021, at 03:56, Jeff Davis <pgsql@j-davis.com> wrote:
> > 
> > The patch may be somewhat controversial, so I'll wait for feedback
> > before documenting it properly.
> 
> The patch seems similar to [0]. But I like your wording :)
> I'd be happy if we go with any version of these idea.

Thank you, somehow I missed that one, we should combine the CF entries.

My patch also covers the backend termination case. Is there a reason
you left that case out?

Regards,
    Jeff Davis





Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:

> On 29 Jun 2021, at 23:35, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
>>> On 29 Jun 2021, at 03:56, Jeff Davis <pgsql@j-davis.com> wrote:
>>>
>>> The patch may be somewhat controversial, so I'll wait for feedback
>>> before documenting it properly.
>>
>> The patch seems similar to [0]. But I like your wording :)
>> I'd be happy if we go with any version of these idea.
>
> Thank you, somehow I missed that one, we should combine the CF entries.
>
> My patch also covers the backend termination case. Is there a reason
> you left that case out?
Yes, backend termination is used by the HA tool before rewinding the node. Initially I was treating termination as PANIC and got a ton of core dumps during failover drills.

There is one more caveat we need to fix: we should prevent instant recovery from happening. The HA tool must know that our process was restarted.
Consider following scenario:
1. Node A is primary with sync rep.
2. A is going through network partitioning, somewhere node B is promoted.
3. All backends of A are stuck in sync rep, until HA tool discovers A is failed node.
4. One backend crashes with segfault in some buggy extension or OOM or whatever
5. Postgres server is doing restartless crash recovery making local-but-not-replicated data visible.

We should prevent 5 too, just as we prevent cancels. The HA tool will discover the postmaster failure and will recheck in the coordination system whether it can bring Postgres up locally.

Thanks!

Best regards, Andrey Borodin.


Re: Synchronous commit behavior during network outage

From
Jeff Davis
Date:
On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
> > My patch also covers the backend termination case. Is there a
> > reason
> > you left that case out?
> 
> Yes, backend termination is used by HA tool before rewinding the
> node.

Can't you just disable sync rep first (using ALTER SYSTEM SET
synchronous_standby_names=''), which will unstick the backend, and then
terminate it?
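
That is, a sequence roughly like this (sketch only; 12345 is a placeholder
pid):

    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();             -- releases backends waiting for sync rep
    SELECT pg_terminate_backend(12345);  -- then terminate the unstuck backend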

If you don't handle the termination case, then there's still a chance
for the transaction to become visible to other clients before it's
replicated.

> There is one more caveat we need to fix: we should prevent instant
> recovery from happening.

That can already be done with the restart_after_crash GUC.
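
For example (sketch; restart_after_crash is reloadable):

    -- With this off, a backend crash shuts the server down instead of running
    -- automatic crash recovery, leaving the restart decision to the HA tool.
    ALTER SYSTEM SET restart_after_crash = off;
    SELECT pg_reload_conf();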

Regards,
    Jeff Davis





Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:

> On 2 Jul 2021, at 10:59, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
>>> My patch also covers the backend termination case. Is there a
>>> reason
>>> you left that case out?
>>
>> Yes, backend termination is used by HA tool before rewinding the
>> node.
>
> Can't you just disable sync rep first (using ALTER SYSTEM SET
> synchronous_standby_names=''), which will unstick the backend, and then
> terminate it?
If the failover happens due to an unresponsive node, we cannot just turn off sync rep. We need some spare connections for that (the number of stuck backends will skyrocket during a network partition). We need available descriptors and some memory to fork a new backend. We will need to re-read the config. We need time to try, after all.
In some failures we may lack some of these.

Partial degradation is already a hard task. Without the ability to easily terminate a running Postgres, the HA tool will often resort to SIGKILL.

>
> If you don't handle the termination case, then there's still a chance
> for the transaction to become visible to other clients before its
> replicated.
Termination is an admin command; admins know what they are doing.
Cancellation is part of the user protocol.

BTW, can we have two GUCs? Then HA tool developers can decide on their own which guarantees they provide.

>
>> There is one more caveat we need to fix: we should prevent instant
>> recovery from happening.
>
> That can already be done with the restart_after_crash GUC.

Oh, I didn't know it, we will use it. Thanks!


Best regards, Andrey Borodin.


Re: Synchronous commit behavior during network outage

From
Jeff Davis
Date:
On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote:
> If the failover happens due to unresponsive node we cannot just turn
> off sync rep. We need to have some spare connections for that (number
> of stuck backends will skyrocket during network partitioning). We
> need available descriptors and some memory to fork new backend. We
> will need to re-read config. We need time to try after all.
> At some failures we may lack some of these.

I think it's a good point that, when things start to go wrong, they can
go very wrong very quickly.

But until you've disabled sync rep, the primary will essentially be
down for writes whether using this new feature or not. Even if you can
terminate some backends to try to free space, the application will just
make new connections that will get stuck the same way.

You can avoid the "fork backend" problem by keeping a connection always
open from the HA tool, or by editing the conf to disable sync rep and
issuing SIGHUP instead. Granted, that still takes some memory.

> Partial degradation is already hard task. Without ability to easily
> terminate running Postgres HA tool will often resort to SIGKILL.

When the system is really wedged as you describe (waiting on sync rep,
tons of connections, and low memory), what information do you expect
the HA tool to be able to collect, and what actions do you expect it to
take?

Presumably, you'd want it to disable sync rep at some point to get back
online. Where does SIGTERM fit into the picture?

> > If you don't handle the termination case, then there's still a
> > chance
> > for the transaction to become visible to other clients before its
> > replicated.
> 
> Termination is admin command, they know what they are doing.
> Cancelation is part of user protocol.

From the pg_terminate_backend() docs: "This is also allowed if the
calling role is a member of the role whose backend is being terminated
or the calling role has been granted pg_signal_backend", so it's not
really an admin command. Even for an admin, it might be hard to
understand why terminating a backend could result in losing a visible
transaction.

I'm not really seeing two use cases here for two GUCs. Are you sure you
want to disable only cancels but allow termination to proceed?

Regards,
    Jeff Davis





Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:

> On 3 Jul 2021, at 01:15, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote:
>> If the failover happens due to unresponsive node we cannot just turn
>> off sync rep. We need to have some spare connections for that (number
>> of stuck backends will skyrocket during network partitioning). We
>> need available descriptors and some memory to fork new backend. We
>> will need to re-read config. We need time to try after all.
>> At some failures we may lack some of these.
>
> I think it's a good point that, when things start to go wrong, they can
> go very wrong very quickly.
>
> But until you've disabled sync rep, the primary will essentially be
> down for writes whether using this new feature or not. Even if you can
> terminate some backends to try to free space, the application will just
> make new connections that will get stuck the same way.
Surely I'm talking about terminating the postmaster, not individual backends. But the postmaster will need to terminate each running query.
We surely need a way to stop the whole instance without issuing a single query. And I do not like kill -9 for this purpose.

>
> You can avoid the "fork backend" problem by keeping a connection always
> open from the HA tool, or by editing the conf to disable sync rep and
> issuing SIGHUP instead. Granted, that still takes some memory.
>
>> Partial degradation is already hard task. Without ability to easily
>> terminate running Postgres HA tool will often resort to SIGKILL.
>
> When the system is really wedged as you describe (waiting on sync rep,
> tons of connections, and low memory), what information do you expect
> the HA tool to be able to collect, and what actions do you expect it to
> take?
The HA tool is not going to collect anything. It just calls pg_ctl stop [0] or its equivalent.

>
> Presumably, you'd want it to disable sync rep at some point to get back
> online. Where does SIGTERM fit into the picture?

The HA tool is going to terminate the running instance, rewind it, switch to the new timeline, and enroll it into the cluster again as a standby.

>
>>> If you don't handle the termination case, then there's still a
>>> chance
>>> for the transaction to become visible to other clients before its
>>> replicated.
>>
>> Termination is admin command, they know what they are doing.
>> Cancelation is part of user protocol.
>
> From the pg_terminate_backend() docs: "This is also allowed if the
> calling role is a member of the role whose backend is being terminated
> or the calling role has been granted pg_signal_backend", so it's not
> really an admin command. Even for an admin, it might be hard to
> understand why terminating a backend could result in losing a visible
> transaction.
OK, I see backend termination is not described as an admin command.
We cannot prevent users from doing stupid things; they are able to delete their data anyway.

> I'm not really seeing two use cases here for two GUCs. Are you sure you
> want to disable only cancels but allow termination to proceed?

Yes, I'm sure. I had been running production with termination disabled for some weeks; cluster repair was much slower. For some reason kill-9-ed instances were successfully rewound much less often. But maybe I did something wrong.

If we can stop the whole instance the same way as we did without the proposed GUC activated, there is no problem.

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/zalando/patroni/blob/master/patroni/postgresql/postmaster.py#L155


Re: Synchronous commit behavior during network outage

From
Jeff Davis
Date:
On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote:
> > But until you've disabled sync rep, the primary will essentially be
> > down for writes whether using this new feature or not. Even if you
> > can
> > terminate some backends to try to free space, the application will
> > just
> > make new connections that will get stuck the same way.
> 
> Surely I'm talking about terminating postmaster, not individual
> backends. But postmaster will need to terminate each running query.
> We surely need to have a way to stop whole instance without making
> any single query. And I do not like kill -9 for this purpose.

kill -6 would suffice.

I see the point that you don't want this to interfere with an
administrative shutdown. But it seems like most shutdowns will need to
escalate to SIGABRT for cases where things are going badly wrong (low
memory, etc.) anyway. I don't see a better solution here.

I don't fully understand why you'd be concerned about cancellation but
not concerned about similar problems with termination, but if you think
two GUCs are important I can do that.

Regards,
    Jeff Davis






Re: Synchronous commit behavior during network outage

From
Andrey Borodin
Date:

> On 3 Jul 2021, at 23:44, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote:
>>> But until you've disabled sync rep, the primary will essentially be
>>> down for writes whether using this new feature or not. Even if you
>>> can
>>> terminate some backends to try to free space, the application will
>>> just
>>> make new connections that will get stuck the same way.
>>
>> Surely I'm talking about terminating postmaster, not individual
>> backends. But postmaster will need to terminate each running query.
>> We surely need to have a way to stop whole instance without making
>> any single query. And I do not like kill -9 for this purpose.
>
> kill -6 would suffice.
SIGABRT is expected to generate a core dump, isn't it? A node failover is a somewhat expected state in an HA system.

>
> I see the point that you don't want this to interfere with an
> administrative shutdown. But it seems like most shutdowns will need to
> escalate to SIGABRT for cases where things are going badly wrong (low
> memory, etc.) anyway. I don't see a better solution here.
In my experience SIGTERM coped fine so far.

> I don't fully understand why you'd be concerned about cancellation but
> not concerned about similar problems with termination, but if you think
> two GUCs are important I can do that.
I think 2 GUCs is a better solution than 1 GUC disabling both cancelation and termination.
It would be great if some other HA tool developers would chime in.

Thanks!

Best regards, Andrey Borodin.


Re: Synchronous commit behavior during network outage

From
Jeff Davis
Date:
On Fri, 2021-07-09 at 23:10 +0500, Andrey Borodin wrote:
> In my experience SIGTERM coped fine so far.

OK. I don't think ignoring SIGTERM in the way my patch does it is a
great solution, and it's not getting much support, so I think I'll back
away from that idea.

I had a separate discussion with Andres, and he made a distinction
between explicit vs. implicit actions. For instance, an explicit
SIGTERM or SIGINT should not be ignored (or the functions that cause
those to happen); but if we are waiting for sync rep then it might be
OK to ignore a cancel caused by statement_timeout or a termination due
to a network disconnect.

Separately, I'm taking a vacation. Since there are two versions of the
patch floating around, I will withdraw mine.

Regards,
    Jeff Davis