Re: Synchronous commit behavior during network outage - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Synchronous commit behavior during network outage
Date
Msg-id CAJ7c6TM4RvuXiGQqfBa8G-s6eH8RMOG9yjM-WVo3ZwSPHG2xng@mail.gmail.com
Whole thread Raw
In response to Synchronous commit behavior during network outage  (Ondřej Žižka <ondrej.zizka@stratox.cz>)
Responses Re: Synchronous commit behavior during network outage  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Synchronous commit behavior during network outage  (Maksim Milyutin <milyutinma@gmail.com>)
Re: Synchronous commit behavior during network outage  (Ondřej Žižka <ondrej.zizka@stratox.cz>)
List pgsql-hackers
Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka <ondrej.zizka@stratox.cz> wrote:
>
> Hello all,
> I would like to know your opinion on the following behaviour I see for PostgreSQL setup with synchronous replication.
>
> This behaviour happens in a special use case. In this use case, there are 2 synchronous replicas with the following
config(truncated): 
>
> - 2 nodes
> - synchronous_standby_names='*'
> - synchronous_commit=remote_apply
>
>
> With this setup run the following steps (LAN down - LAN between master and replica):
> -----------------
> postgres=# truncate table a;
> TRUNCATE TABLE
> postgres=# insert into a values (1); -- LAN up, insert has been applied to replica.
> INSERT 0 1
> Vypnu LAN na serveru se standby:
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation from sync replica. In this situation
cancelit (press CTRL+C) 
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> INSERT 0 1
> There will be warning that commit was performed only locally:
> 2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous replication due to user request
> 2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already committed locally, but might not have been
replicatedto the standby. 
>
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation from sync replica. In this situation
cancelit (press CTRL+C) 
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> INSERT 0 1
> postgres=# insert into a values (2); --LAN down, waiting for sync replica, second attempt, cancel it as well (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> INSERT 0 1
> postgres=# update a set n=3 where n=2; --LAN down, waiting for sync replica, cancel it (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> UPDATE 2
> postgres=# update a set n=3 where n=2; -- run the same update,because data from the previous attempt was commited on
master,it is sucessfull, but no changes 
> UPDATE 0
> postgres=# select * from a;
>  n
> ---
>  1
>  3
>  3
> (3 rows)
> postgres=#
> ------------------------
>
> Now, there is only value 1 in the sync replica table (no other values), data is not in sync. This is expected, after
theLAN restore, data will come sync again, but if the main/primary node will fail and we failover to replica before the
LANis back up or the storage for this node would be destroyed and data would not sync to replica before it, we will
losedata even if the client received successful commit (with a warning). 
> From the synchronous_commit=remote_write level and "higher", I would expect, that when the remote application
(doesn'tmatter if flush, write or apply) would not be applied I would not receive a confirmation about the commit (even
witha warning). Something like, if there is no commit from sync replica, there is no commit on primary and if someone
performsthe steps above, the whole transaction will not send a confirmation. 
>
> This can cause issues if the application receives a confirmation about the success and performs some follow-up steps
e.g.create a user account and sends a request to the mail system to create an account or create a VPN account. If the
scenarioabove happens, there can exist a VPN account that does not have any presence in the central database and can be
asecurity issue. 
>
> I hope I explained it sufficiently. :-)
>
> Do you think, that would be possible to implement a process that would solve this use case?
>
> Thank you
> Ondrej



--
Best regards,
Aleksander Alekseev



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Next
From: Daniel Carter
Date:
Subject: Re: PATCH: Add GSSAPI ccache_name option to libpq