Thread: Disallow cancellation of waiting for synchronous replication

Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:
Hi, hackers!

This is continuation of thread [0] in pgsql-general with proposed changes. As Maksim pointed out, this topic was raised
beforehere [1]. 

Currently, we can have split brain with the combination of following steps:
0. Setup cluster with synchronous replication. Isolate primary from standbys.
1. Issue upsert query INSERT .. ON CONFLICT DO NOTHING
2. CANCEL 1 during wait for synchronous replication
3. Retry 1. Idempotent query will succeed and client have confirmation of written data, while it is not present on any
standby.

Thread [0] contain reproduction from psql.

In certain situations we cannot avoid cancelation of timed out queries. Yes, we can interpret warnings and thread them
aserrors, but warning is emitted on step 1, not on step 3. 

I think proper solution here would be to add GUC to disallow cancellation of synchronous replication. Retry step 3 will
waiton locks after hanging 1 and data will be consistent. 
Three is still a problem when backend is not canceled, but terminated [2]. Ideal solution would be to keep locks on
changeddata. Some well known databases threat termination of synchronous replication as system failure and refuse to
operateuntil standbys appear (see Maximum Protection mode). From my point of view it's enough to PANIC once so that HA
toolbe informed that something is going wrong. 
Anyway situation with cancelation is more dangerous. We've observed it in some user cases.

Please find attached draft of proposed change.

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/B70260F9-D0EC-438D-9A59-31CB996B320A%40yandex-team.ru
[1] https://www.postgresql.org/message-id/flat/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z%2Bjeg%40mail.gmail.com
[2] https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA


Attachment

Re: Disallow cancellation of waiting for synchronous replication

From
Marco Slot
Date:
On Fri, Dec 20, 2019 at 6:04 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I think proper solution here would be to add GUC to disallow cancellation of synchronous replication. Retry step 3
willwait on locks after hanging 1 and data will be consistent. 
> Three is still a problem when backend is not canceled, but terminated [2]. Ideal solution would be to keep locks on
changeddata. Some well known databases threat termination of synchronous replication as system failure and refuse to
operateuntil standbys appear (see Maximum Protection mode). From my point of view it's enough to PANIC once so that HA
toolbe informed that something is going wrong. 

Sending a cancellation is currently the only way to resume after
disabling synchronous replication. Some HA solutions (e.g.
pg_auto_failover) rely on this behaviour. Would it be worth checking
whether synchronous replication is still required?

Marco



Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 20 дек. 2019 г., в 12:23, Marco Slot <marco@citusdata.com> написал(а):
>
> On Fri, Dec 20, 2019 at 6:04 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> I think proper solution here would be to add GUC to disallow cancellation of synchronous replication. Retry step 3
willwait on locks after hanging 1 and data will be consistent. 
>> Three is still a problem when backend is not canceled, but terminated [2]. Ideal solution would be to keep locks on
changeddata. Some well known databases threat termination of synchronous replication as system failure and refuse to
operateuntil standbys appear (see Maximum Protection mode). From my point of view it's enough to PANIC once so that HA
toolbe informed that something is going wrong. 
>
> Sending a cancellation is currently the only way to resume after
> disabling synchronous replication. Some HA solutions (e.g.
> pg_auto_failover) rely on this behaviour. Would it be worth checking
> whether synchronous replication is still required?

I think changing synchronous_standby_names to some available standbys will resume all backends waiting for synchronous
replication.
Do we need to check necessity of synchronous replication in any other case?

Best regards, Andrey Borodin.


Re: Disallow cancellation of waiting for synchronous replication

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> I think proper solution here would be to add GUC to disallow cancellation of synchronous replication.

This sounds entirely insane to me.  There is no possibility that you
can prevent a failure from occurring at this step.

> Three is still a problem when backend is not canceled, but terminated [2].

Exactly.  If you don't have a fix that handles that case, you don't have
anything.  In fact, you've arguably made things worse, by increasing the
temptation to terminate or "kill -9" the nonresponsive session.

            regards, tom lane



Re: Disallow cancellation of waiting for synchronous replication

From
Michael Paquier
Date:
On Fri, Dec 20, 2019 at 03:07:26PM +0500, Andrey Borodin wrote:
>> Sending a cancellation is currently the only way to resume after
>> disabling synchronous replication. Some HA solutions (e.g.
>> pg_auto_failover) rely on this behaviour. Would it be worth checking
>> whether synchronous replication is still required?
>
> I think changing synchronous_standby_names to some available
> standbys will resume all backends waiting for synchronous
> replication.  Do we need to check necessity of synchronous
> replication in any other case?

Yeah, I am not on board with the concept of this thread.  Depending
on your HA configuration you can also reset synchronous_standby_names
after a certain small-ish threshold has been reached in WAL to get at
the same result by disabling synchronous replication, though your
cluster cannot perform safely a failover so you need to keep track of
that state.  Something which would be useful is to improve some cases
where you still want to use synchronous replication by switching to a
different standby.  I recall that sometimes that can be rather slow..
--
Michael

Attachment

Re: Disallow cancellation of waiting for synchronous replication

From
Marco Slot
Date:
On Fri, Dec 20, 2019 at 11:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I think changing synchronous_standby_names to some available standbys will resume all backends waiting for
synchronousreplication.
 
> Do we need to check necessity of synchronous replication in any other case?

The GUCs are not re-checked in the main loop in SyncRepWaitForLSN, so
backends will remain stuck there even if synchronous replication has
been (temporarily) disabled while they were waiting.

I do agree with the general sentiment that terminating the connection
is preferable over sending a response to the client (except when
synchronous replication was already disabled). Synchronous replication
does not guarantee that a committed write is actually on any replica,
but it does in general guarantee that a commit has been replicated
before sending a response to the client. That's arguably more
important because the rest of what the application might depend on the
transaction completing and replicating successfully. I don't know of
cases other than cancellation in which a response is sent to the
client without replication when synchronous replication is enabled.

The error level should be FATAL instead of PANIC, since PANIC restarts
the database and I don't think there is a reason to do that.

Marco



Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 21 дек. 2019 г., в 2:19, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> I think proper solution here would be to add GUC to disallow cancellation of synchronous replication.
>
> This sounds entirely insane to me.  There is no possibility that you
> can prevent a failure from occurring at this step.
Yes, we cannot prevent failure. If we wait here for a long time and someone cancels query, probably, node is failed.
Databasealready lives in some other Availability Zone. 
All we should do - refuse to commit anything here. Any committed data will be lost.

>> Three is still a problem when backend is not canceled, but terminated [2].
>
> Exactly.  If you don't have a fix that handles that case, you don't have
> anything.  In fact, you've arguably made things worse, by increasing the
> temptation to terminate or "kill -9" the nonresponsive session.
Currently, any Postgres HA solution can loose data when application issues INSERT ... ON CONFLICT DO NOTHING with
retry.There is no need for any DBA mistake. Just a driver capable of issuing cancel on timeout. 

Administrator issuing kill -9 is OK, database must shutdown to prevent splitbrain. Preferably, database should refuse
tostart after shutdown. 
I'm not proposing this behavior as default. If administrator (or HA tool) configured DB in this mode - probably they
knowwhat they are doing. 

> 21 дек. 2019 г., в 15:34, Marco Slot <marco@citusdata.com> написал(а):
>
> On Fri, Dec 20, 2019 at 11:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> I think changing synchronous_standby_names to some available standbys will resume all backends waiting for
synchronousreplication. 
>> Do we need to check necessity of synchronous replication in any other case?
>
> The GUCs are not re-checked in the main loop in SyncRepWaitForLSN, so
> backends will remain stuck there even if synchronous replication has
> been (temporarily) disabled while they were waiting.
SyncRepInitConfig() will be called on SIGHUP. Backends waiting for synchronous replication will wake up on
WAIT_EVENT_SYNC_REPand happily succeed. 

> I do agree with the general sentiment that terminating the connection
> is preferable over sending a response to the client (except when
> synchronous replication was already disabled). Synchronous replication
> does not guarantee that a committed write is actually on any replica,
> but it does in general guarantee that a commit has been replicated
> before sending a response to the client. That's arguably more
> important because the rest of what the application might depend on the
> transaction completing and replicating successfully. I don't know of
> cases other than cancellation in which a response is sent to the
> client without replication when synchronous replication is enabled.
>
> The error level should be FATAL instead of PANIC, since PANIC restarts
> the database and I don't think there is a reason to do that.

Terminating connection is unacceptable, actually. Client will retry idempotent query. This query now do not need to
writeanything and will be committed. 
We need to shutdown database and prevent it from starting. We should not acknowledge any data before synchronous
replicationconfiguration allows us. 

When client tries to cancel his query - we refuse to do so and hold his write locks. If anyone terminate connection -
lockswill be released. It is better to shut down whole DB, then release these locks and continue to receive queries. 


All this does not apply to simple cases when user accidentally enabled synchronous replication. This is a setup for
quitesophisticated HA tool, which will rewind local database, when transient network partition will be over and old
timelineis archived, and attach it to new primary. 


Best regards, Andrey Borodin.


Re: Disallow cancellation of waiting for synchronous replication

From
Maksim Milyutin
Date:
On 21.12.2019 00:19, Tom Lane wrote:

>> Three is still a problem when backend is not canceled, but terminated [2].
> Exactly.  If you don't have a fix that handles that case, you don't have
> anything.  In fact, you've arguably made things worse, by increasing the
> temptation to terminate or "kill -9" the nonresponsive session.


I assume that the termination of backend that causes termination of 
PostgreSQL instance in Andrey's patch proposal have to be resolved by 
external HA agents that could interrupt such terminations as parent 
process of postmaster and make appropriate decisions e.g., restart 
PostgreSQL node in closed from external users state (via pg_hba.conf 
manipulation) until all sync replicas synchronize changes from master. 
Stolon HA tool implements this strategy  [1]. This logic (waiting for 
all replicas declared in synchronous_standby_names replicate all WAL 
from master) could be implemented inside PostgreSQL kernel after start 
recovery process before database is opened to users and this can be done 
separately later.

Another approach is to implement two-phase commit over master and sync 
replicas (as it did Oracle in old versions [2]) where the risk to get 
local committed data under instance restarting and query canceling is 
minimal (after starting of final commitment phase). But this approach 
has latency penalty and complexity to resolve partial (prepared but not 
committed) transactions under coordinator (in this case master node) 
failure in automatic mode. Nicely if this approach will be implemented 
later as option of synchronous commit.


1. 

https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances

2. 
https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#i33607

-- 
Best regards,
Maksim Milyutin




Re: Disallow cancellation of waiting for synchronous replication

From
Maksim Milyutin
Date:
On 21.12.2019 13:34, Marco Slot wrote:

> I do agree with the general sentiment that terminating the connection
> is preferable over sending a response to the client (except when
> synchronous replication was already disabled).


But in this case locally committed data becomes visible to new incoming 
transactions that is bad side-effect of this issue. Under failover those 
changes potentially undo.


> Synchronous replication
> does not guarantee that a committed write is actually on any replica,
> but it does in general guarantee that a commit has been replicated
> before sending a response to the client. That's arguably more
> important because the rest of what the application might depend on the
> transaction completing and replicating successfully. I don't know of
> cases other than cancellation in which a response is sent to the
> client without replication when synchronous replication is enabled.


Yes, at query canceling (e.g. by timeout from client driver) client 
receives response about completed transaction (though with warning which 
not all client drivers can handle properly) and the guarantee about 
successfully replicated transaction *violates*.


-- 
Best regards,
Maksim Milyutin




Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 25 дек. 2019 г., в 15:28, Maksim Milyutin <milyutinma@gmail.com> написал(а):
>
>> Synchronous replication
>> does not guarantee that a committed write is actually on any replica,
>> but it does in general guarantee that a commit has been replicated
>> before sending a response to the client. That's arguably more
>> important because the rest of what the application might depend on the
>> transaction completing and replicating successfully. I don't know of
>> cases other than cancellation in which a response is sent to the
>> client without replication when synchronous replication is enabled.
>
>
> Yes, at query canceling (e.g. by timeout from client driver) client receives response about completed transaction
(thoughwith warning which not all client drivers can handle properly) and the guarantee about successfully replicated
transaction*violates*. 

We obviously need a design discussion here to address the issue. But the immediate question is should we add this topic
toJanuary CF items? 

Best regards, Andrey Borodin.


Re: Disallow cancellation of waiting for synchronous replication

From
Marco Slot
Date:


On Wed, Dec 25, 2019, 11:28 Maksim Milyutin <milyutinma@gmail.com> wrote:
But in this case locally committed data becomes visible to new incoming
transactions that is bad side-effect of this issue. 

Your application should be prepared for that in any case.

At the point where synchronous replication waits, the commit has already been written to disk on the primary. If postgres restarts while waiting for replication then the write becomes immediately visible regardless of whether it was replicated. I don't think throwing a PANIC actually prevents that and if it does it's coincidental. Best you can do is signal to the client that the commit status is unknown.

That's far from ideal, but fixing it requires a much bigger change to streaming replication. The write should be replicated prior to commit on the primary, but applied after in a way where unapplied writes on the secondary can be overwritten/discarded if it turns out they did not commit on the primary.

Marco

Re: Disallow cancellation of waiting for synchronous replication

From
Maksim Milyutin
Date:

On 25.12.2019 14:27, Marco Slot wrote:



On Wed, Dec 25, 2019, 11:28 Maksim Milyutin <milyutinma@gmail.com> wrote:
But in this case locally committed data becomes visible to new incoming
transactions that is bad side-effect of this issue. 

Your application should be prepared for that in any case.

At the point where synchronous replication waits, the commit has already been written to disk on the primary. If postgres restarts while waiting for replication then the write becomes immediately visible regardless of whether it was replicated.


Yes, this write is recovered after starting of instance. At first, this case I want to delegate to external HA tool around PostgreSQL. It can handle instance stopping and take switchover to sync replica or start current instance with closed connections from external users until all writes replicate to sync replicas. Later, arguably closing connection after recovery process could be implemented inside the kernel.


I don't think throwing a PANIC actually prevents that and if it does it's coincidental.


PANIC lets down instance and doesn't provide clients to read locally committed data. HA tool takes further steps to close access to these data as described above.


That's far from ideal, but fixing it requires a much bigger change to streaming replication. The write should be replicated prior to commit on the primary, but applied after in a way where unapplied writes on the secondary can be overwritten/discarded if it turns out they did not commit on the primary.


Thanks for sharing your opinion about enhancement of synchronous commit protocol. Here [1] my position is listed. It would like to see positions of other members of community.


1. https://www.postgresql.org/message-id/f3ffc220-e601-cc43-3784-f9bba66dc382%40gmail.com

-- 
Best regards,
Maksim Milyutin

Re: Disallow cancellation of waiting for synchronous replication

From
Maksim Milyutin
Date:
On 25.12.2019 13:45, Andrey Borodin wrote:
>> 25 дек. 2019 г., в 15:28, Maksim Milyutin <milyutinma@gmail.com> написал(а):
>>
>>> Synchronous replication
>>> does not guarantee that a committed write is actually on any replica,
>>> but it does in general guarantee that a commit has been replicated
>>> before sending a response to the client. That's arguably more
>>> important because the rest of what the application might depend on the
>>> transaction completing and replicating successfully. I don't know of
>>> cases other than cancellation in which a response is sent to the
>>> client without replication when synchronous replication is enabled.
>>
>> Yes, at query canceling (e.g. by timeout from client driver) client receives response about completed transaction
(thoughwith warning which not all client drivers can handle properly) and the guarantee about successfully replicated
transaction*violates*.
 
> We obviously need a design discussion here to address the issue. But the immediate question is should we add this
topicto January CF items?
 


+1 on posting this topic to January CF.

Andrey, some fixes from me:

1) pulled out the cancelling of QueryCancelPending from internal branch 
where synchronous_commit_cancelation is set so that to avoid dummy 
iterations with printing message "canceling the wait for ..."

2) rewrote errdetail message under cancelling query: I hold in this case 
we cannot assert that transaction committed locally because its changes 
are not visible as yet so I propose to write about locally flushed 
commit wal record.

Updated patch is attached.

-- 
Best regards,
Maksim Milyutin


Attachment

Re: Disallow cancellation of waiting for synchronous replication

From
Robert Haas
Date:
On Fri, Dec 20, 2019 at 12:04 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Currently, we can have split brain with the combination of following steps:
> 0. Setup cluster with synchronous replication. Isolate primary from standbys.
> 1. Issue upsert query INSERT .. ON CONFLICT DO NOTHING
> 2. CANCEL 1 during wait for synchronous replication
> 3. Retry 1. Idempotent query will succeed and client have confirmation of written data, while it is not present on
anystandby.
 

It seems to me that in order for synchronous replication to work
reliably, you've got to be very careful about any situation where a
commit might or might not have completed, and this is one such
situation. When the client sends the query cancel, it does not and
cannot know whether the INSERT statement has not yet completed, has
already completed but not yet replicated, or has completed and
replicated but not yet sent back a response. However, the server's
response will be different in each of those cases, because in the
second case, there will be a WARNING about synchronous replication
having been interrupted. If the client ignores that WARNING, there are
going to be problems.

Now, even if you do pay attention to the warning, things are not
totally great here, because if you have inadvertently interrupted a
replication wait, how do you recover? You can't send a command that
means "oh, I want to wait after all." You would have to check the
standbys yourself, from the application code, and see whether the
changes that the query made have shown up there, or check the LSN on
the master and wait for the standbys to advance to that LSN. That's
not great, but might be doable for some applications.

One idea that I had during the initial discussion around synchronous
replication was that maybe there ought to be a distinction between
trying to cancel the query and trying to cancel the replication wait.
Imagine that you could send a cancel that would only cancel
replication waits but not queries, or only queries but not replication
waits. Then you could solve this problem by asking the server to
PQcancelWithAdvancedMagic(conn, PQ_CANCEL_TYPE_QUERY). I wasn't sure
that people would want this, and it didn't seem essential for the
version of this feature, but maybe this example shows that it would be
worthwhile. I don't really have any idea how you'd integrate such a
feature with psql, but maybe it would be good enough to have it
available through the C interface. Also, it's a wire-protocol change,
so there are compatibility issues to think about.

All that being said, like Tom and Michael, I don't think teaching the
backend to ignore cancels is the right approach. We have had
innumerable problems over the years that were caused by the backend
failing to respond to cancels when we would really have liked it to do
so, and users REALLY hate it when you tell them that they have to shut
down and restart (with crash recovery) the entire database because of
a single stuck backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Disallow cancellation of waiting for synchronous replication

From
Maksim Milyutin
Date:
On 29.12.2019 00:55, Robert Haas wrote:

> On Fri, Dec 20, 2019 at 12:04 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> Currently, we can have split brain with the combination of following steps:
>> 0. Setup cluster with synchronous replication. Isolate primary from standbys.
>> 1. Issue upsert query INSERT .. ON CONFLICT DO NOTHING
>> 2. CANCEL 1 during wait for synchronous replication
>> 3. Retry 1. Idempotent query will succeed and client have confirmation of written data, while it is not present on
anystandby.
 
> All that being said, like Tom and Michael, I don't think teaching the
> backend to ignore cancels is the right approach. We have had
> innumerable problems over the years that were caused by the backend
> failing to respond to cancels when we would really have liked it to do
> so, and users REALLY hate it when you tell them that they have to shut
> down and restart (with crash recovery) the entire database because of
> a single stuck backend.
>

The stuckness of backend is not deadlock here. To cancel waiting of 
backend fluently, client is enough to turn off synchronous replication 
(change synchronous_standby_names through server reload) or change 
synchronous replica to another livable one (again through changing of 
synchronous_standby_names). In first case he explicitly agrees with 
existence of local (not replicated) commits in master.


-- 
Best regards,
Maksim Milyutin




Re: Disallow cancellation of waiting for synchronous replication

From
Robert Haas
Date:
On Sat, Dec 28, 2019 at 6:19 PM Maksim Milyutin <milyutinma@gmail.com> wrote:
> The stuckness of backend is not deadlock here. To cancel waiting of
> backend fluently, client is enough to turn off synchronous replication
> (change synchronous_standby_names through server reload) or change
> synchronous replica to another livable one (again through changing of
> synchronous_standby_names). In first case he explicitly agrees with
> existence of local (not replicated) commits in master.

Sure, that's true. But I still maintain that responding to ^C is an
important property of the system. If you have to do some more
complicated set of steps like the ones you propose here, a decent
number of people aren't going to figure it out and will end up
unhappy. Now, as it is, you're unhappy, so I guess you can't please
everyone, but you asked for opinions so I'm giving you mine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 29 дек. 2019 г., в 4:54, Robert Haas <robertmhaas@gmail.com> написал(а):
>
> On Sat, Dec 28, 2019 at 6:19 PM Maksim Milyutin <milyutinma@gmail.com> wrote:
>> The stuckness of backend is not deadlock here. To cancel waiting of
>> backend fluently, client is enough to turn off synchronous replication
>> (change synchronous_standby_names through server reload) or change
>> synchronous replica to another livable one (again through changing of
>> synchronous_standby_names). In first case he explicitly agrees with
>> existence of local (not replicated) commits in master.
>
> Sure, that's true. But I still maintain that responding to ^C is an
> important property of the system.
Not loosing data - is a nice property of the database either.
Currently, synchronous replication fails to provide its guaranty - no data will be acknowledged until it is replicated.
We want to create a mode where this guaranty is provided.

When user issued CANCEL we could return him his warning or error, but we should not drop data locks. Other transactions
shouldnot get acknowledged on basis of non-replicated data. 

> If you have to do some more
> complicated set of steps like the ones you propose here, a decent
> number of people aren't going to figure it out and will end up
> unhappy. Now, as it is, you're unhappy, so I guess you can't please
> everyone, but you asked for opinions so I'm giving you mine.

There are many cases when we do not allow user to shoot into his foot. For example, anti-wraparound vacuum. Single-user
vacuumfreeze is much less pain than split-brain. In case of wraparound protection, there is deterministic steps to take
toget your database back consistently. But in case of split-brain there is no single plan for cure. 

Best regards, Andrey Borodin.


Re: Disallow cancellation of waiting for synchronous replication

From
Bruce Momjian
Date:
On Sat, Dec 28, 2019 at 04:55:55PM -0500, Robert Haas wrote:
> On Fri, Dec 20, 2019 at 12:04 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > Currently, we can have split brain with the combination of following steps:
> > 0. Setup cluster with synchronous replication. Isolate primary from standbys.
> > 1. Issue upsert query INSERT .. ON CONFLICT DO NOTHING
> > 2. CANCEL 1 during wait for synchronous replication
> > 3. Retry 1. Idempotent query will succeed and client have confirmation of written data, while it is not present on
anystandby.
 
> 
> It seems to me that in order for synchronous replication to work
> reliably, you've got to be very careful about any situation where a
> commit might or might not have completed, and this is one such
> situation. When the client sends the query cancel, it does not and
> cannot know whether the INSERT statement has not yet completed, has
> already completed but not yet replicated, or has completed and
> replicated but not yet sent back a response. However, the server's
> response will be different in each of those cases, because in the
> second case, there will be a WARNING about synchronous replication
> having been interrupted. If the client ignores that WARNING, there are
> going to be problems.

This gets to the heart of something I was hoping to discuss.  When is
something committed?  You would think it is when the client receives the
commit message, but Postgres can commit something, and try to inform the
client but fail to inform, perhaps due to network problems.  In Robert's
case above, we send a "success", but it is only a success on the primary
and not on the synchronous standby.

In the first case I mentioned, we commit without guaranteeing the client
knows, but in the second case, we tell the client success with a warning
that the synchronous standby didn't get the commit.  Are clients even
checking warning messages?  You see it in psql, but what about
applications that use Postgres.  Do they even check for warnings?
Should administrators be informed via email or some command when this
happens?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Disallow cancellation of waiting for synchronous replication

From
Robert Haas
Date:
On Sun, Dec 29, 2019 at 4:13 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Not loosing data - is a nice property of the database either.

Sure, but there's more than one way to fix that problem, as I pointed
out in my first response.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Disallow cancellation of waiting for synchronous replication

From
Robert Haas
Date:
On Mon, Dec 30, 2019 at 9:39 AM Bruce Momjian <bruce@momjian.us> wrote:
> This gets to the heart of something I was hoping to discuss.  When is
> something committed?  You would think it is when the client receives the
> commit message, but Postgres can commit something, and try to inform the
> client but fail to inform, perhaps due to network problems.

This kind of problem can happen even without synchronous replication.
I've alluded to this problem in a couple of blog posts I've done on
sync rep.

If an application is connected to the database and sends a COMMIT
command (or a data-modifying command outside a transaction that will
commit implicitly) and the connection is closed before it receives a
response, it does not know whether the COMMIT actually happened. It
will have to wait until the database is back up and running and then
go examine the state of the database with SELECT statements and try to
figure out whether the changes it wanted actually got made. Otherwise
it doesn't know whether the failure that resulted in a loss of network
connectivity occurred before or after the commit.

I imagine that most applications are way too dumb to do this properly
and just report errors to the user and let the user decide what to do
to try to recover. And I imagine that most users are not terribly
careful about it and such events cause minor data loss/corruption on a
regular basis. But there are also probably some applications where
people are really fanatical about it.

>  In Robert's
> case above, we send a "success", but it is only a success on the primary
> and not on the synchronous standby.
>
> In the first case I mentioned, we commit without guaranteeing the client
> knows, but in the second case, we tell the client success with a warning
> that the synchronous standby didn't get the commit.  Are clients even
> checking warning messages?  You see it in psql, but what about
> applications that use Postgres.  Do they even check for warnings?
> Should administrators be informed via email or some command when this
> happens?

I suspect a lot of clients are not checking warning messages, but
whether that's really the server's problem is arguable. I think we've
developed a general practice over the years of trying to avoid warning
messages as a way of telling users about problems, and that's a good
idea in general precisely because they might just get ignored, but
there are cases where it is really the only possible way forward. It
would certainly be pretty bad to have the COMMIT succeed on the local
node but produce an ERROR; that would doubtless be much more confusing
than what it's doing now. There's nothing at all to prevent
administrators from watching the logs for such warnings and taking
whatever action they deem appropriate.

I continue to think that the root cause of this issue is that we can't
distinguish between cancelling the query and cancelling the sync rep
wait. The client in this case is asking for both when it really only
wants the former, and then ignoring the warning that the latter is
what actually occurred.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 2 янв. 2020 г., в 19:13, Robert Haas <robertmhaas@gmail.com> написал(а):
>
> On Sun, Dec 29, 2019 at 4:13 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> Not loosing data - is a nice property of the database either.
>
> Sure, but there's more than one way to fix that problem, as I pointed
> out in my first response.
Sorry, it took some more reading iterations of your message for me to understand the problem you are writing about.

You proposed two solutions:
1. Client analyze warning an understand that data is not actually committed. This, as you pointed out, does not solve
theproblem: data is lost for another client, who never saw the warning. 
Actually, "client" is a stateless number of connections unable to communicate with each other by any means beside
database.They cannot share information about not committed transactions (they would need a database, thus chicken and
theegg problem). 

2. Add another message "CANCEL --force" to stop synchronous replication for specific backend.
We already have a way to stop synchronous replication "alter system set synchronous_standby_names to
'working.stand.by';select pg_reload_conf();". This will stop it for every backend, but "CANCEL --force" will be more
picky.
User still can loose data when they issue idempotent query based on data, committed by "CANCEL --force". Moreover, user
canloose data if his upsert is based on data committed by someone else with "set synchronous_commit to off". 
We could fix upserts: make them wait for replication even if nothing was changed, but this will not cover the case when
useris doing SELECT and decides not to insert anything. 
We can fix SELECT: if user asks for synchronous_commit=remote_write - give him snapshot no newer than synchronously
committeddata. ISTM this would solve all above problems, but I do not see implications of this approach. We should add
allXIDs to XIP if their commit LSN > sync rep LSN. But I'm not sure all other transactional mechanics will be OK with
this.

From practical point of view - when all writing queries use same synchronous_commit level - easiest solution is to just
disallowcancel of sync replication. In psql we can just reset connection on second CTRL+C. That's more generic than
"CANCEL--force". 

When all queries runs with same synchronous_commit there is no point in protocol message for canceling sync rep for
singleconnection. Just drop that connection. Ignoring cancel is the only way to satisfy synchronous_commit level, which
isconstant for transaction. 
When queries run in various synchronous_commit - things are much more complicated. Adding protocol message to change
synchronous_commitfor running queries does not seems to be a viable option. 

> I continue to think that the root cause of this issue is that we can't
> distinguish between cancelling the query and cancelling the sync rep
> wait.
Yes, it is. But canceling sync rep wait exists already. Just change synchronous_stanby_names. Canceling sync rep for
oneclient - is, effectively, changing synchronous commit level for running transaction. It opens a way for way more
difficultcomplications. 

> The client in this case is asking for both when it really only
> wants the former, and then ignoring the warning that the latter is
> what actually occurred.
Client is not ignoring warnings. Data is lost for the client which never received warning. If we could just fix our
code,I would not be making so much noise. There are workarounds, but they are very pleasant to explain. 


Best regards, Andrey Borodin.


Re: Disallow cancellation of waiting for synchronous replication

From
Bruce Momjian
Date:
On Thu, Jan  2, 2020 at 10:26:16PM +0500, Andrey Borodin wrote:
> 
> 
> > 2 янв. 2020 г., в 19:13, Robert Haas <robertmhaas@gmail.com> написал(а):
> > 
> > On Sun, Dec 29, 2019 at 4:13 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Not loosing data - is a nice property of the database either.
> > 
> > Sure, but there's more than one way to fix that problem, as I pointed
> > out in my first response.
> Sorry, it took some more reading iterations of your message for me to understand the problem you are writing about.
> 
> You proposed two solutions:
> 1. Client analyze warning an understand that data is not actually committed. This, as you pointed out, does not solve
theproblem: data is lost for another client, who never saw the warning.
 
> Actually, "client" is a stateless number of connections unable to communicate with each other by any means beside
database.They cannot share information about not committed transactions (they would need a database, thus chicken and
theegg problem).
 

Actually, it might be worse than that.  In my reading of
RecordTransactionCommit(), we do this:

    write to WAL
    flush WAL (durable)
    make visible to other backends
    replicate
    communicate to the client

I think this means we make the transaction commit visible to all
backends _before_ we replicate it, and potentially wait until we get a
replication reply to return SUCCESS to the client.  This means other
clients are acting on data that is durable on the local machine, but not
on the replicated machine, even if synchronous_standby_names is set.

I feel this topic needs a lot more thought before we consider changing
anything.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 11 янв. 2020 г., в 7:34, Bruce Momjian <bruce@momjian.us> написал(а):
>
> Actually, it might be worse than that.  In my reading of
> RecordTransactionCommit(), we do this:
>
>     write to WAL
>     flush WAL (durable)
>     make visible to other backends
>     replicate
>     communicate to the client
>
> I think this means we make the transaction commit visible to all
> backends _before_ we replicate it, and potentially wait until we get a
> replication reply to return SUCCESS to the client.
No. Data is not visible to other backend when we await sync rep. It's easy to check.
in one psql you can start waiting for sync rep:
postgres=# \d+ x
                                     Table "public.x"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 key    | integer |           | not null |         | plain    |              |
 data   | text    |           |          |         | extended |              |
Indexes:
    "x_pkey" PRIMARY KEY, btree (key)
Access method: heap

postgres=# alter system set synchronous_standby_names to 'nonexistent';
ALTER SYSTEM
postgres=# select pg_reload_conf();
2020-01-12 16:09:58.167 +05 [45677] LOG:  received SIGHUP, reloading configuration files
 pg_reload_conf
----------------
 t
(1 row)

postgres=# insert into x values (7, '7');


In other try to see inserted (already locally committed data)

postgres=# select * from x where key = 7;
 key | data
-----+------
(0 rows)


try to insert same data and backend will hand on locks

postgres=# insert into x values (7,'7') on conflict do nothing;

 ProcessQuery  (in postgres) + 189  [0x1014b05bd]
 standard_ExecutorRun  (in postgres) + 301  [0x101339fcd]
 ExecModifyTable  (in postgres) + 1106  [0x101362b62]
 ExecInsert  (in postgres) + 494  [0x10136344e]
 ExecCheckIndexConstraints  (in postgres) + 570  [0x10133910a]
 check_exclusion_or_unique_constraint  (in postgres) + 977  [0x101338db1]
 XactLockTableWait  (in postgres) + 176  [0x101492770]
 LockAcquireExtended  (in postgres) + 1274  [0x101493aaa]



Thanks!

Best regards, Andrey Borodin.


Re: Disallow cancellation of waiting for synchronous replication

From
Andres Freund
Date:
Hi,

On 2020-01-12 16:18:38 +0500, Andrey Borodin wrote:
> > 11 янв. 2020 г., в 7:34, Bruce Momjian <bruce@momjian.us> написал(а):
> > 
> > Actually, it might be worse than that.  In my reading of
> > RecordTransactionCommit(), we do this:
> > 
> >     write to WAL
> >     flush WAL (durable)
> >     make visible to other backends
> >     replicate
> >     communicate to the client
> > 
> > I think this means we make the transaction commit visible to all
> > backends _before_ we replicate it, and potentially wait until we get a
> > replication reply to return SUCCESS to the client.
> No. Data is not visible to other backend when we await sync rep.

Yea, as the relevant comment in RecordTransactionCommit() says;

     * Note that at this stage we have marked clog, but still show as running
     * in the procarray and continue to hold locks.
     */
    if (wrote_xlog && markXidCommitted)
        SyncRepWaitForLSN(XactLastRecEnd, true);


But it's worthwhile to emphasize that data at that stage actually *can*
be visible on standbys. The fact that the transaction still shows as
running via procarray, on the primary, does not influence visibility
determinations on the standby.

Greetings,

Andres Freund



Re: Disallow cancellation of waiting for synchronous replication

From
Maksim Milyutin
Date:
On 15.01.2020 01:53, Andres Freund wrote:

> On 2020-01-12 16:18:38 +0500, Andrey Borodin wrote:
>>> 11 янв. 2020 г., в 7:34, Bruce Momjian <bruce@momjian.us> написал(а):
>>>
>>> Actually, it might be worse than that.  In my reading of
>>> RecordTransactionCommit(), we do this:
>>>
>>>     write to WAL
>>>     flush WAL (durable)
>>>     make visible to other backends
>>>     replicate
>>>     communicate to the client
>>>
>>> I think this means we make the transaction commit visible to all
>>> backends _before_ we replicate it, and potentially wait until we get a
>>> replication reply to return SUCCESS to the client.
>> No. Data is not visible to other backend when we await sync rep.
> Yea, as the relevant comment in RecordTransactionCommit() says;
>
>      * Note that at this stage we have marked clog, but still show as running
>      * in the procarray and continue to hold locks.
>      */
>     if (wrote_xlog && markXidCommitted)
>         SyncRepWaitForLSN(XactLastRecEnd, true);
>
>
> But it's worthwhile to emphasize that data at that stage actually *can*
> be visible on standbys. The fact that the transaction still shows as
> running via procarray, on the primary, does not influence visibility
> determinations on the standby.


In common case, consistent reading in cluster (even if remote_apply is 
on) is available (and have to be) only on master node. For example, if 
random load balancer on read-only queries is established above master 
and sync replicas (while meeting remote_apply is on) it's possible to 
catch the case when preceding reads would return data that will be 
absent on subsequent ones.
Moreover, such visible commits on sync standby are not durable from the 
point of cluster view. For example, if we have two sync standbys then 
under failover we can switch master to sync standby on which waiting 
commit was not replicated but it was applied (and visible) on other 
standby. This switching is completely safe because client haven't 
received acknowledge on commit request and that transaction is in 
indeterminate state, it can be as committed so aborted depending on 
which standby will be promoted.


-- 
Best regards,
Maksim Milyutin




Re: Disallow cancellation of waiting for synchronous replication

From
Michail Nikolaev
Date:
Hello.

Just want to share some thoughts about how it looks from perspective
of a high availability web-service application developer.
Because sometimes things look different from other sides. And
everything looks like disaster to be honest.

But let's take it one at a time.

First - the problem is not related to upsert queries only. It could be
reproduced by plain INSERTS or UPDATES. For example:

* client 1 inserts new records and waits for synchronous replication
* client 1 cancels the query
* clients 2, 3, 4 and 5 see new data and perform some actions outside
of the database in external systems
* master is switched to the replica with no WAL of new records replicated yet

As a result: newly inserted data are just gone, but external systems
already rely on it.
And this is just a huge pain for the application and its developer.

Second - it is all not about the client who canceled the query. It may
be super clever and totally understand all of the tricky aspects and
risks of such action.
But it is about *other* clients who become able to see the
"non-existing" data. They even have no option to detect such
situation.

Yes, currently there are a few ways for non-synchronous-replicated
data to become visible (for complex reasons of course):
1) client cancels the query while waiting synchronous replications
2) database restart
3) kill -9 of backend waiting synchronous replications

What is the main difference among 1 vs 2 and 3? Because 1 is performed
not only by humans.
And moreover it is performed mostly by applications. And it happens
right now on thousands on servers!

Check [1] and [2]. It is official JDBC driver for PostgreSQL. I am
sure it is the most popular way to communicate with PostgreSQL these
days.
And implementation of Statement::setQueryTimeout creates timer to send
cancellation after the timeout. It is official and recommended way to
limit statement execution to some interval in JDBC.
In my project (and its libraries) it is used in dozens of places. It
is also possible to search GitHub [4] to understand how widely it's
used.
For example it is used by Spring framework[5], probably the most
popular framework in the world for the rank-2 programming language.

And situation is even worse. What is the case when setQueryTimeout
starts to cancel queries during synchronous replication like crazy?
Yes, it is the moment of losing connection between master and sync
replica (because all backends are now stuck in synrep). New master
will be elected in a few seconds.... (or maybe already elected and
working).
And... Totally correct code cancels hundreds of queries stuck in
synrep making "non-existing" data available to be read for other
clients in same availability zone....

It is just nightmare to be honest.

These days almost every web-service needs HA for postgres. And
practically if your code (or some of library code) calls
Statement::setQueryTimeout - your HA (for example - Patroni) is
broken.
And it is really not easy to control setQueryTimeout call in modern
application with thousands of third-party libraries. Also, a lot of
applications are in the support phase.

As for me - I am going to hack postgres jdbc driver to ignore
setQueryTimeout at all for now.

>> I think proper solution here would be to add GUC to disallow cancellation of synchronous replication.
> This sounds entirely insane to me.  There is no possibility that you
> can prevent a failure from occurring at this step.

Yes, maybe it is insane but looks like whole java-postgres-HA (and may
be others) world is going down. So, I believe we should even backport
such an insane knob.


As developers of distributed systems we don't have many things to rely
on. And they are:
1) if database clearly says something is committed - it is committed
with ACID guarantees
2) anything else - it may be committed, may be not committed, may be
waiting to be committed

And we've just lost letter D from ACID practically.

Thanks,
Michail.

[1]
https://github.com/pgjdbc/pgjdbc/blob/23cce8ad35d9af6e2a1cb97fac69fdc0a7f94b42/pgjdbc/src/main/java/org/postgresql/core/QueryExecutorBase.java#L164-L200
[2]
https://github.com/pgjdbc/pgjdbc/blob/ed09fd1165f046ae956bf21b6c7882f1267fb8d7/pgjdbc/src/main/java/org/postgresql/jdbc/PgStatement.java#L538-L540
[3] https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setQueryTimeout(int)
[4] https://github.com/search?l=Java&q=statement.setQueryTimeout&type=Code
[5]
https://github.com/spring-projects/spring-framework/blob/master/spring-jdbc/src/main/java/org/springframework/jdbc/datasource/DataSourceUtils.java#L329-L343



Re: Disallow cancellation of waiting for synchronous replication

From
Jeff Davis
Date:
On Sat, 2019-12-21 at 11:34 +0100, Marco Slot wrote:
> The GUCs are not re-checked in the main loop in SyncRepWaitForLSN, so
> backends will remain stuck there even if synchronous replication has
> been (temporarily) disabled while they were waiting.

If you do:

  alter system set synchronous_standby_names='';
  select pg_reload_conf();

it will release the backends waiting on sync rep.

Regards,
    Jeff Davis





Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:

> 9 июня 2020 г., в 23:32, Jeff Davis <pgsql@j-davis.com> написал(а):
>
>

After using a patch for a while it became obvious that PANICing during termination is not a good idea. Even when we
waitfor synchronous replication. It generates undesired coredumps. 
I think in presence of SIGTERM it's reasonable to say that we cannot protect user anymore.

PFA v3.

Best regards, Andrey Borodin.

Attachment

Re: Disallow cancellation of waiting for synchronous replication

From
Aleksander Alekseev
Date:
Hi hackers,

> >> After using a patch for a while it became obvious that PANICing during termination is not a good idea. Even when
wewait for synchronous replication. It generates undesired coredumps.
 
> >> I think in presence of SIGTERM it's reasonable to say that we cannot protect user anymore.
> >> PFA v3.

This patch, although solving a concrete and important problem, looks
more like a quick workaround than an appropriate solution. Or is it
just me?

Ideally, the transaction should be committed only after getting a
reply from the standby. If the user cancels the transaction, it
doesn't get committed anywhere. This is what people into distributed
systems would expect unless stated otherwise, at least. Although I
realize how complicated it is to implement, especially considering all
the possible corner cases (netsplit right after getting a reply, etc).
Maybe we could come up with a less than ideal, but still sound and
easy-to-understand model, which, as soon as you learned it, doesn't
bring unexpected surprises to the user.

I believe at this point it's important to agree if the community is
ready to accept a patch as is to make existing users suffer less and
iterate afterward. Or we choose not to do it and to come up with
another idea. Personally, I don't have any better ideas, thus maybe
accepting Andrey's patch would be the lesser of two evils.

-- 
Best regards,
Aleksander Alekseev



Re: Disallow cancellation of waiting for synchronous replication

From
Andrey Borodin
Date:
Hi Aleksander!

Thanks for looking into this.

> 23 апр. 2021 г., в 14:30, Aleksander Alekseev <aleksander@timescale.com> написал(а):
>
> Hi hackers,
>
>>>> After using a patch for a while it became obvious that PANICing during termination is not a good idea. Even when
wewait for synchronous replication. It generates undesired coredumps. 
>>>> I think in presence of SIGTERM it's reasonable to say that we cannot protect user anymore.
>>>> PFA v3.
>
> This patch, although solving a concrete and important problem, looks
> more like a quick workaround than an appropriate solution. Or is it
> just me?
>
> Ideally, the transaction should be committed only after getting a
> reply from the standby.
Getting reply from the standby is a part of a commit. Commit is completed only when WAL reached standby. Commit,
certainly,was initiated before getting reply from standby. We cannot commit only after we commit. 

> If the user cancels the transaction, it
> doesn't get committed anywhere.
The problem is user tries to cancel a transaction after they asked for commit. We never promised rolling back committed
transaction.
When user asks for commit we insert commit record into WAL. And then wait when it is acknowledged by quorum of standbys
andlocal storage. 
We cannot discard this record on standbys. Or, at one point we will have to discard discard records. Or discard discard
discardrecords. 

> This is what people into distributed
> systems would expect unless stated otherwise, at least.
I think, our transaction semantics is stated clearly in documentation.

> Although I
> realize how complicated it is to implement, especially considering all
> the possible corner cases (netsplit right after getting a reply, etc).
> Maybe we could come up with a less than ideal, but still sound and
> easy-to-understand model, which, as soon as you learned it, doesn't
> bring unexpected surprises to the user.
The model proposed by my patch sounds as follows:
transaction effects should not be observable on primary until requirements of synchronous_commit are satisfied.

E.g. even if user issues cancel of committed locally transaction, we should not release locks held by this transaction.
What unexpected surprises do you see in this model?

Thanks for reviewing!

Best regards, Andrey Borodin.


Automatic notification for top transaction IDs

From
Gurjeet Singh
Date:
I came across this thread [1] to disallow canceling a transaction not
yet confirmed by a synchronous replica. I think my proposed patch
might help that case as well, hence adding all involved in that thread
to BCC, for one-time notification.

As mentioned in that thread, when sending a cancellation signal, the
client cannot be sure if the cancel signal was honored, and if the
transaction was cancelled successfully. In the attached patch, the
backend emits a NotificationResponse containing the current full
transaction id. It does so only if the relevant GUC is enabled, and
when the top-transaction is being assigned the ID.

This information can be useful to the client, when:
i) it wants to cancel a transaction _after_ issuing a COMMIT, and
ii) it wants to check the status of its transaction that it sent
COMMIT for, but never received a response (perhaps because the server
crashed).

Additionally, this information can be useful for middleware, like
Transaction Processing Monitors, which can now transparently (without
any change in application code) monitor the status of transactions (by
watching for the transaction status indicator in the ReadyForQuery
protocol message). They can use the transaction ID from the
NotificationResponse to open a watcher, and on seeing either an 'E' or
'I' payload in subsequent ReadyForQuery messages, close the watcher.
On server crash, or other adverse events, they can then use the
transaction IDs still being watched to check status of those
transactions, and take appropriate actions, e.g. retry any aborted
transactions.

We cannot use the elog() mechanism for this notification because it is
sensitive to the value of client_min_messages. Hence I used the NOTIFY
infrastructure for this message. I understand that this usage violates
some expectations as to how NOTIFY messages are supposed to behave
(see [2] below), but I think these are acceptable violations; open to
hearing if/why this might not be acceptable, and any possible
alternatives.

I'm not very familiar with the parallel workers infrastructure, so the
patch is missing any consideration for those.

Reviews welcome.

[1]: subject was: Re: Disallow cancellation of waiting for synchronous
replication
thread: https://www.postgresql.org/message-id/flat/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru

[2]:
     At present, NotificationResponse can only be sent outside a
     transaction, and thus it will not occur in the middle of a
     command-response series, though it might occur just before ReadyForQuery.
     It is unwise to design frontend logic that assumes that, however.
     Good practice is to be able to accept NotificationResponse at any
     point in the protocol.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

Attachment