Re: Queries that should be canceled will get stuck on secure_write function - Mailing list pgsql-hackers

From	Adrien Nayrat
Subject	Re: Queries that should be canceled will get stuck on secure_write function
Date	December 16, 2025 18:08:26
Msg-id	1184223b-5140-4dcc-8e7e-08161a39267e@anayrat.info
In response to	Queries that should be canceled will get stuck on secure_write function ("蔡梦娟(玊于)" <mengjuan.cmj@alibaba-inc.com>)
List	pgsql-hackers

On 8/23/21 10:15 AM, 蔡梦娟(玊于) wrote:
> Hi, all
> 
> Recently, I ran into a problem where the startup process of a standby is
> stuck in a waiting state. The backtrace of the startup process shows that
> it is waiting for a backend process that conflicts with recovery to exit,
> even though the GUC parameters max_standby_streaming_delay and
> max_standby_archive_delay are both set to 30 seconds. The backend process
> stays alive, and its backtrace shows that it is waiting for the socket to
> become writeable in secure_write(). After further reading the code, I found
> that ProcessClientWriteInterrupt() in secure_write() only processes
> interrupts when ProcDiePending is true, and otherwise does nothing.
> However, a snapshot conflict with recovery only sets QueryCancelPending to
> true, so the response to the signal will be delayed indefinitely if the
> corresponding client is stuck, thus blocking the recovery process.
> 
> I want to know why the interrupt is only handled when ProcDiePending is
> true. I think a query that is supposed to be canceled should also respond
> to the signal.
> 
> I also want to share a patch with you. I added a GUC parameter,
> max_standby_client_write_delay: if a query is supposed to be canceled and
> the time it has been delayed by the client exceeds
> max_standby_client_write_delay, ProcDiePending is set to true so that the
> backend is not delayed indefinitely. What do you think of it? I hope to
> get your reply.
> 
> Thanks & Best Regards

Hello,

A customer encountered a similar issue (Postgres 15.12):

1. A heavily updated table on the primary. Reads seem to generate heap
prune records.
2. The same table is heavily read on the standbys, and queries conflict
with recovery.
3. This generates replication lag, and the load balancer decides to cut
the connections to the clients. As a result, we have active sessions
with the ClientWrite wait event.
4. Recovery is still waiting; there is no wait_event on the startup
process and no "not granted" locks.
5. After 900s, recovery resumes.

In the logs we have:

LOG: recovery still waiting after 1002.241 ms: recovery conflict on snapshot
<here the load balancer decides to cut connections>

Then, ~900s later, we can see a query canceled due to "connection to
client lost". Then recovery can resume:

LOG:  recovery finished waiting after 952697.269 ms: recovery conflict on snapshot

This is surprising, as we have both:
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s

And all standbys became stuck at around the same time, for the same
duration.

We tried aggressive tcp_keepalives* settings:

tcp_keepalives_count             = 3
tcp_keepalives_idle              = 5
tcp_keepalives_interval          = 5
client_connection_check_interval = 10s

It changed nothing; we are investigating why these settings are not
working. We suspect "something" in the k8s network layers.
Anyway, that is on the "system side".

As mentioned in this thread, we expect the query to be canceled after
30s in all cases (even if the network is lying).

(The main issue was that hot_standby_feedback was off. However, I know
that even when it is on, it can't prevent all recovery conflicts. That's
why I wanted to add a safety belt with the keepalive settings.)

Thanks

-- 
Adrien NAYRAT
