On 3/4/24 09:35, Rintaro.Ikeda@nttdata.com wrote:
> Hi,
>
> I am correcting my previous bug report [1] to give a more accurate
> description. The report demonstrated an undetected deadlock between
> a client backend and the startup process on a standby server. (The
> title of the previous report, "Undetected deadlock between primary
> and standby processes", was wrong; it should have been "Undetected
> deadlock between client backend and startup process on a standby
> server".)
>
> After the procedure described in my bug report [1], a recovery
> conflict arises because the tablespace that the startup process
> tries to drop is still in use by a client backend process on the
> standby. The pg_stat_activity output (shown below) implies a
> deadlock: the client backend waits for an AccessExclusiveLock to
> be released, while the startup process waits for recovery conflict
> resolution in order to drop the tablespace. The deadlock is not
> resolved even after deadlock_timeout passes.
>
> (Standby server)
> postgres=# select datid, datname, wait_event_type, wait_event, query, backend_type from pg_stat_activity ;
>  datid | datname  | wait_event_type | wait_event                 | query            | backend_type
> -------+----------+-----------------+----------------------------+------------------+----------------
>      5 | postgres | Lock            | relation                   | SELECT * FROM t; | client backend
>        |          | IPC             | RecoveryConflictTablespace |                  | startup
>
> This deadlock is similar to a previously identified and patched
> issue [2], which also involved an undetected deadlock between a
> backend process and recovery on a standby server. I think the
> deadlock described in this report should be detected and resolved.
>
Thanks for the report.
So what are the steps to reproduce this? The previous message did all
kinds of stuff on the primary and then got stuck on pg_switch_wal() on
the primary, but this updated report seems to do stuff on the standby
and hit the lockup there.
It seems similar in the sense that it's about interaction between
recovery and a regular backend, but unfortunately
ResolveRecoveryConflictWithVirtualXIDs does not wait for a lock, it just
checks if the XID is still running, so it's invisible to the deadlock
detector :-(
But it's still checked against max_standby_streaming_delay, which should
resolve the deadlock (unless set to -1 to allow infinite delays) at some
point, right?
Also, I'm not very familiar with ResolveRecoveryConflictWithVirtualXIDs,
but it seems it's doing a busy wait. I wonder if that's a good idea, but
it's independent of this bug report.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company