Undetected deadlock between primary and standby processes - Mailing list pgsql-bugs

From
Subject Undetected deadlock between primary and standby processes
Date
Msg-id OS7PR01MB11702355B9A28CE07242507B6CE422@OS7PR01MB11702.jpnprd01.prod.outlook.com
Whole thread Raw
Responses RE:Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes)  (<Rintaro.Ikeda@nttdata.com>)
List pgsql-bugs

Hi,

 

We found a undetected deadlock between a client backend process on the primary and a startup process on the standby during replication.

 

To reproduce the situation, follow the steps below. After completing the procedures, both the backend process and the startup process are left waiting for each other, resulting in a deadlock. The deadlock is not automatically detected and resolved.

 

1. (primary) setup the primary database cluster and create a table space

$ initdb -D data --no-locale --encoding=UTF8

$ pg_ctl -D data start

$ mkdir /tmp/hoge1

$ psql -c "CREATE TABLESPACE hoge LOCATION '/tmp/hoge1'"

 

2. (standby) setup the standby database cluster

$ pg_basebackup -D sby1 -R -T /tmp/hoge1=/tmp/hoge2 -X fetch

$ echo "port = 5433" >> sby1/postgresql.conf

$ echo "temp_tablespaces = 'hoge'" >> sby1/postgresql.conf

$ echo "max_standby_streaming_delay = -1" >> sby1/postgresql.conf

$ pg_ctl -D sby1 start

 

3. (primary) create table and get ACCESS EXCLUSIVE lock in primary

CREATE TABLE t();

BEGIN;

LOCK TABLE t IN ACCESS EXCLUSIVE MODE;

SELECT pg_switch_wal();

 

4. (standby) execute SELECT with ORDER BY clause to produce a temporary file.

BEGIN;

SET work_mem TO 64;

DECLARE mycur CURSOR FOR SELECT * FROM generate_series(1, 1000000) n ORDER BY n;

FETCH mycur;

SELECT * FROM t;

 

5. (new session on the primary) drop the table space in a new session other than the one which created table t().

DROP TABLESPACE hoge;

 

6. check the waiting event

(primary)

postgres=# select datid, datname, wait_event_type, wait_event, query, backend_type from pg_stat_activity ;

datid | datname  | wait_event_type |     wait_event      |                                              query                                              |         backend_type

-------+----------+-----------------+---------------------+-------------------------------------------------------------------------------------------------+------------------------------

     5 | postgres | Client          | ClientRead          | SELECT pg_switch_wal();                                                                         | client backend

 

(standby)

postgres=#  select datid, datname, wait_event_type, wait_event, query, backend_type from pg_stat_activity ;

datid | datname  | wait_event_type |         wait_event         |                                              query                                              |   backend_type

-------+----------+-----------------+----------------------------+-------------------------------------------------------------------------------------------------+-------------------

     5 | postgres | Lock            | relation                   | SELECT * FROM t;                                                                                | client backend

       |          | IPC             | RecoveryConflictTablespace |                                                                                                 | startup

 

 

My environment is following.

PostgreSQL: 16.1

OS: Rocky Linux 9

 

 

With Regards,

Rintaro Ikeda

NTT DATA GROUP CORPORATION

 

 

 

 

pgsql-bugs by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Next
From: PG Bug reporting form
Date:
Subject: BUG #18323: Cannot install the best update candidate for package gdal36-libs-3.6.4-5PGDG.rhel9.x86_64