About row locking ordering - Mailing list pgsql-general

From Ryo Yamaji (Fujitsu)
Subject About row locking ordering
Date
Msg-id TYAPR01MB6073506ECCD7B8F51DA807F68A0F9@TYAPR01MB6073.jpnprd01.prod.outlook.com
Whole thread Raw
Responses Re: About row locking ordering  (Kirk Wolak <wolakk@gmail.com>)
List pgsql-general
Hi All.

When multiple sessions tried to acquire a row lock on the same tuple sequentially, I expected
that the first session waiting for the lock would acquire the lock first (FIFO). However, when we
actually tested it, there were cases where a session that was behind a first session acquired
a row lock ahead of first session.
Is this behavior is expected specifications or is it a bug?

The following is a list of the cases where the next session gets the lock first.
・Multiple sessions request row locks for the same tuple
・Update occurs for target tuple

The above behavior may have the following problems:
・If the lock acquiring order is FIFO in Postgres, it is not satisfied.
・Any sessions which cannot acquire a lock for a long time result in a long transaction

If the above behavior is a specification, I think it is necessary to document design considerations
when dealing with row locks.

[Condition]
・Version
    - PostgreSQL 14.0 (compiled with LOCK_DEBUG)
・postgresql.conf
  - log_lock_waits = on
  - deadlock_timeout = 10
  - log_line_prefix = '%m [%p:%l:%x] '
  - trace_locks = on
  - trace_lock_oidmin = 16000

[Table Definition]
create table t1 (col1 int, col2 int, col3 char (10));
insert into t1 values (1,0, ''); insert into t1 values (100, 9999, '');

[Application]
<a.sql> ---------------------------
   BEGIN;
   SELECT * FROM t1 WHERE col1 = 1 FOR UPDATE;
   SELECT current_timestamp AS TRANSTART, clock_timestamp() AS NOW;
   UPDATE t1 set col2 = :val, col3 = :valb WHERE col1 = 1;
   SELECT pg_sleep(1);
   COMMIT;
------------------------------------

[Test]
The test set is attached.
1. Run the following in 100 multiplex
 for i in `seq 100`; do psql postgres -f ./a.sql -v val=${i} -v valb="'a.sh'" &; done
2.After 10 seconds (1. After completion of execution), run the following in 10 multiplexes
 for i in `seq 10`; do ii=`expr ${i} + 900`; psql postgres -f ./a.sql -v val=${ii} -v valb="'b.sh'" &; done

All a.sh were expected to complete processing first (FIFO). However, b.sh completed the process
before part of a.sh.

[Log]
We are checking two types of logs.

1. ShareLock has one wait, the rest is in AccessExclusiveLock

1-1. Only 1369555 is aligned with ShareLock, the transaction ID obtained by 1369547, and the rest with
  AccessExclusiveLock, the tuple obtained by 1369555.
  This is similar to a pattern in which no updates have occurred to the tuple.
--------------------------------------------------------------
2022-10-26 01:20:08.881 EDT [1369555:19:0] LOG: process 1369555 still waiting for ShareLock on transaction 2501 after
10.072ms 
2022-10-26 01:20:08.881 EDT [1369555:20:0] DETAIL: Process holding the lock: 1369547. Wait queue: 1369555.
〜
2022-10-26 01:21:58.918 EDT [1369898:17:0] LOG: process 1369898 acquired AccessExclusiveLock on tuple (1, 0) of
relation16546 of database 13779 after 10.321 ms 
2022-10-26 01:21:58.918 EDT [1369898:18:0] DETAIL: Process holding the lock: 1369555. Wait queue: 1369558, 1369561,
1369564,1369567, 1369570, 1369573, 1369576, ... 
--------------------------------------------------------------


2. All processes wait with ShareLock

2-1. With 1369558 holding the t1 (0, 4) lock, the queue head is 1369561.
--------------------------------------------------------------
2022-10-26 01:22:27.230 EDT [1369623:46:2525] LOG: process 1369623 still waiting for ShareLock on transaction 2504
after10.133 msprocess 1369623 still waiting for ShareLock on transaction 2504 after 10.133 ms 
2022-10-26 01:22:27.242 EDT [1369877:47:2604] DETAIL: Process holding the lock: 1369558. Wait queue: 1369561, 1369623,
1369626,... 
--------------------------------------------------------------

2-2. When 1369558 locks are released, the first 1369561 in the Wait queue was expected to acquire the lock,
  but the process actually acquired 1369787
--------------------------------------------------------------
2022-10-26 01:22:28.237 EDT [1369623:63:2525] LOG: process 1369623 still waiting for ShareLock on transaction 2577
after10.028 ms 
2022-10-26 01:22:28.237 EDT [1369623:64:2525] DETAIL: Process holding the lock: 1369787. Wait queue: 1369623, 1369610,
1369614,1369617, 1369620. 
--------------------------------------------------------------

2-3. Checking that the 1369561 is rearranging.
--------------------------------------------------------------
2022-10-26 01:22:28.237 EDT [1369629:64:2527] DETAIL: Process holding the lock: 1369623. Wait queue: 1369629, 1369821,
1369644,... 1369561, ... 
--------------------------------------------------------------


Regerds, Ryo

Attachment

pgsql-general by date:

Previous
From: "zxwsbg"
Date:
Subject: Re:Why not set builder->next_phase_at=InvalidTransactionId in SnapBuildRestore function?
Next
From: Rama Krishnan
Date:
Subject: How to select unique records in PostgreSQL