Re: Too coarse predicate locks granularity for B+ tree indexes - Mailing list pgsql-general
From: Rinat Shigapov
Subject: Re: Too coarse predicate locks granularity for B+ tree indexes
Msg-id: CAJneKMWy+4SQ6_JTESEzRJakaVX2gfGSkH24zvDjqymZ3LK05A@mail.gmail.com
In response to: Re: Too coarse predicate locks granularity for B+ tree indexes (Laurenz Albe <laurenz.albe@cybertec.at>)
List: pgsql-general
Thank you for your prompt reply!
As I mentioned, I generated ballast data to make the optimizer switch to an index scan, which yields page-level predicate locks.
But my question is about finer-grained (smaller than a page) predicate locks for indexes. Even with page-level locks I can still get serialization failures if I add more queries (or emulate them with sleeps) to the transaction that runs the UPDATE Users query.
Below I describe the problem again for pgsql-general:
I have a concurrent test suite that runs 14 test cases. Each test case operates on a disjoint set of records, doesn't retry transactions, and runs under the 'serializable' isolation level. The test data is small and likely fits within a single heap page.
When I finished the test suite I was surprised that PostgreSQL 14.5 returned a serialization failure on every test suite run. I was even more surprised when I ran the suite against the current CockroachDB and didn't get serialization failures. I was eventually able to reproduce RETRY_SERIALIZABLE errors a couple of times on CockroachDB, but that required running the test suite in a loop for more than half an hour.
I started to investigate the test behavior on PostgreSQL with simplified, stripped-down code and found a serialization failure between two concurrent `update_user` operations.
The test defines the following `Users` table:
CREATE TABLE Users (
id UUID,
title VARCHAR(255),
first_name VARCHAR(40),
last_name VARCHAR(80) NOT NULL,
email VARCHAR(255) NOT NULL,
lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
marketing_optin BOOLEAN,
mobile_phone VARCHAR(50),
phone VARCHAR(50),
phone_ext VARCHAR(40),
is_contact BOOLEAN DEFAULT false NOT NULL,
unlinked_link_ids UUID[],
CONSTRAINT unique_user_email UNIQUE(lower_email),
PRIMARY KEY (id)
);
The concurrent `update_user` operations run the following UPDATE query, which changes the user's email to a unique value:
UPDATE Users
SET
title = CASE WHEN false = true THEN 'foo' ELSE title END,
first_name = CASE WHEN false = true THEN 'foo' ELSE first_name END,
last_name = CASE WHEN false = true THEN 'foo' ELSE last_name END,
email = CASE WHEN true = true THEN 'email2' ELSE email END,
marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
I use the following helper view to monitor locks:
CREATE VIEW locks_v AS
SELECT pid,
virtualtransaction,
locktype,
CASE locktype
WHEN 'relation' THEN relation::regclass::text
WHEN 'virtualxid' THEN virtualxid::text
WHEN 'transactionid' THEN transactionid::text
WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
WHEN 'page' THEN relation::regclass::text||':'||page::text
END AS lockid,
mode,
granted
FROM pg_locks;
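For example, while the transactions are held open, the predicate (SIRead) locks can be isolated with:

```sql
-- Show only the SSI predicate locks from the helper view above.
SELECT * FROM locks_v WHERE mode = 'SIReadLock';
```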
When the test Users table has only a few records, the query uses a sequential scan, and the serialization failure is reproducible without inserting sleeps before the `update_user` transaction commits.
This is caused by relation-level predicate locks on the Users table:
select * from locks_v;
pid | virtualtransaction | locktype | lockid | mode | granted
------+--------------------+---------------+-------------------+------------------+---------
3676 | 5/2444 | relation | unique_user_email | RowExclusiveLock | t
3676 | 5/2444 | relation | users_pkey | RowExclusiveLock | t
3676 | 5/2444 | relation | users | RowExclusiveLock | t
3676 | 5/2444 | virtualxid | 5/2444 | ExclusiveLock | t
3737 | 4/13470 | relation | pg_locks | AccessShareLock | t
3737 | 4/13470 | relation | locks_v | AccessShareLock | t
3737 | 4/13470 | virtualxid | 4/13470 | ExclusiveLock | t
3669 | 3/17334 | relation | unique_user_email | RowExclusiveLock | t
3669 | 3/17334 | relation | users_pkey | RowExclusiveLock | t
3669 | 3/17334 | relation | users | RowExclusiveLock | t
3669 | 3/17334 | virtualxid | 3/17334 | ExclusiveLock | t
3676 | 5/2444 | transactionid | 6571 | ExclusiveLock | t
3669 | 3/17334 | transactionid | 6570 | ExclusiveLock | t
3676 | 5/2444 | relation | users | SIReadLock | t
3669 | 3/17334 | relation | users | SIReadLock | t
(15 rows)
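For reference, a minimal two-session schedule that hits this case looks roughly as follows (a sketch; the UUIDs are placeholders for two distinct existing rows, and the full CASE-laden UPDATE is abbreviated):

```sql
-- Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email1'
WHERE id = '00000000-0000-0000-0000-000000000001';  -- seq scan reads the whole table

-- Session 2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email2'
WHERE id = '00000000-0000-0000-0000-000000000002';  -- also reads the whole table

-- Session 1:
COMMIT;

-- Session 2:
COMMIT;
-- One of the commits fails with:
-- ERROR:  could not serialize access due to read/write dependencies among transactions
```

Each UPDATE's sequential scan takes a relation-level SIReadLock that conflicts with the other session's write, so the two transactions form a read/write dependency cycle even though they touch disjoint rows.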
If I add ballast data to the Users table (1000 records), the optimizer switches to an index scan and it's hard to reproduce the issue for two concurrent `update_user` operations without sleeps. After adding long sleeps after the UPDATE query and before commit, I could see page-level predicate locks on the primary key index users_pkey:
select * from locks_v;
pid | virtualtransaction | locktype | lockid | mode | granted
-----+--------------------+---------------+-------------------+------------------+---------
371 | 6/523 | relation | unique_user_email | RowExclusiveLock | t
371 | 6/523 | relation | users_pkey | RowExclusiveLock | t
371 | 6/523 | relation | users | RowExclusiveLock | t
371 | 6/523 | virtualxid | 6/523 | ExclusiveLock | t
381 | 14/215 | relation | unique_user_email | RowExclusiveLock | t
381 | 14/215 | relation | users_pkey | RowExclusiveLock | t
381 | 14/215 | relation | users | RowExclusiveLock | t
381 | 14/215 | virtualxid | 14/215 | ExclusiveLock | t
350 | 4/885 | relation | pg_locks | AccessShareLock | t
350 | 4/885 | relation | locks_v | AccessShareLock | t
350 | 4/885 | virtualxid | 4/885 | ExclusiveLock | t
371 | 6/523 | transactionid | 1439 | ExclusiveLock | t
381 | 14/215 | transactionid | 1431 | ExclusiveLock | t
381 | 14/215 | page | users_pkey:5 | SIReadLock | t
371 | 6/523 | page | users_pkey:5 | SIReadLock | t
(15 rows)
With sleeps the serialization failure is reproduced on each run.
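(The ballast rows were generated with something along these lines; this is an assumed sketch, not the exact script used, and the values are arbitrary as long as each email stays unique:)

```sql
-- Hypothetical ballast-data generation: 1000 filler rows so the
-- planner prefers an index scan over a sequential scan.
INSERT INTO Users (id, last_name, email)
SELECT gen_random_uuid(),
       'ballast_' || i,
       'ballast_' || i || '@example.com'
FROM generate_series(1, 1000) AS i;
```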
I started to read more about the SSI implementation in PostgreSQL. The paper https://arxiv.org/pdf/1208.4179.pdf mentions:
Currently, locks on B+-tree indexes are acquired at page granularity; we intend to refine this to next-key locking [16] in a future release.
[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In VLDB, pages 392–405, 1990.
My question follows:
Does the current PostgreSQL release support B+ tree index predicate locks more granular than page-level locks?
With kindest regards, Rinat Shigapov
On Tue, 7 Feb 2023 at 16:29, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2023-02-07 at 16:23 +0600, Rinat Shigapov wrote:
> I have a concurrent testsuite that runs 14 test cases. Each test case operates
> on a disjoint set of records, doesn't retry transactions and is run under
> 'serializable' isolation level. The test data is small and likely fits within
> a single tuple page.
>
> When I finished the test suite I was surprised that PostgreSQL 14.5 returns
> serialization failure on every test suite run.
This is no question for the hackers list; redirecting to general.
That behavior sounds perfectly normal to me: if everything is in a single
page, PostgreSQL probably won't use an index scan. With a sequential scan,
the predicate lock will be on the whole table. So you should expect
serialization failures. This is well documented.
Perhaps you should use a more realistic test case with a reasonable
amount of data.
Yours,
Laurenz Albe