Thread: Too coarse predicate locks granularity for B+ tree indexes
Hi,
Concurrent `update_user` operation run the UPDATE query to change user email to a unique value
I use the following helper view to monitor locks:
TLDR: this email describes a serialization failure that happens (as I understand it) due to too coarse predicate locks granularity for primary key index.
I have a concurrent testsuite that runs 14 test cases. Each test case operates on a disjoint set of records, doesn't retry transactions and is run under 'serializable' isolation level. The test data is small and likely fits within a single tuple page.
When I finished the test suite I was surprised that PostgreSQL 14.5 returns serialization failure on every test suite run. I was even more surprised when I tested the suite against the current CockroachDB and didn't get serialization failures. Actually I was able to reproduce RETRY_SERIALIZABLE errors a couple of times on CockroachDB but it required me to run the test suite in a loop for more than a half hour.
I started to investigate the test behavior with PostgreSQL with more simplified and shrinked code and found a serialization failure of two concurrent `update_user` operations.
The test defines the following `Users` table:
CREATE TABLE Users (
id UUID,
title VARCHAR(255),
first_name VARCHAR(40),
last_name VARCHAR(80) NOT NULL,
email VARCHAR(255) NOT NULL,
lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
marketing_optin BOOLEAN,
mobile_phone VARCHAR(50),
phone VARCHAR(50),
phone_ext VARCHAR(40),
is_contact BOOLEAN DEFAULT false NOT NULL,
unlinked_link_ids UUID[],
CONSTRAINT unique_user_email UNIQUE(lower_email),
PRIMARY KEY (id)
);
UPDATE Users
SET
title = CASE WHEN false= true THEN 'foo' ELSE title END,
first_name = CASE WHEN false= true THEN 'foo' ELSE first_name END,
last_name = CASE WHEN false= true THEN 'foo' ELSE last_name END,
email = CASE WHEN true = true THEN 'email2' ELSE email END,
marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
I use the following helper view to monitor locks:
CREATE VIEW locks_v AS
SELECT pid,
virtualtransaction,
locktype,
CASE locktype
WHEN 'relation' THEN relation::regclass::text
WHEN 'virtualxid' THEN virtualxid::text
WHEN 'transactionid' THEN transactionid::text
WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
WHEN 'page' THEN relation::regclass::text||':'||page::text
END AS lockid,
mode,
granted
FROM pg_locks;
When the test Users table has only a few records the query uses a sequential scan the serialization failure is reproducible without inserting sleeps before `update_user` transaction commit.
This is caused by relation level predicate locks on Users table:
This is caused by relation level predicate locks on Users table:
select * from locks_v;
pid | virtualtransaction | locktype | lockid | mode | granted
------+--------------------+---------------+-------------------+------------------+---------
3676 | 5/2444 | relation | unique_user_email | RowExclusiveLock | t
3676 | 5/2444 | relation | users_pkey | RowExclusiveLock | t
3676 | 5/2444 | relation | users | RowExclusiveLock | t
3676 | 5/2444 | virtualxid | 5/2444 | ExclusiveLock | t
3737 | 4/13470 | relation | pg_locks | AccessShareLock | t
3737 | 4/13470 | relation | locks_v | AccessShareLock | t
3737 | 4/13470 | virtualxid | 4/13470 | ExclusiveLock | t
3669 | 3/17334 | relation | unique_user_email | RowExclusiveLock | t
3669 | 3/17334 | relation | users_pkey | RowExclusiveLock | t
3669 | 3/17334 | relation | users | RowExclusiveLock | t
3669 | 3/17334 | virtualxid | 3/17334 | ExclusiveLock | t
3676 | 5/2444 | transactionid | 6571 | ExclusiveLock | t
3669 | 3/17334 | transactionid | 6570 | ExclusiveLock | t
3676 | 5/2444 | relation | users | SIReadLock | t
3669 | 3/17334 | relation | users | SIReadLock | t
(15 rows)
If I add ballast data to Users table (1000 records) the cost optimizer switches to index scan and it's hard to reproduce the issue for two concurrent `update_user` operations without sleeps. After adding long sleeps after UPDATE query and before commit I could see page-level predicates locks for the primary key index users_pkey:
select * from locks_v;
pid | virtualtransaction | locktype | lockid | mode | granted
-----+--------------------+---------------+-------------------+------------------+---------
371 | 6/523 | relation | unique_user_email | RowExclusiveLock | t
371 | 6/523 | relation | users_pkey | RowExclusiveLock | t
371 | 6/523 | relation | users | RowExclusiveLock | t
371 | 6/523 | virtualxid | 6/523 | ExclusiveLock | t
381 | 14/215 | relation | unique_user_email | RowExclusiveLock | t
381 | 14/215 | relation | users_pkey | RowExclusiveLock | t
381 | 14/215 | relation | users | RowExclusiveLock | t
381 | 14/215 | virtualxid | 14/215 | ExclusiveLock | t
350 | 4/885 | relation | pg_locks | AccessShareLock | t
350 | 4/885 | relation | locks_v | AccessShareLock | t
350 | 4/885 | virtualxid | 4/885 | ExclusiveLock | t
371 | 6/523 | transactionid | 1439 | ExclusiveLock | t
381 | 14/215 | transactionid | 1431 | ExclusiveLock | t
381 | 14/215 | page | users_pkey:5 | SIReadLock | t
371 | 6/523 | page | users_pkey:5 | SIReadLock | t
(15 rows)
With sleeps the serialization failure is reproduced on each run.
I started to read more about SSI implementation in PostgreSQL. The article https://arxiv.org/pdf/1208.4179.pdf mentions that
I started to read more about SSI implementation in PostgreSQL. The article https://arxiv.org/pdf/1208.4179.pdf mentions that
Currently, locks on B+-tree indexes are acquired at page granularity; we intend to refine this to next-key locking [16] in a future release.
[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In VLDB, pages 392–405, 1990.
My question follows:
Does the current PostgreSQL release support B+ tree index predicate locks more granular then page-level locks?
With kindest regards, Rinat Shigapov
On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov <rinatshigapov@gmail.com> wrote: > Does the current PostgreSQL release support B+ tree index predicate locks more granular then page-level locks? No. I tried to follow some breadcrumbs left by Kevin and Dan that should allow unique index scans that find a match to skip the btree page lock, though, and p-lock just the heap tuple. If you like half-baked experimental code, see the v4-0002 patch in this thread, where I took some shortcuts (jamming stuff that should be in the planner down into the executor) for a proof-of-concept: https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com With that approach, if it *doesn't* find a match, then you're back to having to p-lock the whole index page to represent the "gap", so that you can conflict with anyone who tries to insert a matching value later. I believe the next-key approach would allow for finer grained gap-locks (haven't studied that myself), but that's a secondary problem; the primary problem (it seems to me) is getting rid of index locks completely in the (common?) case that you have a qualifying match.
Thomas, thank you for the details!
Have you kept the branch that you used to generate the patch? Which commit should the patch apply to?
With kindest regards, Rinat Shigapov
вт, 7 февр. 2023 г. в 17:11, Thomas Munro <thomas.munro@gmail.com>:
On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov <rinatshigapov@gmail.com> wrote:
> Does the current PostgreSQL release support B+ tree index predicate locks more granular then page-level locks?
No. I tried to follow some breadcrumbs left by Kevin and Dan that
should allow unique index scans that find a match to skip the btree
page lock, though, and p-lock just the heap tuple. If you like
half-baked experimental code, see the v4-0002 patch in this thread,
where I took some shortcuts (jamming stuff that should be in the
planner down into the executor) for a proof-of-concept:
https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
With that approach, if it *doesn't* find a match, then you're back to
having to p-lock the whole index page to represent the "gap", so that
you can conflict with anyone who tries to insert a matching value
later. I believe the next-key approach would allow for finer grained
gap-locks (haven't studied that myself), but that's a secondary
problem; the primary problem (it seems to me) is getting rid of index
locks completely in the (common?) case that you have a qualifying
match.
On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov <rinatshigapov@gmail.com> wrote: > > Thomas, thank you for the details! > > Have you kept the branch that you used to generate the patch? Which commit should the patch apply to? > You can try something like git checkout 'master@{2018-05-13 13:37:00}' to get a commit by date from rev-parse. Best regards, Andrey Borodin.
On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin <amborodin86@gmail.com> wrote: > On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov <rinatshigapov@gmail.com> wrote: > > Thomas, thank you for the details! > > > > Have you kept the branch that you used to generate the patch? Which commit should the patch apply to? > > You can try something like > git checkout 'master@{2018-05-13 13:37:00}' > to get a commit by date from rev-parse. I don't have time to work on this currently but if Rinat or others want to look into it... maybe I should rebase that experiment on top of current master. Here's the branch: https://github.com/macdice/postgres/tree/ssi-index-locking-refinements
On Wed, Feb 8, 2023 at 10:44 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin <amborodin86@gmail.com> wrote: > > On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov <rinatshigapov@gmail.com> wrote: > > > Thomas, thank you for the details! > > > > > > Have you kept the branch that you used to generate the patch? Which commit should the patch apply to? > > > > You can try something like > > git checkout 'master@{2018-05-13 13:37:00}' > > to get a commit by date from rev-parse. > > I don't have time to work on this currently but if Rinat or others > want to look into it... maybe I should rebase that experiment on top > of current master. Here's the branch: > > https://github.com/macdice/postgres/tree/ssi-index-locking-refinements Erm, I guess I should also post the rebased patches here, for the mailing list archives.