Thread: Too coarse predicate locks granularity for B+ tree indexes

Too coarse predicate locks granularity for B+ tree indexes

From

Rinat Shigapov

Date:

07 February 2023, 10:23:54

Hi,

TLDR: this email describes a serialization failure that happens (as I understand it) due to too coarse predicate locks granularity for primary key index.

I have a concurrent testsuite that runs 14 test cases. Each test case operates on a disjoint set of records, doesn't retry transactions and is run under 'serializable' isolation level. The test data is small and likely fits within a single tuple page.

When I finished the test suite I was surprised that PostgreSQL 14.5 returns serialization failure on every test suite run. I was even more surprised when I tested the suite against the current CockroachDB and didn't get serialization failures. Actually I was able to reproduce RETRY_SERIALIZABLE errors a couple of times on CockroachDB but it required me to run the test suite in a loop for more than a half hour.

I started to investigate the test behavior with PostgreSQL with more simplified and shrinked code and found a serialization failure of two concurrent `update_user` operations.

The test defines the following `Users` table:

CREATE TABLE Users (
id UUID,
title VARCHAR(255),
first_name VARCHAR(40),
last_name VARCHAR(80) NOT NULL,
email VARCHAR(255) NOT NULL,
lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
marketing_optin BOOLEAN,
mobile_phone VARCHAR(50),
phone VARCHAR(50),
phone_ext VARCHAR(40),
is_contact BOOLEAN DEFAULT false NOT NULL,
unlinked_link_ids UUID[],

CONSTRAINT unique_user_email UNIQUE(lower_email),
PRIMARY KEY (id)
);

Concurrent `update_user` operation run the UPDATE query to change user email to a unique value

UPDATE Users
SET
title = CASE WHEN false= true THEN 'foo' ELSE title END,
first_name = CASE WHEN false= true THEN 'foo' ELSE first_name END,
last_name = CASE WHEN false= true THEN 'foo' ELSE last_name END,
email = CASE WHEN true = true THEN 'email2' ELSE email END,
marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';

I use the following helper view to monitor locks:

CREATE VIEW locks_v AS
SELECT pid,
virtualtransaction,
locktype,
CASE locktype
WHEN 'relation' THEN relation::regclass::text
WHEN 'virtualxid' THEN virtualxid::text
WHEN 'transactionid' THEN transactionid::text
WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
WHEN 'page' THEN relation::regclass::text||':'||page::text
END AS lockid,
mode,
granted
FROM pg_locks;

When the test Users table has only a few records the query uses a sequential scan the serialization failure is reproducible without inserting sleeps before `update_user` transaction commit.

This is caused by relation level predicate locks on Users table:

select * from locks_v;
pid | virtualtransaction | locktype | lockid | mode | granted
------+--------------------+---------------+-------------------+------------------+---------
3676 | 5/2444 | relation | unique_user_email | RowExclusiveLock | t
3676 | 5/2444 | relation | users_pkey | RowExclusiveLock | t
3676 | 5/2444 | relation | users | RowExclusiveLock | t
3676 | 5/2444 | virtualxid | 5/2444 | ExclusiveLock | t
3737 | 4/13470 | relation | pg_locks | AccessShareLock | t
3737 | 4/13470 | relation | locks_v | AccessShareLock | t
3737 | 4/13470 | virtualxid | 4/13470 | ExclusiveLock | t
3669 | 3/17334 | relation | unique_user_email | RowExclusiveLock | t
3669 | 3/17334 | relation | users_pkey | RowExclusiveLock | t
3669 | 3/17334 | relation | users | RowExclusiveLock | t
3669 | 3/17334 | virtualxid | 3/17334 | ExclusiveLock | t
3676 | 5/2444 | transactionid | 6571 | ExclusiveLock | t
3669 | 3/17334 | transactionid | 6570 | ExclusiveLock | t
3676 | 5/2444 | relation | users | SIReadLock | t
3669 | 3/17334 | relation | users | SIReadLock | t
(15 rows)

If I add ballast data to Users table (1000 records) the cost optimizer switches to index scan and it's hard to reproduce the issue for two concurrent `update_user` operations without sleeps. After adding long sleeps after UPDATE query and before commit I could see page-level predicates locks for the primary key index users_pkey:

select * from locks_v;
pid | virtualtransaction | locktype | lockid | mode | granted
-----+--------------------+---------------+-------------------+------------------+---------
371 | 6/523 | relation | unique_user_email | RowExclusiveLock | t
371 | 6/523 | relation | users_pkey | RowExclusiveLock | t
371 | 6/523 | relation | users | RowExclusiveLock | t
371 | 6/523 | virtualxid | 6/523 | ExclusiveLock | t
381 | 14/215 | relation | unique_user_email | RowExclusiveLock | t
381 | 14/215 | relation | users_pkey | RowExclusiveLock | t
381 | 14/215 | relation | users | RowExclusiveLock | t
381 | 14/215 | virtualxid | 14/215 | ExclusiveLock | t
350 | 4/885 | relation | pg_locks | AccessShareLock | t
350 | 4/885 | relation | locks_v | AccessShareLock | t
350 | 4/885 | virtualxid | 4/885 | ExclusiveLock | t
371 | 6/523 | transactionid | 1439 | ExclusiveLock | t
381 | 14/215 | transactionid | 1431 | ExclusiveLock | t
381 | 14/215 | page | users_pkey:5 | SIReadLock | t
371 | 6/523 | page | users_pkey:5 | SIReadLock | t
(15 rows)

With sleeps the serialization failure is reproduced on each run.

I started to read more about SSI implementation in PostgreSQL. The article https://arxiv.org/pdf/1208.4179.pdf mentions that

Currently, locks on B+-tree indexes are acquired at page granularity; we intend to refine this to next-key locking [16] in a future release.

[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In VLDB, pages 392–405, 1990.

My question follows:

Does the current PostgreSQL release support B+ tree index predicate locks more granular then page-level locks?

With kindest regards, Rinat Shigapov

Re: Too coarse predicate locks granularity for B+ tree indexes

From

Thomas Munro

Date:

07 February 2023, 11:10:37

On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov <rinatshigapov@gmail.com> wrote:
> Does the current PostgreSQL release support B+ tree index predicate locks more granular then page-level locks?

No.  I tried to follow some breadcrumbs left by Kevin and Dan that
should allow unique index scans that find a match to skip the btree
page lock, though, and p-lock just the heap tuple.  If you like
half-baked experimental code, see the v4-0002 patch in this thread,
where I took some shortcuts (jamming stuff that should be in the
planner down into the executor) for a proof-of-concept:


https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com

With that approach, if it *doesn't* find a match, then you're back to
having to p-lock the whole index page to represent the "gap", so that
you can conflict with anyone who tries to insert a matching value
later.  I believe the next-key approach would allow for finer grained
gap-locks (haven't studied that myself), but that's a secondary
problem; the primary problem (it seems to me) is getting rid of index
locks completely in the (common?) case that you have a qualifying
match.

Re: Too coarse predicate locks granularity for B+ tree indexes

From

Rinat Shigapov

Date:

07 February 2023, 12:00:48

Thomas, thank you for the details!

Have you kept the branch that you used to generate the patch? Which commit should the patch apply to?

With kindest regards, Rinat Shigapov

вт, 7 февр. 2023 г. в 17:11, Thomas Munro <thomas.munro@gmail.com>:

On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov <rinatshigapov@gmail.com> wrote:
> Does the current PostgreSQL release support B+ tree index predicate locks more granular then page-level locks?

No. I tried to follow some breadcrumbs left by Kevin and Dan that
should allow unique index scans that find a match to skip the btree
page lock, though, and p-lock just the heap tuple. If you like
half-baked experimental code, see the v4-0002 patch in this thread,
where I took some shortcuts (jamming stuff that should be in the
planner down into the executor) for a proof-of-concept:

https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com

With that approach, if it *doesn't* find a match, then you're back to
having to p-lock the whole index page to represent the "gap", so that
you can conflict with anyone who tries to insert a matching value
later. I believe the next-key approach would allow for finer grained
gap-locks (haven't studied that myself), but that's a secondary
problem; the primary problem (it seems to me) is getting rid of index
locks completely in the (common?) case that you have a qualifying
match.

Re: Too coarse predicate locks granularity for B+ tree indexes

From

Andrey Borodin

Date:

07 February 2023, 16:24:44

On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov <rinatshigapov@gmail.com> wrote:
>
> Thomas, thank you for the details!
>
> Have you kept the branch that you used to generate the patch? Which commit should the patch apply to?
>

You can try something like
git checkout 'master@{2018-05-13 13:37:00}'
to get a commit by date from rev-parse.

Best regards, Andrey Borodin.

Re: Too coarse predicate locks granularity for B+ tree indexes

From

Thomas Munro

Date:

07 February 2023, 21:44:36

On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin <amborodin86@gmail.com> wrote:
> On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov <rinatshigapov@gmail.com> wrote:
> > Thomas, thank you for the details!
> >
> > Have you kept the branch that you used to generate the patch? Which commit should the patch apply to?
>
> You can try something like
> git checkout 'master@{2018-05-13 13:37:00}'
> to get a commit by date from rev-parse.

I don't have time to work on this currently but if Rinat or others
want to look into it... maybe I should rebase that experiment on top
of current master.  Here's the branch:

https://github.com/macdice/postgres/tree/ssi-index-locking-refinements

Re: Too coarse predicate locks granularity for B+ tree indexes

From

Thomas Munro

Date:

07 February 2023, 21:51:08

On Wed, Feb 8, 2023 at 10:44 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin <amborodin86@gmail.com> wrote:
> > On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov <rinatshigapov@gmail.com> wrote:
> > > Thomas, thank you for the details!
> > >
> > > Have you kept the branch that you used to generate the patch? Which commit should the patch apply to?
> >
> > You can try something like
> > git checkout 'master@{2018-05-13 13:37:00}'
> > to get a commit by date from rev-parse.
>
> I don't have time to work on this currently but if Rinat or others
> want to look into it... maybe I should rebase that experiment on top
> of current master.  Here's the branch:
>
> https://github.com/macdice/postgres/tree/ssi-index-locking-refinements

Erm, I guess I should also post the rebased patches here, for the
mailing list archives.

Thread: Too coarse predicate locks granularity for B+ tree indexes

Too coarse predicate locks granularity for B+ tree indexes

Re: Too coarse predicate locks granularity for B+ tree indexes

Re: Too coarse predicate locks granularity for B+ tree indexes

Re: Too coarse predicate locks granularity for B+ tree indexes

Re: Too coarse predicate locks granularity for B+ tree indexes

Re: Too coarse predicate locks granularity for B+ tree indexes

Attachment