B-tree descend for insertion locking - Mailing list pgsql-hackers

When inserting into a B-tree index, all the pages are read-locked when
descending the tree. When we reach the leaf page, the read-lock is
exchanged for a write-lock.

There's nothing wrong with that, but why don't we just directly grab a
write-lock on the leaf page? When descending, we know the level we're
on, and what level the child page is. The only downside I can see is
that we would unnecessarily hold a write-lock when a read-lock would
suffice, if the page was just split and we have to move right. But that
seems like a really bad bet - hitting the page when it was just split is
highly unlikely.

Grabbing the write-lock directly would obviously save one buffer
unlock+lock sequence, for what it's worth, but I think it would also
slightly simplify the code. Am I missing something?

See attached patch. The new contract of _bt_getroot is a bit weird: it
locks the returned page in the requested lock-mode, shared or exclusive,
if the root page was also the leaf page. Otherwise it's locked in shared
mode regardless off the requested lock mode. But OTOH, the new contract
for _bt_search() is more clear now: it actually locks the returned page
in the requested mode, where it used to only use the access parameter to
decide whether to create a root page if the index was empty.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: GSoC question
Next
From: Robert Haas
Date:
Subject: Re: Triggers on foreign tables