On 18 March 2014 11:12, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> When inserting into a B-tree index, all the pages are read-locked when
> descending the tree. When we reach the leaf page, the read-lock is exchanged
> for a write-lock.
>
> There's nothing wrong with that, but why don't we just directly grab a
> write-lock on the leaf page? When descending, we know the level we're on,
> and what level the child page is. The only downside I can see is that we
> would unnecessarily hold a write-lock when a read-lock would suffice, if the
> page was just split and we have to move right. But that seems like a really
> bad bet - hitting the page when it was just split is highly unlikely.
Sounds good.
Grabbing write lock directly will reduce contention on the buffer, not
just reduce the code path.
If we have a considerable number of duplicates we would normally step
thru until we found a place to insert. Presumably that will happen
with write lock enabled, rather than read lock. Would this slow down
the insertion of highly duplicate keys under concurrent load? i.e. is
this a benefit for nearly-unique but not for other cases?
-- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services