Re: pgsql: Optimize btree insertions for common case of increasing values - Mailing list pgsql-committers

From Peter Geoghegan
Subject Re: pgsql: Optimize btree insertions for common case of increasing values
Date
Msg-id CAH2-Wzm_i6TkxppTsT7ZwkGBfaa9fgZrFaW99909-+qYy2BFUA@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Optimize btree insertions for common case of increasing values  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: pgsql: Optimize btree insertions for common case of increasing values  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-committers
On Wed, Apr 4, 2018 at 5:33 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Ok. Adding a check for tree height and setting target block only if it's >=
> 2, as suggested by you and Simon later. Rahila helped me also ran another
> round of tests and this does not lead to any performance regression (we were
> worried about whether calling _bt_getrootheight will be expensive).

Right.

> Also moved RelationSetTargetBlock() to a point where we are not holding any
> locks and are outside the critical section.

Right.

>> * An assertion would make me feel a lot better about depending on not
>> having a page split from a significant distance.
>
>
> Ok. I assume you mean an assertion to check that the target page doesn't
> have an in-complete split. Added that though not sure if it's useful since
> we concluded that right-most page can't have in-complete split.
>
> Let me know if you mean something else.

I meant something else. I was talking about the assertion discussed
below. I don't see too much point in the !P_INCOMPLETE_SPLIT(lpageop)
assertion, though.

>> Your optimization should continue to not be used when it would result
>> in a page split, but only because that would be slow. The comments
>> should say so, IMV.
>
>
> Added.

I think the wording here could use some tweaking:

>             /*
> -            * Check if the page is still the rightmost leaf page, has enough
> -            * free space to accommodate the new tuple, no split is in progress
> -            * and the scankey is greater than or equal to the first key on the
> -            * page.
> +            * Check if the page is still the rightmost valid leaf page, has
> +            * enough free space to accommodate the new tuple and the scankey
> +            * is strictly greater than the first key on the page.
> +            *
> +            * NB: We could take the fastpath even when the target block
> +            * doesn't have enough free space (but it's the right-most block)
> +            * since _bt_insertonpg() is capable of working with a NULL stack
> +            * and that's the only additional thing the slow path sets up. But
> +            * we don't optimise for that case because while spliting and
> +            * inserting into the parent without the stack is relatively slow
> +            * operation.
>              */

I would cut this down, and just say "We only insert if it definitely
won't result in a pagesplit" -- no need for the second paragraph in
this high-level routine. The full details can go on top of the new
_bt_insertonpg() assertion I talk about later.

> You mean passing "fastpath" to _bt_insertonpg and then checking it's false
> if page split is needed? But isn't page split is only needed if the page
> doesn't have enough free space? If so, we have checked for that before
> setting "fastpath".

That's not exactly what I meant. I meant that if:

1. This is an insertion to the leaf level in _bt_insertonpg().

and

2. We don't have space on the page, and so must do a split (or at
least free LP_DEAD items).

and

3. RelationGetTargetBlock(rel) != InvalidBlockNumber

There should be an assertion failure. This new assertion within
_bt_insertonpg() makes it much less likely that the assumption breaks
later.

This is where you could point out the low-level details that I
suggested be omitted from _bt_doinsert() at the beginning of this
e-mail. You can mention here that it would actually work without a
pagesplit, but that is only intended for crash recovery, and is a much
slower path that would make the optimization totally
counter-productive. We add an assertion because without one it would
be easy to miss a regression where there is a page split with an empty
stack.

Finally, I'd like to see a small paragraph in the nbtree README, about
the high level theory behind this optimization and page recycling. The
assumption is that there can only be one non-ignorable leaf rightmost
page, and so even a RecentGlobalXmin style interlock isn't required.
We cannot fail to detect that our hint was invalidated, because there
can only be one such page in the B-Tree at any time. It's possible
that the page will be deleted and recycled without a backend's cached
page also being detected as invalidated, but only when we happen to
recycle a page that once again becomes the rightmost leaf page.

Once those changes are made, this should be fine to commit.

-- 
Peter Geoghegan


pgsql-committers by date:

Previous
From: Alvaro Herrera
Date:
Subject: pgsql: Foreign keys on partitioned tables
Next
From: Tom Lane
Date:
Subject: pgsql: Improve FSM management for BRIN indexes.