Re: pgsql: Optimize btree insertions for common case of increasing values - Mailing list pgsql-committers

From Peter Geoghegan
Subject Re: pgsql: Optimize btree insertions for common case of increasing values
Date
Msg-id CAH2-WzkpJd5TP6uRCqa473mzZCNsVk5KjE3xrs-4=FfFiAMHog@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Optimize btree insertions for common case of increasing values  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: pgsql: Optimize btree insertions for common case of increasing values  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-committers
On Thu, Apr 5, 2018 at 6:27 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> I came up with something like this:
>
> +               /*
> +                * If we're here then a pagesplit is needed. We should never
> reach here
> +                * if we're using the fastpath since we should have checked
> for all the
> +                * required conditions, including the fact that this page
> has enough
> +                * freespace. Note that this routine can in-theory deal with
> the
> +                * situation where a NULL stack pointer is passed (that's
> what would
> +                * happen if the fastpath is taken), like it does during
> crash
> +                * recovery. But that path is much slower, defeating the
> very purpose
> +                * of the optimisation.  The following assertion should
> protect us from
> +                * any future code changes that invalidate those
> assumptions.
> +                *
> +                * Note the fact that whenever we fail to take the fastpath,
> we clear
> +                * the cached block. So checking for a valid cached block at
> this point
> +                * is enough to decide whether we're in a fastpath or not.
> +                */
> +               Assert(!(P_ISLEAF(lpageop) &&
> +
> BlockNumberIsValid(RelationGetTargetBlock(rel))));
> +

This looks good to me.

> But then I started thinking can _bt_insertonpg() be called from a code path
> which doesn't start at _bt_doinsert()? I traced one such path
> _bt_getstackbuf() -> _bt_finish_split() -> _bt_insert_parent(), but that can
> only happen in case of a non-leaf page. The assertion which checks for a
> leaf page, should be fine, right?

Right. That's the whole reason for the P_ISLEAF() part of the
assertion. Actually, since a page in an internal page can only happen
as a direct consequence of a page split in one of its children, you
don't even need the P_ISLEAF() part. I'd leave it in, though, since it
makes it clearer which caller you're thinking about.

>> Finally, I'd like to see a small paragraph in the nbtree README, about
>> the high level theory behind this optimization and page recycling. The
>> assumption is that there can only be one non-ignorable leaf rightmost
>> page, and so even a RecentGlobalXmin style interlock isn't required.
>> We cannot fail to detect that our hint was invalidated, because there
>> can only be one such page in the B-Tree at any time. It's possible
>> that the page will be deleted and recycled without a backend's cached
>> page also being detected as invalidated, but only when we happen to
>> recycle a page that once again becomes the rightmost leaf page.
>>
>
> Can I borrow that almost verbatim, after adding details about the
> optimisation itself? Looks like I'm too tired to think sensibly.

I know the feeling.

I think you can take that wording almost verbatim. Obviously it should
refer to the optimization by name, and blend into the surrounding text
in the README. I suggest putting a small section before "On-the-Fly
Deletion Of Index Tuples", but after the main discussion of deletion +
recycling. It's essentially an exception to the general rule, so that
placement makes sense to me.

Thanks
-- 
Peter Geoghegan


pgsql-committers by date:

Previous
From: Magnus Hagander
Date:
Subject: pgsql: Fix worker_spi for new parameter to initialize connection
Next
From: Peter Eisentraut
Date:
Subject: pgsql: Fix plan cache issue in PL/pgSQL CALL