Re: Failure while inserting parent tuple to B-tree is not fun - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Failure while inserting parent tuple to B-tree is not fun
Date
Msg-id CAM3SWZRGBXTYNVuAx3byQ0bo8J6HsAOx8t47bt61By1GTsxk3Q@mail.gmail.com
Whole thread Raw
In response to Re: Failure while inserting parent tuple to B-tree is not fun  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Failure while inserting parent tuple to B-tree is not fun  (Peter Geoghegan <pg@heroku.com>)
Re: Failure while inserting parent tuple to B-tree is not fun  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Some more thoughts:

Please add comments above _bt_mark_page_halfdead(), a new routine from
the dependency patch. I realize that this is substantially similar to
part of how _bt_pagedel() used to work, but it's still incongruous.

> ! Our approach is to create any missing downlinks on-they-fly, when
> ! searching the tree for a new insertion. It could be done during searches,
> ! too, but it seems best not to put any extra updates in what would otherwise

s/on-they-fly/on-the-fly/

I'm not sure about this:

> *************** _bt_findinsertloc(Relation rel,
> *** 675,680 ****
> --- 701,707 ----
> static void
> _bt_insertonpg(Relation rel,
>                Buffer buf,
> +                Buffer cbuf,
>                 BTStack stack,
>                 IndexTuple itup,
>                 OffsetNumber newitemoff,

Wouldn't lcbuf be a better name for the new argument? After all, in
relevant contexts 'buf' is the parent of both of the pages post-split,
but there are two children in play here. You say:

>+  *        When inserting to a non-leaf page, 'cbuf' is the left-sibling of the
>+  *        page we're inserting the downlink for.  This function will clear the
>+  *        INCOMPLETE_SPLIT flag on it, and release the buffer.

I suggest also noting in comments at this point that cbuf/the
left-sibling is the "old half" from the split.

Even having vented about cbuf's name, I can kind of see why you didn't
call it lcbuf: we already have an "BTPageOpaque lpageop" variable for
the special 'buf' metadata within the _bt_insertonpg() function; so
there might be an ambiguity between the possibly-soon-to-be-left page
(the target page) and the *child* left page that needs to have its
flag cleared. Although I still think you should change the name of
"lpageop" (further details follow).

Consider this:

>          /* release buffers */
>+         if (!P_ISLEAF(lpageop))
>+             _bt_relbuf(rel, cbuf);

(Rebased, so this looks a bit different to your original version IIRC).

This is checking that the _bt_insertonpg() target page (whose special
data lpageop points to) is not a leaf page, and releasing the *child*
left page if it isn't. This is pretty confusing. So I suggest naming
"lpageop" "tpageop" ('t' for target, a terminology already used in the
comments above the function).

Also, I suggest you change this existing code comment:
*        On entry, we must have the right buffer in which to do the*        insertion, and the buffer must be pinned
andwrite-locked.    On return,*        we will have dropped both the pin and the lock on the buffer.
 

Change "right" to "correct" here. Minor point, but "right" is a loaded word.

But why are you doing new things while checking P_ISLEAF(lpageop) in
_bt_insertonpg() to begin with? Would you not be better off doing
things when there is a child buffer passed (e.g. "if
(BufferIsValid(cbuf))..."), and only then asserting
!P_ISLEAF(lpageop)? That seems to me to more naturally establish the
context of "_bt_insertonpg() is called to insert downlink after
split". Although maybe that conflates things within "XLOG stuff". Hmm.

Perhaps some of these observations could equally well apply to
_bt_split(), which is similarly charged with releasing a left child
page on inserting a downlink. Particularly around reconsidering the
"left vs left child vs target" terminology.

Consider what happens when _bt_split() is passed a (left) child buffer
(a 'cbuf' argument). We are:

1. Splitting a page (first phase, which may only be finished lazily
some time later).

2. Iff this is a non-leaf page split, simultaneously unmark the flag
from some *other* split which we have a child buffer to unmark. Needs
to be part of same critical section. Unlock + release cbuf, again only
iff target/right split page contains a leaf page.

So, at the risk of belaboring the point:

* Some callers to _bt_split() and _bt_insertonpg() pass a 'cbuf' that
is not invalid.

* If either of those 2 functions don't unlock + release those buffers,
no one ever will.

* Therefore, at the very least we should unlock + release those
buffers **just because they're not invalid**. That's a sufficient
condition to do so, and attaching that to "level" is unnecessarily
unclear. As I said, assert non-leafness.

Incidentally, I think that in general we could be clearer on the fact
that a root page may also be a leaf page.
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Next
From: Craig Ringer
Date:
Subject: Re: Row-security on updatable s.b. views