Re: Failure while inserting parent tuple to B-tree is not fun - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Failure while inserting parent tuple to B-tree is not fun
Date
Msg-id 5328967C.5040706@vmware.com
Whole thread Raw
In response to Re: Failure while inserting parent tuple to B-tree is not fun  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 02/06/2014 06:42 AM, Peter Geoghegan wrote:
> I'm not sure about this:
>
>> *************** _bt_findinsertloc(Relation rel,
>> *** 675,680 ****
>> --- 701,707 ----
>> static void
>> _bt_insertonpg(Relation rel,
>>                Buffer buf,
>> +                Buffer cbuf,
>>                  BTStack stack,
>>                  IndexTuple itup,
>>                  OffsetNumber newitemoff,
>
> Wouldn't lcbuf be a better name for the new argument? After all, in
> relevant contexts 'buf' is the parent of both of the pages post-split,
> but there are two children in play here. You say:
>
>> +  *        When inserting to a non-leaf page, 'cbuf' is the left-sibling of the
>> +  *        page we're inserting the downlink for.  This function will clear the
>> +  *        INCOMPLETE_SPLIT flag on it, and release the buffer.
>
> I suggest also noting in comments at this point that cbuf/the
> left-sibling is the "old half" from the split.
>
> Even having vented about cbuf's name, I can kind of see why you didn't
> call it lcbuf: we already have an "BTPageOpaque lpageop" variable for
> the special 'buf' metadata within the _bt_insertonpg() function; so
> there might be an ambiguity between the possibly-soon-to-be-left page
> (the target page) and the *child* left page that needs to have its
> flag cleared. Although I still think you should change the name of
> "lpageop" (further details follow).
>
> Consider this:
>
>>           /* release buffers */
>> +         if (!P_ISLEAF(lpageop))
>> +             _bt_relbuf(rel, cbuf);
>
> (Rebased, so this looks a bit different to your original version IIRC).
>
> This is checking that the _bt_insertonpg() target page (whose special
> data lpageop points to) is not a leaf page, and releasing the *child*
> left page if it isn't. This is pretty confusing. So I suggest naming
> "lpageop" "tpageop" ('t' for target, a terminology already used in the
> comments above the function).

I don't know, the buf/page/lpageop variable names are used in many 
functions in nbtinsert.c. Perhaps they should all be renamed, but I 
think it's fine as it is, and not this patch's responsibility anyway. 
(The lpageop name makes sense when the page has to be split, and the 
page becomes the left page. Otherwise, admittedly, not so much)

> Also, I suggest you change this existing code comment:
>
>   *        On entry, we must have the right buffer in which to do the
>   *        insertion, and the buffer must be pinned and write-locked.    On return,
>   *        we will have dropped both the pin and the lock on the buffer.
>
> Change "right" to "correct" here. Minor point, but "right" is a loaded word.

Fixed.

> But why are you doing new things while checking P_ISLEAF(lpageop) in
> _bt_insertonpg() to begin with? Would you not be better off doing
> things when there is a child buffer passed (e.g. "if
> (BufferIsValid(cbuf))..."), and only then asserting
> !P_ISLEAF(lpageop)? That seems to me to more naturally establish the
> context of "_bt_insertonpg() is called to insert downlink after
> split". Although maybe that conflates things within "XLOG stuff". Hmm.

Changed that way for the places where we deal with the incomplete-split 
flag.

I committed the patch now. Thanks for the review!

Before committing, I spotted a bug in the way the new page-deletion code 
interacts with this: there was a check that the page that's about to be 
deleted was not marked with the INCOMPLETE_SPLIT flag, it was possible 
that the page was the right half of an incomplete split, and so didn't 
have a downlink. Vacuuming that failed with an "failed to re-find parent 
key" error. I fixed that by checking that the left sibling is also not 
marked with the flag.

It would be fairly easy to delete a page that is the right half of an 
incomplete split. Such a page doesn't have a downlink pointing to it, so 
just skip removing it, and instead clear the INCOMPLETE_SPLIT flag in 
the left sibling. But deleting incompletely split pages isn't important 
from a disk-space point of view, they should be extremely rare, so I 
decided to just punt.

- Heikki



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: pg_archivecleanup bug
Next
From: Alvaro Herrera
Date:
Subject: Re: pg_archivecleanup bug