Re: Failure while inserting parent tuple to B-tree is not fun - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Failure while inserting parent tuple to B-tree is not fun
Date
Msg-id 5285071B.1040100@vmware.com
Whole thread Raw
In response to Re: Failure while inserting parent tuple to B-tree is not fun  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Failure while inserting parent tuple to B-tree is not fun  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 05.11.2013 15:07, Heikki Linnakangas wrote:
> On 25.10.2013 22:13, Heikki Linnakangas wrote:
>> On 22.10.2013 19:55, Heikki Linnakangas wrote:
>>> I fixed the the same problem in GiST a few years back, by making it
>>> tolerate missing downlinks, and inserting them lazily. The B-tree code
>>> tolerates them already on scans, but gets confused on insertion, as seen
>>> above. I propose that we use the same approach I used with GiST, and add
>>> a flag to the page header to indicate "the downlink hasn't been inserted
>>> yet". When insertion (or vacuum) bumps into a flagged page, it can
>>> finish the incomplete action by inserting the downlink.
>>
>> This is what I came up with.
>>
>> One thing I'm not totally happy about is the way page deletions of
>> incompletely split pages are handled. Basically, it just bails out and
>> refuses to delete a page that is part of an incomplete split. That's
>> probably OK in practice, as incomplete splits should be very rare
>> anyway, but it's a bit dissatisfying to not handle the case because at
>> first glance it seems like it should be even simpler than usual to
>> delete a page that has no downlink. Nevertheless, I decided to just skip
>> that for now.
>>
>> After this patch, deleting the only child of a parent and the parent
>> itself is still a multi-WAL-record operation that needs to be tracked
>> during recovery, and completed at the end of recovery. I'd like to
>> eliminate that too, but that's another patch.
>
> Here's a new version of this, which uses a similar technique to handle
> page deletions, eliminating the "incomplete action" tracking code
> altogether (from btree). When an internal page is marked as half-dead,
> its right sibling is atomically marked with a
> "left-sibling-is-half-dead" flag. Whenever an insertion encounters a
> page with that flag set, it will finish the deletion of the left sibling
> before proceeding with the insertion.
>
> This needs a lot more testing, but I wanted to get this out for review,
> in case someone sees a fundamental problem with this.

Ok, here's a new version of the patch to handle incomplete B-tree
splits. I rejected the scheme I outlined above about handling half-dead
pages - instead, see the patch in the "Race condition in b-tree page
deletion" thread:
http://www.postgresql.org/message-id/5283A2FC.6040606@vmware.com. That
approach to handling half-dead pages makes the incomplete-split stuff a
lot easier, as half-dead pages don't need any special treatment wrt.
splits- This patch should be after that patch.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Zhan Li
Date:
Subject: Re: Ideas of "printing out" the alternative paths
Next
From: Andres Freund
Date:
Subject: Replication Node Identifiers and crashsafe Apply Progress