Re: _bt_split(), and the risk of OOM before its critical section - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: _bt_split(), and the risk of OOM before its critical section
Date
Msg-id CAH2-Wz=D8SHWrkzeVW2u_=XnRA=XenMu=N3VkirxAKY1N0vWzg@mail.gmail.com
Whole thread Raw
In response to Re: _bt_split(), and the risk of OOM before its critical section  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: _bt_split(), and the risk of OOM before its critical section  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, May 6, 2019 at 4:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I am tempted to move the call to _bt_truncate() out of _bt_split()
> entirely on HEAD, possibly relocating it to
> nbtsplitloc.c/_bt_findsplitloc(). That way, there is a clearer
> separation between how split points are chosen, suffix truncation, and
> the mechanical process of executing a legal page split.

I decided against that -- better to make it clear how truncation deals
with space overhead within _bt_split(). Besides, the resource
management around sharing a maybe-palloc()'d high key across module
boundaries seems complicated, and best avoided.

Attached draft patch for HEAD fixes the bug by organizing _bt_split()
into clear phases. _bt_split() now works as follows, which is a little
different:

*  An initial phase that is entirely concerned with the left page temp
buffer itself -- initializes its special area.

* Suffix truncation to get left page's new high key, and then add it
to left page.

* A phase that is mostly concerned with initializing the right page
special area, but also finishes off one or two details about the left
page that needed to be delayed. This is also where the "shadow
critical section" begins. Note also that this is where
_bt_vacuum_cycleid() is called, because its contract actually
*requires* that caller has a buffer lock on both pages at once. This
should not be changed on the grounds that _bt_vacuum_cycleid() might
fail (nor for any other reason).

* Add new high key to right page if needed. (No change, other than the
fact that it happens later now.)

* Add other items to both leftpage and rightpage. Critical section
that copies leftpage into origpage buffer. (No changes here.)

I suppose I'm biased, but I prefer the new approach anyway. Adding the
left high key first, and then the right high key seems simpler and
more logical. It emphasizes the similarities and differences between
leftpage and rightpage. Furthermore, this approach fixes the
theoretical risk of leaving behind a minimally-initialized nbtree page
that has existed since 2010. We don't allocated *any* memory after the
point that a new rightpage buffer is acquired.

I suppose that this will need to be backpatched.

Thoughts?
--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: accounting for memory used for BufFile during hash joins
Next
From: Thomas Munro
Date:
Subject: Re: We're leaking predicate locks in HEAD