Re: Should the nbtree page split REDO routine's locking work more like the locking on the primary? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Should the nbtree page split REDO routine's locking work more like the locking on the primary?
Msg-id CAH2-Wz=wCroUbaTvWZt82oVu+zp=ym=5PDT3ZDzV9Ko0qYFCZg@mail.gmail.com
In response to Re: Should the nbtree page split REDO routine's locking work more like the locking on the primary?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Should the nbtree page split REDO routine's locking work more like the locking on the primary?  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Aug 6, 2020 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for making this more like what happens in original execution ("on the
> primary", to use your wording).  Perhaps what you suggest here is still
> not enough like the original execution, but it sounds closer.

It won't be the same as the original execution, exactly -- I am only
thinking of holding on to same-level page locks (the original page,
its new right sibling, and the original right sibling). I suppose that
it's possible to go further than this in one rarer case (when clearing
the incomplete split flag one level down), but for the most part it
isn't even possible to follow original execution's approach to locking
in every detail. Clearly it's not okay for the startup process to hold
buffer locks across replay of the first and second phases of a split,
but that's what it would take to follow original execution 100%
faithfully -- there are two WAL records involved.

I am quite confident that there won't be any remaining problems
provided we follow the original execution's approach to locking within
each level of the tree -- that's enough. Anything that runs during
recovery won't care about cross-level differences, aside from the
obvious (scans may have to move right to recover from concurrent
splits).
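
To make the same-level idea concrete, here is a toy model in Python. It
is only a sketch and not PostgreSQL code: Page, links_consistent,
replay_split and fresh_level are hypothetical names, and "locked" is
just a set of block numbers standing in for buffer locks. It compares
replaying a split while holding locks on all three same-level pages
with a piecemeal variant that drops the locks on the two split halves
before fixing the old right sibling's left-link. The model only records
what an unlocked reader could observe; it says nothing about whether a
real scan would cope with that state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Page:
    left: Optional[int] = None   # block number of the left sibling, if any
    right: Optional[int] = None  # block number of the right sibling, if any


def links_consistent(pages: Dict[int, Page], locked: Set[int]) -> bool:
    """True if every pair of unlocked neighbours agrees on its sibling links."""
    for blk, page in pages.items():
        r = page.right
        if blk in locked or r is None or r in locked:
            continue  # a reader would wait on a locked page, so it sees nothing
        if pages[r].left != blk:
            return False
    return True


def replay_split(pages: Dict[int, Page], orig: int, new: int,
                 old_right: int, same_level_coupling: bool) -> List[bool]:
    """Replay a split of `orig` into `orig` + `new` (toy model).

    Returns one consistency observation for each point at which locks
    are released and other backends could look at the level.
    """
    observations: List[bool] = []
    locked = {orig, new}
    if same_level_coupling:
        locked.add(old_right)  # hold all three same-level pages at once

    pages[new] = Page(left=orig, right=old_right)
    pages[orig].right = new

    if not same_level_coupling:
        # Piecemeal variant: release the split halves first, then lock
        # the old right sibling on its own.
        locked.clear()
        observations.append(links_consistent(pages, locked))
        locked.add(old_right)

    pages[old_right].left = new
    locked.clear()
    observations.append(links_consistent(pages, locked))
    return observations


def fresh_level() -> Dict[int, Page]:
    """Two sibling pages, block 1 <-> block 2, before the split."""
    return {1: Page(right=2), 2: Page(left=1)}
```

Calling replay_split(fresh_level(), 1, 3, 2, True) yields a single
observation, taken only after all three pages have been updated. With
False as the last argument there is an extra observation point between
the two lock releases, where the new page already points right to
block 2 but block 2's left-link still points at block 1.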

> As the commit message for 3bbf668d explains, the initial situation for
> all the replay code was that it executed by itself in crash recovery and
> didn't need to bother with locks at all.  I think that it did take some
> locks even then, but that was because of code sharing with the primary
> execution path rather than being something we wanted.

Makes sense.

-- 
Peter Geoghegan


