Re: Inadequate thought about buffer locking during hot standby replay - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: Inadequate thought about buffer locking during hot standby replay
Msg-id: CA+U5nM+p6ze4PMd6YvcMDrVru01_OcYZ=43T7EzsoqQgEpt8eQ@mail.gmail.com
In response to: Inadequate thought about buffer locking during hot standby replay (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Inadequate thought about buffer locking during hot standby replay (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On 9 November 2012 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> During normal running, operations such as btree page splits are
> extremely careful about the order in which they acquire and release
> buffer locks, if they're doing something that concurrently modifies
> multiple pages.
>
> During WAL replay, that all goes out the window.  Even if an individual
> WAL-record replay function does things in the right order for "standard"
> cases, RestoreBkpBlocks has no idea what it's doing.  So if one or more
> of the referenced pages gets treated as a full-page image, we are left
> with no guarantee whatsoever about what order the pages are restored in.
> That never mattered when the code was originally designed, but it sure
> matters during Hot Standby when other queries might be able to see the
> intermediate states.
>
> I can't prove that this is the cause of bug #7648, but it's fairly easy
> to see that it could explain the symptom.  You only need to assume that
> the page-being-split had been handled as a full-page image, and that the
> new right-hand page had gotten allocated by extending the relation.
> Then there will be an interval just after RestoreBkpBlocks does its
> thing where the updated left-hand sibling is in the index and is not
> locked in any way, but its right-link points off the end of the index.
> If a few indexscans come along before the replay process gets to
> continue, you'd get exactly the reported errors.
>
> I'm inclined to think that we need to fix this by getting rid of
> RestoreBkpBlocks per se, and instead having the per-WAL-record restore
> routines dictate when each full-page image is restored (and whether or
> not to release the buffer lock immediately).  That's not going to be a
> small change unfortunately :-(

No, but it looks like a clear bug scenario and a clear resolution also.

I'll start looking at it.
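
To make the window in the quoted scenario concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL code: the dict-based "relation", the function names (scan, replay_split, run) and the observe callback are all hypothetical, and a call to observe stands in for "a point where no buffer lock is held, so a standby query could look". It assumes the page being split was logged as a full-page image and the new right-hand page is added by extending the relation.

```python
# Toy model only: not PostgreSQL code. All names here are hypothetical.

def scan(relation, start=0):
    """Follow right-links from `start`, roughly as an index scan would."""
    keys, blkno = [], start
    while blkno is not None:
        if blkno not in relation:
            raise LookupError(
                f"right-link points past end of index: block {blkno}")
        page = relation[blkno]
        keys.extend(page["keys"])
        blkno = page["right"]
    return keys


def replay_split(relation, left_image, right_page, right_blkno,
                 image_first, observe):
    """Replay one page split; `observe` runs wherever no lock is held."""
    if image_first:
        # Full-page image restored up front, lock released immediately.
        relation[0] = left_image
        observe(relation)  # unlocked window: right-hand page not there yet
        relation[right_blkno] = right_page
    else:
        # The record's own replay routine picks the order and keeps both
        # pages locked until the pair is consistent (no observe in between).
        relation[right_blkno] = right_page
        relation[0] = left_image
    observe(relation)


def run(image_first):
    """Return what a concurrent scan would see at each unlocked point."""
    relation = {0: {"keys": [1, 2, 3, 4], "right": None}}
    results = []

    def observe(rel):
        try:
            results.append(scan(rel))
        except LookupError as exc:
            results.append(str(exc))

    replay_split(
        relation,
        left_image={"keys": [1, 2], "right": 1},
        right_page={"keys": [3, 4], "right": None},
        right_blkno=1,
        image_first=image_first,
        observe=observe,
    )
    return results
```

In this model, run(True) records a failed scan in the intermediate state (the left page's right-link names a block that does not exist yet), while run(False) only ever exposes the completed split.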

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


