Re: Hot standby queries see transient all-zeros pages - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Hot standby queries see transient all-zeros pages |
Date | |
Msg-id | CA+hUKGLVoihg6T5XCDBeUYsTHLO_+id=c9HarwRGRCQB=76KqA@mail.gmail.com Whole thread Raw |
In response to | Re: Hot standby queries see transient all-zeros pages (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Hot standby queries see transient all-zeros pages
|
List | pgsql-hackers |
On Sat, Dec 14, 2024 at 11:41 AM Andres Freund <andres@anarazel.de> wrote: > On 2024-05-12 10:16:58 -0700, Noah Misch wrote: > > The use of RBM_ZERO_AND_LOCK is incompatible with that. See a similar > > argument at https://postgr.es/m/flat/5101.1328219790@sss.pgh.pa.us that led > > me to the cause. Adding a 10ms sleep just after RBM_ZERO_AND_LOCK, I got 2 > > failures in 7 runs of 027_stream_regress.pl, at Assert(ItemIdIsNormal(lpp)) > > in heapgettup_pagemode(). In the core file, lpp pointed into an all-zeros > > page. RestoreBkpBlocks() had been doing RBM_ZERO years before hot standby > > existed, but it wasn't a bug until queries could run concurrently. > > Yep, this does sound like a problem. This was a bug of mine that was fixed later in another thread, which Noah might have mistaken for intended-but-wrong behaviour, because the name is weird IMHO. https://www.postgresql.org/message-id/flat/7ed10231-ce47-03d5-d3f9-4aea0dc7d5a4%40gmail.com > Hm. Leaving RBM_ZERO_AND_LOCK aside, is it actually always safe to do > RestoreBlockImage() into a buffer that currently is pinned? Not sure if > there's actually all that much guarantee what transient state one can read > when reading a page concurrently to a memcpy(). I suspect it's practically > rare to see a problem, but one could imagine an memcpy implementation that > uses non-temporal writes, which afaict would leave you open to seeing quite > random states when reading concurrently, as the cache coherence protocol > doesn't protect anymore. I also found that strange, while working on commit e656657f. I observed and pondered the behaviour you mentioned while testing that: you're scanning a page holding only a pin, which on a primary would allow others to add more (invisible to you) tuples that you shouldn't attempt to access (based on visibility checks performed while you held the lock), and in recovery the exact same thing is possible, except that if it happens to include a FPI then it *will* overwrite the tuples that you are scanning... with the exact same ones-and-zeros. But I couldn't think of anything specifically wrong with it, ie that might break. Well, maybe it's not exactly the same ones-and-zeros in some hint-bit scenario, but if that is considered acceptable on a primary too (well maybe you're getting rid of it), but what's the difference? Oh... that's a plain store and you're thinking that memcpy() might be allowed to use some weirder cache coherency modes for speed, ... sounds likely to break a lot of stuff, is that a real thing?! > > I suspect the fix is to add a ReadBufferMode specified as, "If the block is > > already in shared_buffers, do RBM_NORMAL and exclusive-lock the buffer. > > Otherwise, do RBM_ZERO_AND_LOCK." > > I think that should work. At least in the current code it looks near trivial > to implement, although the branch differences are going to be annoying. > As usual the hardest part would probably be the naming. Maybe > RBM_ZERO_ON_MISS_LOCK? RBM_LOCK_ZERO_ON_MISS? RBM_DWIM? That's already how RBM_ZERO_AND_LOCK works, and you are the third victim of its confusing name counting me (that's how I broke it in the vectored stuff) and Noah IIUC what happened here.
pgsql-hackers by date: