Home > mailing lists

Re: Hot standby queries see transient all-zeros pages - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: Hot standby queries see transient all-zeros pages
Date	December 14, 2024 03:41:40
Msg-id	CA+hUKGLVoihg6T5XCDBeUYsTHLO_+id=c9HarwRGRCQB=76KqA@mail.gmail.com Whole thread Raw
In response to	Re: Hot standby queries see transient all-zeros pages (Andres Freund <andres@anarazel.de>)
Responses	Re: Hot standby queries see transient all-zeros pages
List	pgsql-hackers

Tree view

On Sat, Dec 14, 2024 at 11:41 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-05-12 10:16:58 -0700, Noah Misch wrote:
> > The use of RBM_ZERO_AND_LOCK is incompatible with that.  See a similar
> > argument at https://postgr.es/m/flat/5101.1328219790@sss.pgh.pa.us that led
> > me to the cause.  Adding a 10ms sleep just after RBM_ZERO_AND_LOCK, I got 2
> > failures in 7 runs of 027_stream_regress.pl, at Assert(ItemIdIsNormal(lpp))
> > in heapgettup_pagemode().  In the core file, lpp pointed into an all-zeros
> > page.  RestoreBkpBlocks() had been doing RBM_ZERO years before hot standby
> > existed, but it wasn't a bug until queries could run concurrently.
>
> Yep, this does sound like a problem.

This was a bug of mine that was fixed later in another thread, which
Noah might have mistaken for intended-but-wrong behaviour, because the
name is weird IMHO.

https://www.postgresql.org/message-id/flat/7ed10231-ce47-03d5-d3f9-4aea0dc7d5a4%40gmail.com

> Hm. Leaving RBM_ZERO_AND_LOCK aside, is it actually always safe to do
> RestoreBlockImage() into a buffer that currently is pinned? Not sure if
> there's actually all that much guarantee what transient state one can read
> when reading a page concurrently to a memcpy(). I suspect it's practically
> rare to see a problem, but one could imagine an memcpy implementation that
> uses non-temporal writes, which afaict would leave you open to seeing quite
> random states when reading concurrently, as the cache coherence protocol
> doesn't protect anymore.

I also found that strange, while working on commit e656657f.  I
observed and pondered the behaviour you mentioned while testing that:
you're scanning a page holding only a pin, which on a primary would
allow others to add more (invisible to you) tuples that you shouldn't
attempt to access (based on visibility checks performed while you held
the lock), and in recovery the exact same thing is possible, except
that if it happens to include a FPI then it *will* overwrite the
tuples that you are scanning... with the exact same ones-and-zeros.
But I couldn't think of anything specifically wrong with it, ie that
might break.  Well, maybe it's not exactly the same ones-and-zeros in
some hint-bit scenario, but if that is considered acceptable on a
primary too (well maybe you're getting rid of it), but what's the
difference?  Oh... that's a plain store and you're thinking that
memcpy() might be allowed to use some weirder cache coherency modes
for speed, ... sounds likely to break a lot of stuff, is that a real
thing?!

> > I suspect the fix is to add a ReadBufferMode specified as, "If the block is
> > already in shared_buffers, do RBM_NORMAL and exclusive-lock the buffer.
> > Otherwise, do RBM_ZERO_AND_LOCK."
>
> I think that should work. At least in the current code it looks near trivial
> to implement, although the branch differences are going to be annoying.
> As usual the hardest part would probably be the naming. Maybe
> RBM_ZERO_ON_MISS_LOCK? RBM_LOCK_ZERO_ON_MISS? RBM_DWIM?

That's already how RBM_ZERO_AND_LOCK works, and you are the third
victim of its confusing name counting me (that's how I broke it in the
vectored stuff) and Noah IIUC what happened here.

pgsql-hackers by date:

From: Noah Misch
Date: 14 December 2024, 03:38:05
Subject: Re: Hot standby queries see transient all-zeros pages

From: Thomas Munro
Date: 14 December 2024, 03:45:33
Subject: Re: checkpointer: PANIC: could not fsync file: No such file or directory

Re: Hot standby queries see transient all-zeros pages - Mailing list pgsql-hackers

Previous

Next