Re: Atomic ops for unlogged LSN - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Atomic ops for unlogged LSN
Date
Msg-id ZUpqMrERC+48/a3p@tamriel.snowman.net
Whole thread Raw
In response to Re: Atomic ops for unlogged LSN  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Atomic ops for unlogged LSN
List pgsql-hackers
Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote:
> > I incorporated your suggestions and added a few more. The changes are
> > mainly related to catching potential errors if some basic assumptions
> > aren’t met.
>
> Hm.  Could we move that to a separate patch?  We've lived without these
> extra checks for a very long time, and I'm not aware of any related issues,
> so I'm not sure it's worth the added complexity.  And IMO it'd be better to
> keep it separate from the initial atomics conversion, anyway.

I do see the value in adding in an Assert though I don't want to throw
away the info about what the recent unlogged LSN was when we crash.  As
that basically boils down to a one-line addition, I don't think it
really needs to be in a separate patch.

> > I found the comment about cache coherency a bit confusing. We are dealing
> > with a single address, so there should be no memory ordering or coherency
> > issues. (Did I misunderstand?) I see it more as a race condition. Rather
> > than merely explaining why it shouldn’t happen, the new version verifies
> > the assumptions and throws an Assert() if something goes wrong.
>
> I was thinking of the comment for pg_atomic_read_u32() that I cited earlier
> [0].  This comment also notes that pg_atomic_read_u32/64() has no barrier
> semantics.  My interpretation of that comment is that these functions
> provide no guarantee that the value returned is the most up-to-date value.

There seems to be some serious misunderstanding about what is happening
here.  The value written into the control file for unlogged LSN during
normal operation does *not* need to be the most up-to-date value and
talking about it as if it needs to be the absolutely most up-to-date and
correct value is, if anything, adding to the confusion, not reducing
confusion.  The reason to write in anything other than a zero during
these routine checkpoints for unlogged LSN is entirely for forensics
purposes, not because we'll ever actually use the value- during crash
recovery and backup/restore, we're going to reset the unlogged LSN
counter anyway and we're going to throw away all of unlogged table
contents across the entire system.

We only care about the value of the unlogged LSN being correct during
normal shutdown when we're writing out the shutdown checkpoint, but by
that time everything else has been shut down and the value absolutely
should not be changing.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: meson documentation build open issues
Next
From: Tom Lane
Date:
Subject: Re: meson documentation build open issues