Thread: Atomic ops for unlogged LSN

Atomic ops for unlogged LSN

From
John Morris
Date:

This is a short patch which cleans up code for unlogged LSNs. It replaces the existing spinlock with

atomic ops.  It could provide a performance benefit for future uses of

unlogged LSNS,  but for now it is simply a cleaner implementation.

Attachment

Re: Atomic ops for unlogged LSN

From
Michael Paquier
Date:
On Tue, May 23, 2023 at 08:24:45PM +0000, John Morris wrote:
> This is a short patch which cleans up code for unlogged LSNs. It
> replaces the existing spinlock with atomic ops.  It could provide a
> performance benefit for future uses of unlogged LSNS,  but for now
> it is simply a cleaner implementation.

Seeing the code paths where gistGetFakeLSN() is called, I guess that
the answer will be no, still are you seeing a measurable difference in
some cases?

-   /* increment the unloggedLSN counter, need SpinLock */
-   SpinLockAcquire(&XLogCtl->ulsn_lck);
-   nextUnloggedLSN = XLogCtl->unloggedLSN++;
-   SpinLockRelease(&XLogCtl->ulsn_lck);
-
-   return nextUnloggedLSN;
+   return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);

Spinlocks provide a full memory barrier, which may not the case with
add_u64() or read_u64().  Could memory ordering be a problem in these
code paths?
--
Michael

Attachment

Re: Atomic ops for unlogged LSN

From
Robert Haas
Date:
On Tue, May 23, 2023 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:
> Spinlocks provide a full memory barrier, which may not the case with
> add_u64() or read_u64().  Could memory ordering be a problem in these
> code paths?

It could be a huge problem if what you say were true, but unless I'm
missing something, the comments in atomics.h say that it isn't. The
comments for the 64-bit functions say to look at the 32-bit functions,
and there it says:

/*
 * pg_atomic_add_fetch_u32 - atomically add to variable
 *
 * Returns the value of ptr after the arithmetic operation.
 *
 * Full barrier semantics.
 */

Which is probably a good thing, because I'm not sure what good it
would be to have an operation that both reads and modifies an atomic
variable but has no barrier semantics.

That's not to say that I entirely understand the point of this patch.
It seems like a generally reasonable thing to do AFAICT but somehow I
wonder if there's more to the story here, because it doesn't feel like
it would be easy to measure the benefit of this patch in isolation.
That's not a reason to reject it, though, just something that makes me
curious.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Atomic ops for unlogged LSN

From
Andres Freund
Date:
Hi,

On 2023-05-24 08:22:08 -0400, Robert Haas wrote:
> On Tue, May 23, 2023 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:
> > Spinlocks provide a full memory barrier, which may not the case with
> > add_u64() or read_u64().  Could memory ordering be a problem in these
> > code paths?
> 
> It could be a huge problem if what you say were true, but unless I'm
> missing something, the comments in atomics.h say that it isn't. The
> comments for the 64-bit functions say to look at the 32-bit functions,
> and there it says:
> 
> /*
>  * pg_atomic_add_fetch_u32 - atomically add to variable
>  *
>  * Returns the value of ptr after the arithmetic operation.
>  *
>  * Full barrier semantics.
>  */
> 
> Which is probably a good thing, because I'm not sure what good it
> would be to have an operation that both reads and modifies an atomic
> variable but has no barrier semantics.

I was a bit confused by Michael's comment as well, due to the section of code
quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have
barrier semantics (it'd be way more expensive), and the patch does contain
this hunk:

> @@ -6784,9 +6775,7 @@ CreateCheckPoint(int flags)
>       * unused on non-shutdown checkpoints, but seems useful to store it always
>       * for debugging purposes.
>       */
> -    SpinLockAcquire(&XLogCtl->ulsn_lck);
> -    ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
> -    SpinLockRelease(&XLogCtl->ulsn_lck);
> +    ControlFile->unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN);
>  
>      UpdateControlFile();
>      LWLockRelease(ControlFileLock);

So we indeed loose some "barrier strength" - but I think that's fine. For one,
it's a debugging-only value. But more importantly, I don't see what reordering
the barrier could prevent - a barrier is useful for things like sequencing two
memory accesses to happen in the intended order - but unloggedLSN doesn't
interact with another variable that's accessed within the ControlFileLock
afaict.


> That's not to say that I entirely understand the point of this patch.
> It seems like a generally reasonable thing to do AFAICT but somehow I
> wonder if there's more to the story here, because it doesn't feel like
> it would be easy to measure the benefit of this patch in isolation.
> That's not a reason to reject it, though, just something that makes me
> curious.

I doubt it's a meaningful, if even measurable win. But removing atomic ops and
reducing shared memory space isn't a bad thing, even if there's no immediate
benefit...

Greetings,

Andres Freund



Re: Atomic ops for unlogged LSN

From
Michael Paquier
Date:
On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
> I was a bit confused by Michael's comment as well, due to the section of code
> quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have
> barrier semantics (it'd be way more expensive), and the patch does contain
> this hunk:

Thanks for the correction.  The part about _add was incorrect.

> So we indeed loose some "barrier strength" - but I think that's fine. For one,
> it's a debugging-only value. But more importantly, I don't see what reordering
> the barrier could prevent - a barrier is useful for things like sequencing two
> memory accesses to happen in the intended order - but unloggedLSN doesn't
> interact with another variable that's accessed within the ControlFileLock
> afaict.

This stuff is usually tricky enough that I am never completely sure
whether it is fine without reading again README.barrier, which is
where unloggedLSN is saved in the control file under its LWLock.
Something that I find confusing in the patch is that it does not
document the reason why this is OK.
--
Michael

Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Thu, May 25, 2023 at 07:41:21AM +0900, Michael Paquier wrote:
> On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
>> So we indeed loose some "barrier strength" - but I think that's fine. For one,
>> it's a debugging-only value.

Is it?  I see uses in GiST indexing (62401db), so it's not immediately
obvious to me how it is debugging-only.  If it is, then I think this patch
ought to clearly document it so that nobody else tries to use it for
non-debugging-only stuff.

>> But more importantly, I don't see what reordering
>> the barrier could prevent - a barrier is useful for things like sequencing two
>> memory accesses to happen in the intended order - but unloggedLSN doesn't
>> interact with another variable that's accessed within the ControlFileLock
>> afaict.
> 
> This stuff is usually tricky enough that I am never completely sure
> whether it is fine without reading again README.barrier, which is
> where unloggedLSN is saved in the control file under its LWLock.
> Something that I find confusing in the patch is that it does not
> document the reason why this is OK.

My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
might see an old value of unloggedLSN.  From the following note in
README.barrier, it sounds like this would be a problem even if we ensured
full barrier semantics:

    3. No ordering guarantees.  While memory barriers ensure that any given
    process performs loads and stores to shared memory in order, they don't
    guarantee synchronization.  In the queue example above, we can use memory
    barriers to be sure that readers won't see garbage, but there's nothing to
    say whether a given reader will run before or after a given writer.  If this
    matters in a given situation, some other mechanism must be used instead of
    or in addition to memory barriers.

IIUC we know that shared memory accesses cannot be reordered to precede
aquisition or follow release of a spinlock (thanks to 0709b7e), which is
why this isn't a problem in the current implementation.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Stephen Frost
Date:
Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> On Thu, May 25, 2023 at 07:41:21AM +0900, Michael Paquier wrote:
> > On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
> >> So we indeed loose some "barrier strength" - but I think that's fine. For one,
> >> it's a debugging-only value.
>
> Is it?  I see uses in GiST indexing (62401db), so it's not immediately
> obvious to me how it is debugging-only.  If it is, then I think this patch
> ought to clearly document it so that nobody else tries to use it for
> non-debugging-only stuff.

I don't see it as a debugging value.  I'm not sure where that came
from..?  We do use it in places and if anything, I expect it to be used
more, not less, in the future as a persistant generally increasing
value (could go backwards on a crash-restart but in such case all
unlogged tables are truncated).

> >> But more importantly, I don't see what reordering
> >> the barrier could prevent - a barrier is useful for things like sequencing two
> >> memory accesses to happen in the intended order - but unloggedLSN doesn't
> >> interact with another variable that's accessed within the ControlFileLock
> >> afaict.
> >
> > This stuff is usually tricky enough that I am never completely sure
> > whether it is fine without reading again README.barrier, which is
> > where unloggedLSN is saved in the control file under its LWLock.
> > Something that I find confusing in the patch is that it does not
> > document the reason why this is OK.
>
> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
> might see an old value of unloggedLSN.  From the following note in
> README.barrier, it sounds like this would be a problem even if we ensured
> full barrier semantics:
>
>     3. No ordering guarantees.  While memory barriers ensure that any given
>     process performs loads and stores to shared memory in order, they don't
>     guarantee synchronization.  In the queue example above, we can use memory
>     barriers to be sure that readers won't see garbage, but there's nothing to
>     say whether a given reader will run before or after a given writer.  If this
>     matters in a given situation, some other mechanism must be used instead of
>     or in addition to memory barriers.
>
> IIUC we know that shared memory accesses cannot be reordered to precede
> aquisition or follow release of a spinlock (thanks to 0709b7e), which is
> why this isn't a problem in the current implementation.

The concern around unlogged LSN, imv anyway, is less about shared memory
access and making sure that all callers understand that it can move
backwards on a crash/restart.  I don't think that's an issue for current
users but we just need to make sure to try and comment sufficiently
regarding that such that new users understand that about this particular
value.

Thanks,

Stephen

Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Mon, Jun 12, 2023 at 07:24:18PM -0400, Stephen Frost wrote:
> * Nathan Bossart (nathandbossart@gmail.com) wrote:
>> Is it?  I see uses in GiST indexing (62401db), so it's not immediately
>> obvious to me how it is debugging-only.  If it is, then I think this patch
>> ought to clearly document it so that nobody else tries to use it for
>> non-debugging-only stuff.
> 
> I don't see it as a debugging value.  I'm not sure where that came
> from..?  We do use it in places and if anything, I expect it to be used
> more, not less, in the future as a persistant generally increasing
> value (could go backwards on a crash-restart but in such case all
> unlogged tables are truncated).

This is my understanding as well.

>> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
>> might see an old value of unloggedLSN.  From the following note in
>> README.barrier, it sounds like this would be a problem even if we ensured
>> full barrier semantics:

Never mind.  I think I'm forgetting that the atomics support in Postgres
deals with cache coherency.

> The concern around unlogged LSN, imv anyway, is less about shared memory
> access and making sure that all callers understand that it can move
> backwards on a crash/restart.  I don't think that's an issue for current
> users but we just need to make sure to try and comment sufficiently
> regarding that such that new users understand that about this particular
> value.

Right.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Stephen Frost
Date:
Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> On Mon, Jun 12, 2023 at 07:24:18PM -0400, Stephen Frost wrote:
> > * Nathan Bossart (nathandbossart@gmail.com) wrote:
> >> Is it?  I see uses in GiST indexing (62401db), so it's not immediately
> >> obvious to me how it is debugging-only.  If it is, then I think this patch
> >> ought to clearly document it so that nobody else tries to use it for
> >> non-debugging-only stuff.
> >
> > I don't see it as a debugging value.  I'm not sure where that came
> > from..?  We do use it in places and if anything, I expect it to be used
> > more, not less, in the future as a persistant generally increasing
> > value (could go backwards on a crash-restart but in such case all
> > unlogged tables are truncated).
>
> This is my understanding as well.
>
> >> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
> >> might see an old value of unloggedLSN.  From the following note in
> >> README.barrier, it sounds like this would be a problem even if we ensured
> >> full barrier semantics:
>
> Never mind.  I think I'm forgetting that the atomics support in Postgres
> deals with cache coherency.
>
> > The concern around unlogged LSN, imv anyway, is less about shared memory
> > access and making sure that all callers understand that it can move
> > backwards on a crash/restart.  I don't think that's an issue for current
> > users but we just need to make sure to try and comment sufficiently
> > regarding that such that new users understand that about this particular
> > value.
>
> Right.

Awesome.  Was there any other feedback on this change which basically is
just getting rid of a spinlock and replacing it with using atomics?
It's still in needs-review status but there's been a number of
comments/reviews (drive-by, at least) but without any real ask for any
changes to be made.

Thanks!

Stephen

Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Mon, Jul 17, 2023 at 07:08:03PM -0400, Stephen Frost wrote:
> Awesome.  Was there any other feedback on this change which basically is
> just getting rid of a spinlock and replacing it with using atomics?
> It's still in needs-review status but there's been a number of
> comments/reviews (drive-by, at least) but without any real ask for any
> changes to be made.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Andres Freund
Date:
Hi,

On 2023-07-17 16:15:52 -0700, Nathan Bossart wrote:
> On Mon, Jul 17, 2023 at 07:08:03PM -0400, Stephen Frost wrote:
> > Awesome.  Was there any other feedback on this change which basically is
> > just getting rid of a spinlock and replacing it with using atomics?
> > It's still in needs-review status but there's been a number of
> > comments/reviews (drive-by, at least) but without any real ask for any
> > changes to be made.
> 
> LGTM

Why don't we just use a barrier when around reading the value? It's not like
CreateCheckPoint() is frequent?

Greetings,

Andres Freund



Re: Atomic ops for unlogged LSN

From
John Morris
Date:

>> Why don't we just use a barrier when around reading the value? It's not like
>> CreateCheckPoint() is frequent?

 

One reason is that a barrier isn’t needed, and adding unnecessary barriers can also be confusing.

 

With respect to the “debug only” comment in the original code, whichever value is written to the structure during a checkpoint will be reset when restarting. Nobody will ever see that value. We could just as easily write a zero.

 

Shutting down is a different story. It isn’t stated, but we require the unlogged LSN be quiescent before shutting down. This patch doesn’t change that requirement.

 

We could also argue that memory ordering doesn’t matter when filling in a conventional, unlocked structure.  But the fact we have only two cases 1) checkpoint where the value is ignored, and 2) shutdown where it is quiescent, makes the additional argument unnecessary.

 

Would you be more comfortable if I added comments describing the two cases? My intent was to be minimalistic, but the original code could use better explanation.

Re: Atomic ops for unlogged LSN

From
Bharath Rupireddy
Date:
On Thu, Jul 20, 2023 at 12:25 AM John Morris
<john.morris@crunchydata.com> wrote:
>
> >> Why don't we just use a barrier when around reading the value? It's not like
> >> CreateCheckPoint() is frequent?
>
> One reason is that a barrier isn’t needed, and adding unnecessary barriers can also be confusing.
>
> With respect to the “debug only” comment in the original code, whichever value is written to the structure during a
checkpointwill be reset when restarting. Nobody will ever see that value. We could just as easily write a zero. 

>
> Shutting down is a different story. It isn’t stated, but we require the unlogged LSN be quiescent before shutting
down.This patch doesn’t change that requirement. 
>
> We could also argue that memory ordering doesn’t matter when filling in a conventional, unlocked structure.  But the
factwe have only two cases 1) checkpoint where the value is ignored, and 2) shutdown where it is quiescent, makes the
additionalargument unnecessary. 
>
> Would you be more comfortable if I added comments describing the two cases? My intent was to be minimalistic, but the
originalcode could use better explanation. 

Here, the case for unloggedLSN is that there can exist multiple
backends incrementing/writing it (via GetFakeLSNForUnloggedRel), and
there can exist one reader at a time in CreateCheckPoint with
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);. Is incrementing
unloggedLSN atomically (with an implied memory barrier from
pg_atomic_fetch_add_u64) helping out synchronize multiple writers and
readers? With a spinlock, writers-readers synchronization is
guaranteed. With an atomic variable, it is guaranteed that the readers
don't see a torn-value, but no synchronization is provided.

If the above understanding is right, what happens if readers see an
old unloggedLSN value - reader here stores the old unloggedLSN value
to control file and the server restarts (clean exit). So, the old
value is loaded back to unloggedLSN upon restart and the callers of
GetFakeLSNForUnloggedRel() will see an old/repeated value too. Is it a
problem?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
John Morris
Date:


> what happens if …  reader here stores the old unloggedLSN value
> to control file and the server restarts (clean exit). So, the old
>value is loaded back to unloggedLSN upon restart and the callers of
> GetFakeLSNForUnloggedRel() will see an old/repeated value too. Is it a
> problem?

 

First, a clarification. The value being saved is the “next” unlogged LSN,
not one which has already been used.
(we are doing “fetch and add”,  not “add and fetch”)

 

You have a good point about shutdown and startup.  It is vital we
don’t repeat an unlogged LSN. This situation could easily happen
If other readers were active while we were shutting down.

 

>With an atomic variable, it is guaranteed that the readers
>don't see a torn-value, but no synchronization is provided.

The atomic increment also ensures the sequence
of values is correct, specifically we don’t see
repeated values like we might with a conventional increment.
As a side effect, the instruction enforces a memory barrier, but we are not
relying on a barrier in this case.

Re: Atomic ops for unlogged LSN

From
John Morris
Date:

Based on your feedback, I’ve updated the patch with additional comments.

  1. Explain the two cases when writing to the control file, and
  2. a bit more emphasis on unloggedLSNs not being valid after a crash.

 

Thanks to y’all.

  • John
Attachment

Re: Atomic ops for unlogged LSN

From
Bharath Rupireddy
Date:
On Fri, Jul 21, 2023 at 4:43 AM John Morris <john.morris@crunchydata.com> wrote:
>
> Based on your feedback, I’ve updated the patch with additional comments.
>
> Explain the two cases when writing to the control file, and
> a bit more emphasis on unloggedLSNs not being valid after a crash.

Given that the callers already have to deal with the unloggedLSN being
reset after a crash, meaning, they can't expect an always increasing
value from unloggedLSN, I think converting it to an atomic variable
seems a reasonable change. The other advantage we get here is the
freeing up shared memory space for spinlock ulsn_lck.

The attached v2 patch LGTM and marked the CF entry RfC -
https://commitfest.postgresql.org/43/4330/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
John Morris
Date:

Keeping things up to date.  Here is a rebased patch with no changes from previous one.

 

  • John Morris
Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Thu, Oct 26, 2023 at 03:00:58PM +0000, John Morris wrote:
> Keeping things up to date.  Here is a rebased patch with no changes from previous one.

This patch looks a little different than the last version I see posted [0].
That last version of the patch (which appears to be just about committable)
still applies for me, too.

[0] https://postgr.es/m/BYAPR13MB2677ED1797C81779D17B414CA03EA%40BYAPR13MB2677.namprd13.prod.outlook.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
John Morris
Date:
  • This patch looks a little different than the last version I see posted [0].
    That last version of the patch (which appears to be just about committable)

My oops – I was looking at the wrong commit. The newer patch has already been committed,  so pretend that last message didn’t happen. Thanks,

   John

Re: Atomic ops for unlogged LSN

From
John Morris
Date:

Here is what I meant to do earlier. As it turns out, this patch has not been merged yet.

 

This is a rebased version . Even though I labelled it “v3”, there should be no changes from “v2”.

Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Wed, Nov 01, 2023 at 09:15:20PM +0000, John Morris wrote:
> This is a rebased version . Even though I labelled it “v3”, there should be no changes from “v2”.

Thanks.  I think this is almost ready, but I have to harp on the
pg_atomic_read_u64() business once more.  The relevant comment in atomics.h
has this note:

 * The read is guaranteed to return a value as it has been written by this or
 * another process at some point in the past. There's however no cache
 * coherency interaction guaranteeing the value hasn't since been written to
 * again.

However unlikely, this seems to suggest that CreateCheckPoint() could see
an old value with your patch.  Out of an abundance of caution, I'd
recommend changing this to pg_atomic_compare_exchange_u64() like
pg_atomic_read_u64_impl() does in generic.h.

@@ -4635,7 +4629,6 @@ XLOGShmemInit(void)
 
     SpinLockInit(&XLogCtl->Insert.insertpos_lck);
     SpinLockInit(&XLogCtl->info_lck);
-    SpinLockInit(&XLogCtl->ulsn_lck);
 }

Shouldn't we do the pg_atomic_init_u64() here?  We can still set the
initial value in StartupXLOG(), but it might be safer to initialize the
variable where we are initializing the other shared memory stuff.

Since this isn't a tremendously performance-sensitive area, IMHO we should
code defensively to eliminate any doubts about correctness and to make it
easier to reason about.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Bharath Rupireddy
Date:
On Thu, Nov 2, 2023 at 9:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Nov 01, 2023 at 09:15:20PM +0000, John Morris wrote:
> > This is a rebased version . Even though I labelled it “v3”, there should be no changes from “v2”.
>
> Thanks.  I think this is almost ready, but I have to harp on the
> pg_atomic_read_u64() business once more.  The relevant comment in atomics.h
> has this note:
>
>  * The read is guaranteed to return a value as it has been written by this or
>  * another process at some point in the past. There's however no cache
>  * coherency interaction guaranteeing the value hasn't since been written to
>  * again.
>
> However unlikely, this seems to suggest that CreateCheckPoint() could see
> an old value with your patch.  Out of an abundance of caution, I'd
> recommend changing this to pg_atomic_compare_exchange_u64() like
> pg_atomic_read_u64_impl() does in generic.h.

+1. pg_atomic_read_u64 provides no barrier semantics whereas
pg_atomic_compare_exchange_u64 does. Without the barrier, it might
happen that the value is read while the other backend is changing it.
I think something like below providing full barrier semantics looks
fine to me:

XLogRecPtr ulsn;

pg_atomic_compare_exchange_u64(&XLogCtl->unloggedLSN, &ulsn, 0);
ControlFile->unloggedLSN = ulsn;

> @@ -4635,7 +4629,6 @@ XLOGShmemInit(void)
>
>         SpinLockInit(&XLogCtl->Insert.insertpos_lck);
>         SpinLockInit(&XLogCtl->info_lck);
> -       SpinLockInit(&XLogCtl->ulsn_lck);
>  }
>
> Shouldn't we do the pg_atomic_init_u64() here?  We can still set the
> initial value in StartupXLOG(), but it might be safer to initialize the
> variable where we are initializing the other shared memory stuff.

I think no one accesses the unloggedLSN in between
CreateSharedMemoryAndSemaphores -> XLOGShmemInit and StartupXLOG.
However, I see nothing wrong in doing
pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr); in
XLOGShmemInit.

> Since this isn't a tremendously performance-sensitive area, IMHO we should
> code defensively to eliminate any doubts about correctness and to make it
> easier to reason about.

Right.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Wed, Nov 01, 2023 at 10:40:06PM -0500, Nathan Bossart wrote:
> Since this isn't a tremendously performance-sensitive area, IMHO we should
> code defensively to eliminate any doubts about correctness and to make it
> easier to reason about.

Concretely, like this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Atomic ops for unlogged LSN

From
John Morris
Date:

I incorporated your suggestions and added a few more. The changes are mainly related to catching potential errors if some basic assumptions aren’t met.

 

There are basically 3 assumptions. Stating them as conditions we want to avoid.

  • We should not get an unlogged LSN before reading the control file.
  • We should not get an unlogged LSN when shutting down.
  • The unlogged LSN written out during a checkpoint shouldn’t be used.

 

Your suggestion addressed the first problem, and it took only minor changes to address the other two.

 

The essential idea is, we set a value of zero in each of the 3 situations. Then we throw an Assert() If somebody tries to allocate an unlogged LSN with the value zero.

 

I found the comment about cache coherency a bit confusing. We are dealing with a single address, so there should be no memory ordering or coherency issues. (Did I misunderstand?) I see it more as a race condition. Rather than merely explaining why it shouldn’t happen, the new version verifies the assumptions and throws an Assert() if something goes wrong.

 

Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote:
> I incorporated your suggestions and added a few more. The changes are
> mainly related to catching potential errors if some basic assumptions
> aren’t met.

Hm.  Could we move that to a separate patch?  We've lived without these
extra checks for a very long time, and I'm not aware of any related issues,
so I'm not sure it's worth the added complexity.  And IMO it'd be better to
keep it separate from the initial atomics conversion, anyway.

> I found the comment about cache coherency a bit confusing. We are dealing
> with a single address, so there should be no memory ordering or coherency
> issues. (Did I misunderstand?) I see it more as a race condition. Rather
> than merely explaining why it shouldn’t happen, the new version verifies
> the assumptions and throws an Assert() if something goes wrong.

I was thinking of the comment for pg_atomic_read_u32() that I cited earlier
[0].  This comment also notes that pg_atomic_read_u32/64() has no barrier
semantics.  My interpretation of that comment is that these functions
provide no guarantee that the value returned is the most up-to-date value.
But my interpretation could be wrong, and maybe this is meant to highlight
that the value might change before we can use the return value in a
compare/exchange or something.

I spent a little time earlier today reviewing the various underlying
implementations, but apparently I need to spend some more time looking at
those...

[0] https://postgr.es/m/20231102034006.GA85609%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Stephen Frost
Date:
Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote:
> > I incorporated your suggestions and added a few more. The changes are
> > mainly related to catching potential errors if some basic assumptions
> > aren’t met.
>
> Hm.  Could we move that to a separate patch?  We've lived without these
> extra checks for a very long time, and I'm not aware of any related issues,
> so I'm not sure it's worth the added complexity.  And IMO it'd be better to
> keep it separate from the initial atomics conversion, anyway.

I do see the value in adding in an Assert though I don't want to throw
away the info about what the recent unlogged LSN was when we crash.  As
that basically boils down to a one-line addition, I don't think it
really needs to be in a separate patch.

> > I found the comment about cache coherency a bit confusing. We are dealing
> > with a single address, so there should be no memory ordering or coherency
> > issues. (Did I misunderstand?) I see it more as a race condition. Rather
> > than merely explaining why it shouldn’t happen, the new version verifies
> > the assumptions and throws an Assert() if something goes wrong.
>
> I was thinking of the comment for pg_atomic_read_u32() that I cited earlier
> [0].  This comment also notes that pg_atomic_read_u32/64() has no barrier
> semantics.  My interpretation of that comment is that these functions
> provide no guarantee that the value returned is the most up-to-date value.

There seems to be some serious misunderstanding about what is happening
here.  The value written into the control file for unlogged LSN during
normal operation does *not* need to be the most up-to-date value and
talking about it as if it needs to be the absolutely most up-to-date and
correct value is, if anything, adding to the confusion, not reducing
confusion.  The reason to write in anything other than a zero during
these routine checkpoints for unlogged LSN is entirely for forensics
purposes, not because we'll ever actually use the value- during crash
recovery and backup/restore, we're going to reset the unlogged LSN
counter anyway and we're going to throw away all of unlogged table
contents across the entire system.

We only care about the value of the unlogged LSN being correct during
normal shutdown when we're writing out the shutdown checkpoint, but by
that time everything else has been shut down and the value absolutely
should not be changing.

Thanks,

Stephen

Attachment

Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Tue, Nov 07, 2023 at 11:47:46AM -0500, Stephen Frost wrote:
> We only care about the value of the unlogged LSN being correct during
> normal shutdown when we're writing out the shutdown checkpoint, but by
> that time everything else has been shut down and the value absolutely
> should not be changing.

I agree that's all true.  I'm trying to connect how this scenario ensures
we see the most up-to-date value in light of this comment above
pg_atomic_read_u32():

 * The read is guaranteed to return a value as it has been written by this or
 * another process at some point in the past. There's however no cache
 * coherency interaction guaranteeing the value hasn't since been written to
 * again.

Is there something special about all other backends being shut down that
ensures this returns the most up-to-date value and not something from "some
point in the past" as the stated contract for this function seems to
suggest?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Andres Freund
Date:
Hi,

On 2023-11-07 11:02:49 -0600, Nathan Bossart wrote:
> On Tue, Nov 07, 2023 at 11:47:46AM -0500, Stephen Frost wrote:
> > We only care about the value of the unlogged LSN being correct during
> > normal shutdown when we're writing out the shutdown checkpoint, but by
> > that time everything else has been shut down and the value absolutely
> > should not be changing.
> 
> I agree that's all true.  I'm trying to connect how this scenario ensures
> we see the most up-to-date value in light of this comment above
> pg_atomic_read_u32():
> 
>  * The read is guaranteed to return a value as it has been written by this or
>  * another process at some point in the past. There's however no cache
>  * coherency interaction guaranteeing the value hasn't since been written to
>  * again.
> 
> Is there something special about all other backends being shut down that
> ensures this returns the most up-to-date value and not something from "some
> point in the past" as the stated contract for this function seems to
> suggest?

Practically yes - getting to the point of writing the shutdown checkpoint
implies having gone through a bunch of code that implies memory barriers
(spinlocks, lwlocks).

However, even if there's likely some other implied memory barrier that we
could piggyback on, the patch much simpler to understand if it doesn't change
coherency rules. There's no way the overhead could matter.

Greetings,

Andres Freund



Re: Atomic ops for unlogged LSN

From
Andres Freund
Date:
Hi,

On 2023-11-07 00:57:32 +0000, John Morris wrote:
> I found the comment about cache coherency a bit confusing. We are dealing
> with a single address, so there should be no memory ordering or coherency
> issues. (Did I misunderstand?) I see it more as a race condition.

IMO cache coherency covers the value a single variable has in different
threads / processes.

In fact, the only reason there effectively is a guarantee that you're not
seeing an outdated unloggedLSN value during shutdown checkpoints, even without
the spinlock or full barrier atomic op, is that the LWLockAcquire(), a few
lines above this, would prevent both the compiler and CPU from moving the read
of unloggedLSN to much earlier.  Obviously that lwlock has a different
address...


If the patch just had done the minimal conversion, it'd already have been
committed...  Even if there'd be a performance reason to get rid of the memory
barrier around reading unloggedLSN in CreateCheckPoint(), I'd do the
conversion in a separate commit.

Greetings,

Andres Freund



Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Tue, Nov 07, 2023 at 04:58:16PM -0800, Andres Freund wrote:
> On 2023-11-07 11:02:49 -0600, Nathan Bossart wrote:
>> Is there something special about all other backends being shut down that
>> ensures this returns the most up-to-date value and not something from "some
>> point in the past" as the stated contract for this function seems to
>> suggest?
> 
> Practically yes - getting to the point of writing the shutdown checkpoint
> implies having gone through a bunch of code that implies memory barriers
> (spinlocks, lwlocks).

Sure.

> However, even if there's likely some other implied memory barrier that we
> could piggyback on, the patch much simpler to understand if it doesn't change
> coherency rules. There's no way the overhead could matter.

I wonder if it's worth providing a set of "locked read" functions.  Those
could just do a compare/exchange with 0 in the generic implementation.  For
patches like this one where the overhead really shouldn't matter, I'd
encourage folks to use those to make it easy to reason about correctness.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
On Thu, Nov 09, 2023 at 03:27:33PM -0600, Nathan Bossart wrote:
> I wonder if it's worth providing a set of "locked read" functions.  Those
> could just do a compare/exchange with 0 in the generic implementation.  For
> patches like this one where the overhead really shouldn't matter, I'd
> encourage folks to use those to make it easy to reason about correctness.

I moved this proposal to a new thread [0].

[0] https://postgr.es/m/20231110205128.GB1315705%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
Here is a new version of the patch that uses the new atomic read/write
functions with full barriers that were added in commit bd5132d.  Thoughts?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Atomic ops for unlogged LSN

From
Stephen Frost
Date:
Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> Here is a new version of the patch that uses the new atomic read/write
> functions with full barriers that were added in commit bd5132d.  Thoughts?

Saw that commit go in- glad to see it.  Thanks for updating this patch
too.  The changes look good to me.

Thanks again,

Stephen

Attachment

Re: Atomic ops for unlogged LSN

From
John Morris
Date:

Looks good to me.

  • John

 

From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thursday, February 29, 2024 at 8:34
AM
To: Andres Freund <andres@anarazel.de>
Cc: Stephen Frost <sfrost@snowman.net>, John Morris <john.morris@crunchydata.com>, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>, Michael Paquier <michael@paquier.xyz>, Robert Haas <robertmhaas@gmail.com>, pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: Atomic ops for unlogged LSN

Here is a new version of the patch that uses the new atomic read/write
functions with full barriers that were added in commit bd5132d.  Thoughts?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Atomic ops for unlogged LSN

From
Bharath Rupireddy
Date:
On Thu, Feb 29, 2024 at 10:04 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> Here is a new version of the patch that uses the new atomic read/write
> functions with full barriers that were added in commit bd5132d.  Thoughts?

Thanks for getting the other patch in. The attached v6 patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Atomic ops for unlogged LSN

From
Nathan Bossart
Date:
Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com