Thread: Atomic ops for unlogged LSN
This is a short patch which cleans up code for unlogged LSNs. It replaces the existing spinlock with
atomic ops. It could provide a performance benefit for future uses of
unlogged LSNS, but for now it is simply a cleaner implementation.
Attachment
On Tue, May 23, 2023 at 08:24:45PM +0000, John Morris wrote: > This is a short patch which cleans up code for unlogged LSNs. It > replaces the existing spinlock with atomic ops. It could provide a > performance benefit for future uses of unlogged LSNS, but for now > it is simply a cleaner implementation. Seeing the code paths where gistGetFakeLSN() is called, I guess that the answer will be no, still are you seeing a measurable difference in some cases? - /* increment the unloggedLSN counter, need SpinLock */ - SpinLockAcquire(&XLogCtl->ulsn_lck); - nextUnloggedLSN = XLogCtl->unloggedLSN++; - SpinLockRelease(&XLogCtl->ulsn_lck); - - return nextUnloggedLSN; + return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1); Spinlocks provide a full memory barrier, which may not the case with add_u64() or read_u64(). Could memory ordering be a problem in these code paths? -- Michael
Attachment
On Tue, May 23, 2023 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote: > Spinlocks provide a full memory barrier, which may not the case with > add_u64() or read_u64(). Could memory ordering be a problem in these > code paths? It could be a huge problem if what you say were true, but unless I'm missing something, the comments in atomics.h say that it isn't. The comments for the 64-bit functions say to look at the 32-bit functions, and there it says: /* * pg_atomic_add_fetch_u32 - atomically add to variable * * Returns the value of ptr after the arithmetic operation. * * Full barrier semantics. */ Which is probably a good thing, because I'm not sure what good it would be to have an operation that both reads and modifies an atomic variable but has no barrier semantics. That's not to say that I entirely understand the point of this patch. It seems like a generally reasonable thing to do AFAICT but somehow I wonder if there's more to the story here, because it doesn't feel like it would be easy to measure the benefit of this patch in isolation. That's not a reason to reject it, though, just something that makes me curious. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2023-05-24 08:22:08 -0400, Robert Haas wrote: > On Tue, May 23, 2023 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote: > > Spinlocks provide a full memory barrier, which may not the case with > > add_u64() or read_u64(). Could memory ordering be a problem in these > > code paths? > > It could be a huge problem if what you say were true, but unless I'm > missing something, the comments in atomics.h say that it isn't. The > comments for the 64-bit functions say to look at the 32-bit functions, > and there it says: > > /* > * pg_atomic_add_fetch_u32 - atomically add to variable > * > * Returns the value of ptr after the arithmetic operation. > * > * Full barrier semantics. > */ > > Which is probably a good thing, because I'm not sure what good it > would be to have an operation that both reads and modifies an atomic > variable but has no barrier semantics. I was a bit confused by Michael's comment as well, due to the section of code quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have barrier semantics (it'd be way more expensive), and the patch does contain this hunk: > @@ -6784,9 +6775,7 @@ CreateCheckPoint(int flags) > * unused on non-shutdown checkpoints, but seems useful to store it always > * for debugging purposes. > */ > - SpinLockAcquire(&XLogCtl->ulsn_lck); > - ControlFile->unloggedLSN = XLogCtl->unloggedLSN; > - SpinLockRelease(&XLogCtl->ulsn_lck); > + ControlFile->unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN); > > UpdateControlFile(); > LWLockRelease(ControlFileLock); So we indeed loose some "barrier strength" - but I think that's fine. For one, it's a debugging-only value. But more importantly, I don't see what reordering the barrier could prevent - a barrier is useful for things like sequencing two memory accesses to happen in the intended order - but unloggedLSN doesn't interact with another variable that's accessed within the ControlFileLock afaict. > That's not to say that I entirely understand the point of this patch. > It seems like a generally reasonable thing to do AFAICT but somehow I > wonder if there's more to the story here, because it doesn't feel like > it would be easy to measure the benefit of this patch in isolation. > That's not a reason to reject it, though, just something that makes me > curious. I doubt it's a meaningful, if even measurable win. But removing atomic ops and reducing shared memory space isn't a bad thing, even if there's no immediate benefit... Greetings, Andres Freund
On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote: > I was a bit confused by Michael's comment as well, due to the section of code > quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have > barrier semantics (it'd be way more expensive), and the patch does contain > this hunk: Thanks for the correction. The part about _add was incorrect. > So we indeed loose some "barrier strength" - but I think that's fine. For one, > it's a debugging-only value. But more importantly, I don't see what reordering > the barrier could prevent - a barrier is useful for things like sequencing two > memory accesses to happen in the intended order - but unloggedLSN doesn't > interact with another variable that's accessed within the ControlFileLock > afaict. This stuff is usually tricky enough that I am never completely sure whether it is fine without reading again README.barrier, which is where unloggedLSN is saved in the control file under its LWLock. Something that I find confusing in the patch is that it does not document the reason why this is OK. -- Michael
Attachment
On Thu, May 25, 2023 at 07:41:21AM +0900, Michael Paquier wrote: > On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote: >> So we indeed loose some "barrier strength" - but I think that's fine. For one, >> it's a debugging-only value. Is it? I see uses in GiST indexing (62401db), so it's not immediately obvious to me how it is debugging-only. If it is, then I think this patch ought to clearly document it so that nobody else tries to use it for non-debugging-only stuff. >> But more importantly, I don't see what reordering >> the barrier could prevent - a barrier is useful for things like sequencing two >> memory accesses to happen in the intended order - but unloggedLSN doesn't >> interact with another variable that's accessed within the ControlFileLock >> afaict. > > This stuff is usually tricky enough that I am never completely sure > whether it is fine without reading again README.barrier, which is > where unloggedLSN is saved in the control file under its LWLock. > Something that I find confusing in the patch is that it does not > document the reason why this is OK. My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint might see an old value of unloggedLSN. From the following note in README.barrier, it sounds like this would be a problem even if we ensured full barrier semantics: 3. No ordering guarantees. While memory barriers ensure that any given process performs loads and stores to shared memory in order, they don't guarantee synchronization. In the queue example above, we can use memory barriers to be sure that readers won't see garbage, but there's nothing to say whether a given reader will run before or after a given writer. If this matters in a given situation, some other mechanism must be used instead of or in addition to memory barriers. IIUC we know that shared memory accesses cannot be reordered to precede aquisition or follow release of a spinlock (thanks to 0709b7e), which is why this isn't a problem in the current implementation. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Greetings, * Nathan Bossart (nathandbossart@gmail.com) wrote: > On Thu, May 25, 2023 at 07:41:21AM +0900, Michael Paquier wrote: > > On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote: > >> So we indeed loose some "barrier strength" - but I think that's fine. For one, > >> it's a debugging-only value. > > Is it? I see uses in GiST indexing (62401db), so it's not immediately > obvious to me how it is debugging-only. If it is, then I think this patch > ought to clearly document it so that nobody else tries to use it for > non-debugging-only stuff. I don't see it as a debugging value. I'm not sure where that came from..? We do use it in places and if anything, I expect it to be used more, not less, in the future as a persistant generally increasing value (could go backwards on a crash-restart but in such case all unlogged tables are truncated). > >> But more importantly, I don't see what reordering > >> the barrier could prevent - a barrier is useful for things like sequencing two > >> memory accesses to happen in the intended order - but unloggedLSN doesn't > >> interact with another variable that's accessed within the ControlFileLock > >> afaict. > > > > This stuff is usually tricky enough that I am never completely sure > > whether it is fine without reading again README.barrier, which is > > where unloggedLSN is saved in the control file under its LWLock. > > Something that I find confusing in the patch is that it does not > > document the reason why this is OK. > > My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint > might see an old value of unloggedLSN. From the following note in > README.barrier, it sounds like this would be a problem even if we ensured > full barrier semantics: > > 3. No ordering guarantees. While memory barriers ensure that any given > process performs loads and stores to shared memory in order, they don't > guarantee synchronization. In the queue example above, we can use memory > barriers to be sure that readers won't see garbage, but there's nothing to > say whether a given reader will run before or after a given writer. If this > matters in a given situation, some other mechanism must be used instead of > or in addition to memory barriers. > > IIUC we know that shared memory accesses cannot be reordered to precede > aquisition or follow release of a spinlock (thanks to 0709b7e), which is > why this isn't a problem in the current implementation. The concern around unlogged LSN, imv anyway, is less about shared memory access and making sure that all callers understand that it can move backwards on a crash/restart. I don't think that's an issue for current users but we just need to make sure to try and comment sufficiently regarding that such that new users understand that about this particular value. Thanks, Stephen
Attachment
On Mon, Jun 12, 2023 at 07:24:18PM -0400, Stephen Frost wrote: > * Nathan Bossart (nathandbossart@gmail.com) wrote: >> Is it? I see uses in GiST indexing (62401db), so it's not immediately >> obvious to me how it is debugging-only. If it is, then I think this patch >> ought to clearly document it so that nobody else tries to use it for >> non-debugging-only stuff. > > I don't see it as a debugging value. I'm not sure where that came > from..? We do use it in places and if anything, I expect it to be used > more, not less, in the future as a persistant generally increasing > value (could go backwards on a crash-restart but in such case all > unlogged tables are truncated). This is my understanding as well. >> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint >> might see an old value of unloggedLSN. From the following note in >> README.barrier, it sounds like this would be a problem even if we ensured >> full barrier semantics: Never mind. I think I'm forgetting that the atomics support in Postgres deals with cache coherency. > The concern around unlogged LSN, imv anyway, is less about shared memory > access and making sure that all callers understand that it can move > backwards on a crash/restart. I don't think that's an issue for current > users but we just need to make sure to try and comment sufficiently > regarding that such that new users understand that about this particular > value. Right. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Greetings, * Nathan Bossart (nathandbossart@gmail.com) wrote: > On Mon, Jun 12, 2023 at 07:24:18PM -0400, Stephen Frost wrote: > > * Nathan Bossart (nathandbossart@gmail.com) wrote: > >> Is it? I see uses in GiST indexing (62401db), so it's not immediately > >> obvious to me how it is debugging-only. If it is, then I think this patch > >> ought to clearly document it so that nobody else tries to use it for > >> non-debugging-only stuff. > > > > I don't see it as a debugging value. I'm not sure where that came > > from..? We do use it in places and if anything, I expect it to be used > > more, not less, in the future as a persistant generally increasing > > value (could go backwards on a crash-restart but in such case all > > unlogged tables are truncated). > > This is my understanding as well. > > >> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint > >> might see an old value of unloggedLSN. From the following note in > >> README.barrier, it sounds like this would be a problem even if we ensured > >> full barrier semantics: > > Never mind. I think I'm forgetting that the atomics support in Postgres > deals with cache coherency. > > > The concern around unlogged LSN, imv anyway, is less about shared memory > > access and making sure that all callers understand that it can move > > backwards on a crash/restart. I don't think that's an issue for current > > users but we just need to make sure to try and comment sufficiently > > regarding that such that new users understand that about this particular > > value. > > Right. Awesome. Was there any other feedback on this change which basically is just getting rid of a spinlock and replacing it with using atomics? It's still in needs-review status but there's been a number of comments/reviews (drive-by, at least) but without any real ask for any changes to be made. Thanks! Stephen
Attachment
On Mon, Jul 17, 2023 at 07:08:03PM -0400, Stephen Frost wrote: > Awesome. Was there any other feedback on this change which basically is > just getting rid of a spinlock and replacing it with using atomics? > It's still in needs-review status but there's been a number of > comments/reviews (drive-by, at least) but without any real ask for any > changes to be made. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-07-17 16:15:52 -0700, Nathan Bossart wrote: > On Mon, Jul 17, 2023 at 07:08:03PM -0400, Stephen Frost wrote: > > Awesome. Was there any other feedback on this change which basically is > > just getting rid of a spinlock and replacing it with using atomics? > > It's still in needs-review status but there's been a number of > > comments/reviews (drive-by, at least) but without any real ask for any > > changes to be made. > > LGTM Why don't we just use a barrier when around reading the value? It's not like CreateCheckPoint() is frequent? Greetings, Andres Freund
>> Why don't we just use a barrier when around reading the value? It's not like
>> CreateCheckPoint() is frequent?
One reason is that a barrier isn’t needed, and adding unnecessary barriers can also be confusing.
With respect to the “debug only” comment in the original code, whichever value is written to the structure during a checkpoint will be reset when restarting. Nobody will ever see that value. We could just as easily write a zero.
Shutting down is a different story. It isn’t stated, but we require the unlogged LSN be quiescent before shutting down. This patch doesn’t change that requirement.
We could also argue that memory ordering doesn’t matter when filling in a conventional, unlocked structure. But the fact we have only two cases 1) checkpoint where the value is ignored, and 2) shutdown where it is quiescent, makes the additional argument unnecessary.
Would you be more comfortable if I added comments describing the two cases? My intent was to be minimalistic, but the original code could use better explanation.
On Thu, Jul 20, 2023 at 12:25 AM John Morris <john.morris@crunchydata.com> wrote: > > >> Why don't we just use a barrier when around reading the value? It's not like > >> CreateCheckPoint() is frequent? > > One reason is that a barrier isn’t needed, and adding unnecessary barriers can also be confusing. > > With respect to the “debug only” comment in the original code, whichever value is written to the structure during a checkpointwill be reset when restarting. Nobody will ever see that value. We could just as easily write a zero. > > Shutting down is a different story. It isn’t stated, but we require the unlogged LSN be quiescent before shutting down.This patch doesn’t change that requirement. > > We could also argue that memory ordering doesn’t matter when filling in a conventional, unlocked structure. But the factwe have only two cases 1) checkpoint where the value is ignored, and 2) shutdown where it is quiescent, makes the additionalargument unnecessary. > > Would you be more comfortable if I added comments describing the two cases? My intent was to be minimalistic, but the originalcode could use better explanation. Here, the case for unloggedLSN is that there can exist multiple backends incrementing/writing it (via GetFakeLSNForUnloggedRel), and there can exist one reader at a time in CreateCheckPoint with LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);. Is incrementing unloggedLSN atomically (with an implied memory barrier from pg_atomic_fetch_add_u64) helping out synchronize multiple writers and readers? With a spinlock, writers-readers synchronization is guaranteed. With an atomic variable, it is guaranteed that the readers don't see a torn-value, but no synchronization is provided. If the above understanding is right, what happens if readers see an old unloggedLSN value - reader here stores the old unloggedLSN value to control file and the server restarts (clean exit). So, the old value is loaded back to unloggedLSN upon restart and the callers of GetFakeLSNForUnloggedRel() will see an old/repeated value too. Is it a problem? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> what happens if … reader here stores the old unloggedLSN value
> to control file and the server restarts (clean exit). So, the old
>value is loaded back to unloggedLSN upon restart and the callers of
> GetFakeLSNForUnloggedRel() will see an old/repeated value too. Is it a
> problem?
First, a clarification. The value being saved is the “next” unlogged LSN,
not one which has already been used.
(we are doing “fetch and add”, not “add and fetch”)
You have a good point about shutdown and startup. It is vital we
don’t repeat an unlogged LSN. This situation could easily happen
If other readers were active while we were shutting down.
>With an atomic variable, it is guaranteed that the readers
>don't see a torn-value, but no synchronization is provided.
The atomic increment also ensures the sequence
of values is correct, specifically we don’t see
repeated values like we might with a conventional increment.
As a side effect, the instruction enforces a memory barrier, but we are not
relying on a barrier in this case.
Based on your feedback, I’ve updated the patch with additional comments.
- Explain the two cases when writing to the control file, and
- a bit more emphasis on unloggedLSNs not being valid after a crash.
Thanks to y’all.
- John
Attachment
On Fri, Jul 21, 2023 at 4:43 AM John Morris <john.morris@crunchydata.com> wrote: > > Based on your feedback, I’ve updated the patch with additional comments. > > Explain the two cases when writing to the control file, and > a bit more emphasis on unloggedLSNs not being valid after a crash. Given that the callers already have to deal with the unloggedLSN being reset after a crash, meaning, they can't expect an always increasing value from unloggedLSN, I think converting it to an atomic variable seems a reasonable change. The other advantage we get here is the freeing up shared memory space for spinlock ulsn_lck. The attached v2 patch LGTM and marked the CF entry RfC - https://commitfest.postgresql.org/43/4330/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Keeping things up to date. Here is a rebased patch with no changes from previous one.
- John Morris
Attachment
On Thu, Oct 26, 2023 at 03:00:58PM +0000, John Morris wrote: > Keeping things up to date. Here is a rebased patch with no changes from previous one. This patch looks a little different than the last version I see posted [0]. That last version of the patch (which appears to be just about committable) still applies for me, too. [0] https://postgr.es/m/BYAPR13MB2677ED1797C81779D17B414CA03EA%40BYAPR13MB2677.namprd13.prod.outlook.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
- This patch looks a little different than the last version I see posted [0].
That last version of the patch (which appears to be just about committable)
My oops – I was looking at the wrong commit. The newer patch has already been committed, so pretend that last message didn’t happen. Thanks,
John
Here is what I meant to do earlier. As it turns out, this patch has not been merged yet.
This is a rebased version . Even though I labelled it “v3”, there should be no changes from “v2”.
Attachment
On Wed, Nov 01, 2023 at 09:15:20PM +0000, John Morris wrote: > This is a rebased version . Even though I labelled it “v3”, there should be no changes from “v2”. Thanks. I think this is almost ready, but I have to harp on the pg_atomic_read_u64() business once more. The relevant comment in atomics.h has this note: * The read is guaranteed to return a value as it has been written by this or * another process at some point in the past. There's however no cache * coherency interaction guaranteeing the value hasn't since been written to * again. However unlikely, this seems to suggest that CreateCheckPoint() could see an old value with your patch. Out of an abundance of caution, I'd recommend changing this to pg_atomic_compare_exchange_u64() like pg_atomic_read_u64_impl() does in generic.h. @@ -4635,7 +4629,6 @@ XLOGShmemInit(void) SpinLockInit(&XLogCtl->Insert.insertpos_lck); SpinLockInit(&XLogCtl->info_lck); - SpinLockInit(&XLogCtl->ulsn_lck); } Shouldn't we do the pg_atomic_init_u64() here? We can still set the initial value in StartupXLOG(), but it might be safer to initialize the variable where we are initializing the other shared memory stuff. Since this isn't a tremendously performance-sensitive area, IMHO we should code defensively to eliminate any doubts about correctness and to make it easier to reason about. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Nov 2, 2023 at 9:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Nov 01, 2023 at 09:15:20PM +0000, John Morris wrote: > > This is a rebased version . Even though I labelled it “v3”, there should be no changes from “v2”. > > Thanks. I think this is almost ready, but I have to harp on the > pg_atomic_read_u64() business once more. The relevant comment in atomics.h > has this note: > > * The read is guaranteed to return a value as it has been written by this or > * another process at some point in the past. There's however no cache > * coherency interaction guaranteeing the value hasn't since been written to > * again. > > However unlikely, this seems to suggest that CreateCheckPoint() could see > an old value with your patch. Out of an abundance of caution, I'd > recommend changing this to pg_atomic_compare_exchange_u64() like > pg_atomic_read_u64_impl() does in generic.h. +1. pg_atomic_read_u64 provides no barrier semantics whereas pg_atomic_compare_exchange_u64 does. Without the barrier, it might happen that the value is read while the other backend is changing it. I think something like below providing full barrier semantics looks fine to me: XLogRecPtr ulsn; pg_atomic_compare_exchange_u64(&XLogCtl->unloggedLSN, &ulsn, 0); ControlFile->unloggedLSN = ulsn; > @@ -4635,7 +4629,6 @@ XLOGShmemInit(void) > > SpinLockInit(&XLogCtl->Insert.insertpos_lck); > SpinLockInit(&XLogCtl->info_lck); > - SpinLockInit(&XLogCtl->ulsn_lck); > } > > Shouldn't we do the pg_atomic_init_u64() here? We can still set the > initial value in StartupXLOG(), but it might be safer to initialize the > variable where we are initializing the other shared memory stuff. I think no one accesses the unloggedLSN in between CreateSharedMemoryAndSemaphores -> XLOGShmemInit and StartupXLOG. However, I see nothing wrong in doing pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr); in XLOGShmemInit. > Since this isn't a tremendously performance-sensitive area, IMHO we should > code defensively to eliminate any doubts about correctness and to make it > easier to reason about. Right. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 01, 2023 at 10:40:06PM -0500, Nathan Bossart wrote: > Since this isn't a tremendously performance-sensitive area, IMHO we should > code defensively to eliminate any doubts about correctness and to make it > easier to reason about. Concretely, like this. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
I incorporated your suggestions and added a few more. The changes are mainly related to catching potential errors if some basic assumptions aren’t met.
There are basically 3 assumptions. Stating them as conditions we want to avoid.
- We should not get an unlogged LSN before reading the control file.
- We should not get an unlogged LSN when shutting down.
- The unlogged LSN written out during a checkpoint shouldn’t be used.
Your suggestion addressed the first problem, and it took only minor changes to address the other two.
The essential idea is, we set a value of zero in each of the 3 situations. Then we throw an Assert() If somebody tries to allocate an unlogged LSN with the value zero.
I found the comment about cache coherency a bit confusing. We are dealing with a single address, so there should be no memory ordering or coherency issues. (Did I misunderstand?) I see it more as a race condition. Rather than merely explaining why it shouldn’t happen, the new version verifies the assumptions and throws an Assert() if something goes wrong.
Attachment
On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote: > I incorporated your suggestions and added a few more. The changes are > mainly related to catching potential errors if some basic assumptions > aren’t met. Hm. Could we move that to a separate patch? We've lived without these extra checks for a very long time, and I'm not aware of any related issues, so I'm not sure it's worth the added complexity. And IMO it'd be better to keep it separate from the initial atomics conversion, anyway. > I found the comment about cache coherency a bit confusing. We are dealing > with a single address, so there should be no memory ordering or coherency > issues. (Did I misunderstand?) I see it more as a race condition. Rather > than merely explaining why it shouldn’t happen, the new version verifies > the assumptions and throws an Assert() if something goes wrong. I was thinking of the comment for pg_atomic_read_u32() that I cited earlier [0]. This comment also notes that pg_atomic_read_u32/64() has no barrier semantics. My interpretation of that comment is that these functions provide no guarantee that the value returned is the most up-to-date value. But my interpretation could be wrong, and maybe this is meant to highlight that the value might change before we can use the return value in a compare/exchange or something. I spent a little time earlier today reviewing the various underlying implementations, but apparently I need to spend some more time looking at those... [0] https://postgr.es/m/20231102034006.GA85609%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Greetings, * Nathan Bossart (nathandbossart@gmail.com) wrote: > On Tue, Nov 07, 2023 at 12:57:32AM +0000, John Morris wrote: > > I incorporated your suggestions and added a few more. The changes are > > mainly related to catching potential errors if some basic assumptions > > aren’t met. > > Hm. Could we move that to a separate patch? We've lived without these > extra checks for a very long time, and I'm not aware of any related issues, > so I'm not sure it's worth the added complexity. And IMO it'd be better to > keep it separate from the initial atomics conversion, anyway. I do see the value in adding in an Assert though I don't want to throw away the info about what the recent unlogged LSN was when we crash. As that basically boils down to a one-line addition, I don't think it really needs to be in a separate patch. > > I found the comment about cache coherency a bit confusing. We are dealing > > with a single address, so there should be no memory ordering or coherency > > issues. (Did I misunderstand?) I see it more as a race condition. Rather > > than merely explaining why it shouldn’t happen, the new version verifies > > the assumptions and throws an Assert() if something goes wrong. > > I was thinking of the comment for pg_atomic_read_u32() that I cited earlier > [0]. This comment also notes that pg_atomic_read_u32/64() has no barrier > semantics. My interpretation of that comment is that these functions > provide no guarantee that the value returned is the most up-to-date value. There seems to be some serious misunderstanding about what is happening here. The value written into the control file for unlogged LSN during normal operation does *not* need to be the most up-to-date value and talking about it as if it needs to be the absolutely most up-to-date and correct value is, if anything, adding to the confusion, not reducing confusion. The reason to write in anything other than a zero during these routine checkpoints for unlogged LSN is entirely for forensics purposes, not because we'll ever actually use the value- during crash recovery and backup/restore, we're going to reset the unlogged LSN counter anyway and we're going to throw away all of unlogged table contents across the entire system. We only care about the value of the unlogged LSN being correct during normal shutdown when we're writing out the shutdown checkpoint, but by that time everything else has been shut down and the value absolutely should not be changing. Thanks, Stephen
Attachment
On Tue, Nov 07, 2023 at 11:47:46AM -0500, Stephen Frost wrote: > We only care about the value of the unlogged LSN being correct during > normal shutdown when we're writing out the shutdown checkpoint, but by > that time everything else has been shut down and the value absolutely > should not be changing. I agree that's all true. I'm trying to connect how this scenario ensures we see the most up-to-date value in light of this comment above pg_atomic_read_u32(): * The read is guaranteed to return a value as it has been written by this or * another process at some point in the past. There's however no cache * coherency interaction guaranteeing the value hasn't since been written to * again. Is there something special about all other backends being shut down that ensures this returns the most up-to-date value and not something from "some point in the past" as the stated contract for this function seems to suggest? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-11-07 11:02:49 -0600, Nathan Bossart wrote: > On Tue, Nov 07, 2023 at 11:47:46AM -0500, Stephen Frost wrote: > > We only care about the value of the unlogged LSN being correct during > > normal shutdown when we're writing out the shutdown checkpoint, but by > > that time everything else has been shut down and the value absolutely > > should not be changing. > > I agree that's all true. I'm trying to connect how this scenario ensures > we see the most up-to-date value in light of this comment above > pg_atomic_read_u32(): > > * The read is guaranteed to return a value as it has been written by this or > * another process at some point in the past. There's however no cache > * coherency interaction guaranteeing the value hasn't since been written to > * again. > > Is there something special about all other backends being shut down that > ensures this returns the most up-to-date value and not something from "some > point in the past" as the stated contract for this function seems to > suggest? Practically yes - getting to the point of writing the shutdown checkpoint implies having gone through a bunch of code that implies memory barriers (spinlocks, lwlocks). However, even if there's likely some other implied memory barrier that we could piggyback on, the patch much simpler to understand if it doesn't change coherency rules. There's no way the overhead could matter. Greetings, Andres Freund
Hi, On 2023-11-07 00:57:32 +0000, John Morris wrote: > I found the comment about cache coherency a bit confusing. We are dealing > with a single address, so there should be no memory ordering or coherency > issues. (Did I misunderstand?) I see it more as a race condition. IMO cache coherency covers the value a single variable has in different threads / processes. In fact, the only reason there effectively is a guarantee that you're not seeing an outdated unloggedLSN value during shutdown checkpoints, even without the spinlock or full barrier atomic op, is that the LWLockAcquire(), a few lines above this, would prevent both the compiler and CPU from moving the read of unloggedLSN to much earlier. Obviously that lwlock has a different address... If the patch just had done the minimal conversion, it'd already have been committed... Even if there'd be a performance reason to get rid of the memory barrier around reading unloggedLSN in CreateCheckPoint(), I'd do the conversion in a separate commit. Greetings, Andres Freund
On Tue, Nov 07, 2023 at 04:58:16PM -0800, Andres Freund wrote: > On 2023-11-07 11:02:49 -0600, Nathan Bossart wrote: >> Is there something special about all other backends being shut down that >> ensures this returns the most up-to-date value and not something from "some >> point in the past" as the stated contract for this function seems to >> suggest? > > Practically yes - getting to the point of writing the shutdown checkpoint > implies having gone through a bunch of code that implies memory barriers > (spinlocks, lwlocks). Sure. > However, even if there's likely some other implied memory barrier that we > could piggyback on, the patch much simpler to understand if it doesn't change > coherency rules. There's no way the overhead could matter. I wonder if it's worth providing a set of "locked read" functions. Those could just do a compare/exchange with 0 in the generic implementation. For patches like this one where the overhead really shouldn't matter, I'd encourage folks to use those to make it easy to reason about correctness. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Nov 09, 2023 at 03:27:33PM -0600, Nathan Bossart wrote: > I wonder if it's worth providing a set of "locked read" functions. Those > could just do a compare/exchange with 0 in the generic implementation. For > patches like this one where the overhead really shouldn't matter, I'd > encourage folks to use those to make it easy to reason about correctness. I moved this proposal to a new thread [0]. [0] https://postgr.es/m/20231110205128.GB1315705%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Here is a new version of the patch that uses the new atomic read/write functions with full barriers that were added in commit bd5132d. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Greetings, * Nathan Bossart (nathandbossart@gmail.com) wrote: > Here is a new version of the patch that uses the new atomic read/write > functions with full barriers that were added in commit bd5132d. Thoughts? Saw that commit go in- glad to see it. Thanks for updating this patch too. The changes look good to me. Thanks again, Stephen
Attachment
Looks good to me.
- John
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thursday, February 29, 2024 at 8:34 AM
To: Andres Freund <andres@anarazel.de>
Cc: Stephen Frost <sfrost@snowman.net>, John Morris <john.morris@crunchydata.com>, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>, Michael Paquier <michael@paquier.xyz>, Robert Haas <robertmhaas@gmail.com>, pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: Atomic ops for unlogged LSN
Here is a new version of the patch that uses the new atomic read/write
functions with full barriers that were added in commit bd5132d. Thoughts?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 29, 2024 at 10:04 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > Here is a new version of the patch that uses the new atomic read/write > functions with full barriers that were added in commit bd5132d. Thoughts? Thanks for getting the other patch in. The attached v6 patch LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com