Re: LogwrtResult contended spinlock

From: Bharath Rupireddy
Subject: Re: LogwrtResult contended spinlock
Msg-id: CALj2ACXrePj4E6ocKr-+b=rjT-8yeMmHnEeWQP1bc-WXETfTVw@mail.gmail.com
In response to: Re: LogwrtResult contended spinlock (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Sat, Feb 17, 2024 at 2:24 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> Though it looks like we can remove the non-shared LogwrtResult
> entirely. Andres expressed some concern here:
>
> https://www.postgresql.org/message-id/20210130020211.rtu5ir3dpjrbiats@alap3.anarazel.de
>
> But then seemed to favor removing it here:
>
> https://www.postgresql.org/message-id/20240213001150.4uqzh7tinuhvoopl@awork3.anarazel.de
>
> I'm inclined to think we can get rid of the non-shared copy.

The local copy of LogwrtResult is used so frequently in the backends
that, if we were to replace it with atomic accesses, won't the atomic
reads be costly and start showing up in perf profiles? In any case, I
prefer discussing this separately once we reach a conclusion on
converting the shared-memory Write and Flush ptrs to atomics.

> A few other comments:
>
>  * Should GetFlushRecPtr()/GetXLogWriteRecPtr() use a read memory
> barrier?
>  * Why did you add pg_memory_barrier() right before a spinlock
> acquisition?
>  * Is it an invariant that Write >= Flush at all times? Are there
> guaranteed to be write barriers in the right place to ensure that?

I'll continue to think about these points.

> I would also like it if we could add a new "Copy" pointer indicating
> how much WAL data has been copied to the WAL buffers. That would be set
> by WaitXLogInsertionsToFinish() so that subsequent calls are cheap.
> Attached a patch (0003) for illustration purposes. It adds to the size
> of XLogCtlData, but it's fairly large already, so I'm not sure if
> that's a problem. If we do add this, there would be an invariant that
> Copy >= Write at all times.

Thanks. I have a few comments on the v11j patches.

1. I guess we need to initialize the new atomics with
pg_atomic_init_u64 in XLOGShmemInit:
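Something like this, alongside the rest of the shared-state setup
there (a sketch assuming the patch's field names; the Copy pointer
exists only with 0003 applied):

    /* sketch: initialize the new shared atomics at shmem creation time */
    pg_atomic_init_u64(&XLogCtl->LogwrtResult.Write, InvalidXLogRecPtr);
    pg_atomic_init_u64(&XLogCtl->LogwrtResult.Flush, InvalidXLogRecPtr);
    pg_atomic_init_u64(&XLogCtl->LogwrtResult.Copy, InvalidXLogRecPtr);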

2. I guess we need to update both the Write and Flush local copies in
AdvanceXLInsertBuffer. Overall, I guess we need to update both the
Write and Flush local copies wherever we previously read LogwrtResult,
no? Failing to do that caused regression tests to fail, and since we
were able to catch it, we ended up with v11j-0002-fixup.patch. There
might be cases that we wouldn't catch if we didn't update both the
Write and Flush local copies. I'd suggest we just wrap both of these
under a macro:

     LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
     LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);

and just use it wherever we previously did LogwrtResult =
XLogCtl->LogwrtResult; outside of the spinlock.
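For instance, the wrapper could look like this (a sketch; the macro
name is just a placeholder):

#define UpdateLocalLogwrtResult() \
    do { \
        LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write); \
        LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush); \
    } while (0)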

3.
@@ -6371,7 +6373,9 @@ GetFlushRecPtr(TimeLineID *insertTLI)
 {
     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);

+    pg_read_barrier();
     LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
+    LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);

Do we need a read barrier here to prevent reordering when more than
one process is accessing the flush and write ptrs? If a read barrier
is warranted here at all, we can use an atomic read with full barrier
semantics as proposed here (when that's in) -
https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13.
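For illustration, if the functions from that thread land with the
names proposed there (pg_atomic_read_membarrier_u64), the explicit
barrier could go away - a sketch:

    /* sketch: reads with full barrier semantics, no separate pg_read_barrier() */
    LogwrtResult.Flush = pg_atomic_read_membarrier_u64(&XLogCtl->LogwrtResult.Flush);
    LogwrtResult.Write = pg_atomic_read_membarrier_u64(&XLogCtl->LogwrtResult.Write);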

4.
+    /*
+     * Update local and shared status.  This is OK to do without any locks
+     * because no other process can be reading or writing WAL yet.
+     */
     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

+    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Write, EndOfLog);
+    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Flush, EndOfLog);
     XLogCtl->LogwrtRqst.Write = EndOfLog;

pg_atomic_write_u64 here seems fine to me as no other process is
actively writing WAL yet; otherwise, we'd need a write with full
barrier semantics, something like pg_write_barrier +
pg_atomic_write_u64, or pg_atomic_exchange_u64 as in LWLockUpdateVar,
or an atomic write with full barrier semantics as proposed here (when
that's in) -
https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13.
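For illustration, the concurrency-safe variant described above might
look like this (a sketch; unnecessary in this code path):

    /* sketch: order prior stores before publishing the new positions */
    pg_write_barrier();
    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Write, EndOfLog);
    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Flush, EndOfLog);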

5. I guess we'd better use pg_atomic_read_u64_impl and
pg_atomic_compare_exchange_u64_impl in pg_atomic_monotonic_advance_u64
to remove one level of function indirection. Of course, we'd then need
the pointer alignment assertion that pg_atomic_compare_exchange_u64
does in the new monotonic function.
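That is, something along these lines (a sketch based on the v11j
function, keeping the alignment assertion that the
pg_atomic_compare_exchange_u64 wrapper would otherwise do):

static inline void
pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
{
    uint64      currval;

#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
    AssertPointerAlignment(ptr, 8);
#endif

    /* call the _impl variants directly to skip one level of indirection */
    currval = pg_atomic_read_u64_impl(ptr);
    if (currval >= target_)
    {
        /* full barrier semantics even when the value is unchanged */
        pg_memory_barrier();
        return;
    }

    while (currval < target_)
    {
        /* on failure, currval is updated to the current shared value */
        if (pg_atomic_compare_exchange_u64_impl(ptr, &currval, target_))
            break;
    }
}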

6.
+ * Full barrier semantics (even when value is unchanged).

+    currval = pg_atomic_read_u64(ptr);
+    if (currval >= target_)
+    {
+        pg_memory_barrier();

Perhaps we can use an atomic read with full barrier semantics as
proposed here -
https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13

7.
+static inline void
+pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)

Just for the sake of completeness, do we need
pg_atomic_monotonic_advance_u32 as well?
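If so, it would presumably be the same modulo the widths - a sketch:

static inline void
pg_atomic_monotonic_advance_u32(volatile pg_atomic_uint32 *ptr, uint32 target_)
{
    uint32      currval;

    currval = pg_atomic_read_u32(ptr);
    if (currval >= target_)
    {
        pg_memory_barrier();
        return;
    }
    while (currval < target_)
    {
        if (pg_atomic_compare_exchange_u32(ptr, &currval, target_))
            break;
    }
}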

8. I'd split the addition of these new monotonic functions into 0001,
then 0002 adding XLogwrtAtomic and 0003 adding Copy to XLogwrtAtomic.

9.
+    copyptr = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Copy);
+    if (startptr + count > copyptr)
+        ereport(WARNING,
+                (errmsg("request to read past end of generated WAL; request %X/%X, current position %X/%X",
+                        LSN_FORMAT_ARGS(startptr + count), LSN_FORMAT_ARGS(copyptr))));

Any specific reason for this to be a WARNING rather than an ERROR?

I've addressed some of my review comments above (#1, #5, #7 and #8),
merged v11j-0002-fixup.patch into 0002, and ran pgindent. Please see
the attached v12 patch set. FWIW, the CF bot is happy with these
patches, and tests with --enable-atomics=no are clean as well.

BTW, I couldn't find a CF entry for this, so I've registered one -
https://commitfest.postgresql.org/47/4846/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
