Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id 20210803.113728.1879643826551343338.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: archive status ".ready" files may be created too early  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: archive status ".ready" files may be created too early  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
At Mon, 2 Aug 2021 23:28:19 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 8/2/21, 2:42 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> > I think it's okay to make lastNotifiedSeg protected by just info_lck,
> > and RecordBoundaryMap protected by just ArchNotifyLock.  It's okay to
> > acquire the spinlock inside the lwlock-protected area, as long as we
> > make sure never to do the opposite.  (And we sure don't want to hold
> > info_lck long enough that a LWLock acquisition would occur in the
> > meantime).  So I modified things that way, and also added another
> > function to set the seg if it's unset, with a single spinlock
> > acquisition (rather than acqquire, read, release, acqquire, set,
> > release, which sounds like it would have trouble behaving.)
> 
> The patch looks good to me.

+    for (seg = flushed_seg; seg > last_notified; seg--)
+    {
+        RecordBoundaryEntry *entry;
+
+        entry = (RecordBoundaryEntry *) hash_search(RecordBoundaryMap,
+                                                    (void *) &seg, HASH_FIND,

I'm afraid that using hash to store boundary info is too much. Isn't a
ring buffer enough for this use?  In that case it is enough to
remember only the end LSN of the segment spanning records.  It is easy
to expand the buffer if needed.

+    if (!XLogSegNoIsInvalid(latest_boundary_seg))

It is a matter of taste, but I see latest_boundary_seg !=
InvalidXLogSegNo more frequentlyl, maybe to avoid double negation.


@@ -1167,10 +1195,33 @@ XLogInsertRecord(XLogRecData *rdata,
         SpinLockRelease(&XLogCtl->info_lck);
     }
 
+    /*
+     * Record the record boundary if we crossed the segment boundary.  This is
...
+    XLByteToSeg(StartPos, StartSeg, wal_segment_size);
+    XLByteToSeg(EndPos, EndSeg, wal_segment_size);
+
+    if (StartSeg != EndSeg && XLogArchivingActive())
+    {

The immediately prceding if block is for cross-page records. So we can
reduce the overhaed by the above calculations by moving it to the
preceding if-block.


+RegisterRecordBoundaryEntry(XLogSegNo seg, XLogRecPtr pos)

The seg is restricted to the segment that pos resides on.  The caller
is free from caring that restriction if the function takes only pos.
It adds a small overhead to calculate segment number from the LSN but
I think it doesn't matter so much. (Or if we don't use hash, that
calculation is not required at all).


@@ -2602,7 +2855,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
                 LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */
 
                 if (XLogArchivingActive())
-                    XLogArchiveNotifySeg(openLogSegNo);
+                    SetLastNotifiedSegmentIfInvalid(openLogSegNo - 1);

Is it safe?  If server didn't notified of WAL files for recent 3
finished segments in the previous server life, they need to be
archived this life time. But this omits maybe all of the tree.
(I didn't confirm that behavior..)

> > I find it highly suspicious that the patch does an archiver notify (i.e.
> > creation of the .ready file) in XLogInsertRecord().  Is that a sane
> > thing to do?  Sounds to me like that should be attempted in XLogFlush
> > only.  This appeared after Kyotaro's patch at [1] and before your patch
> > at [2].
> 
> I believe my worry was that we'd miss notifying a segment as soon as
> possible if the record was somehow flushed prior to registering the
> record boundary in the map.  If that's actually impossible, then I
> would agree that the extra call to NotifySegmentsReadyForArchive() is
> unnecessary.

I don't think that XLogWrite(up to LSN=X) can happen before
XLogInsert(endpos = X) ends.

> >> I think the main downside of making lastNotifiedSeg an atomic is that
> >> the value we first read in NotifySegmentsReadyForArchive() might not
> >> be up-to-date, which means we might hold off creating .ready files
> >> longer than necessary.
> >
> > I'm not sure I understand how this would be a problem.  If we block
> > somebody from setting a newer value, they'll just set the value
> > immediately after we release the lock.  Will we reread the value
> > afterwards to see if it changed?
> 
> I think you are right.  If we see an old value for lastNotifiedSeg,
> the worst case is that we take the ArchNotifyLock, read
> lastNotifiedSeg again (which should then be up-to-date), and then

Agreed.

> basically do nothing.  I suspect initializing lastNotifiedSeg might
> still be a little tricky, though.  Do you think it is important to try
> to avoid this spinlock for lastNotifiedSeg?  IIUC it's acquired at the
> end of every call to XLogWrite() already, and we'd still need to
> acquire it for the flush pointer, anyway.

As mentioned above, I think it needs more cosideration.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Dividing line between before_shmem_exit and on_shmem_exit?
Next
From: Soumyadeep Chakraborty
Date:
Subject: Re: A micro-optimisation for ProcSendSignal()