Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: .ready and .done files considered harmful
Date
Msg-id 03826EAC-E5AE-40AB-B56B-8BEED1F5E534@amazon.com
Whole thread Raw
In response to Re: .ready and .done files considered harmful  (Dipesh Pandit <dipesh.pandit@gmail.com>)
Responses Re: .ready and .done files considered harmful
List pgsql-hackers
On 8/25/21, 4:11 AM, "Dipesh Pandit" <dipesh.pandit@gmail.com> wrote:
> An alternate approach could be to force a directory scan at checkpoint to
> break the infinite wait for a .ready file which is being missed due to the 
> fact that it is created out of order. This will make sure that the file
> gets archived within the checkpoint boundaries.

I think this is a good idea.

> Please find attached patch v11.

Thanks for the new version of the patch.

+    /*
+     * History files or a .ready file created out of order requires archiver to
+     * perform a full directory scan.
+     */
+    if (IsTLHistoryFileName(xlog) || IsBackupHistoryFileName(xlog) ||
+            fileOutOfOrder)
+        PgArchEnableDirScan();

I think we should force a directory scan for everything that isn't a
regular WAL file.  IOW we can use !IsXLogFileName(xlog) instead of
enumerating all the different kinds of files we might want to archive.

+    /*
+     * Segment number of the most recent .ready file found by archiver,
+     * protected by WALArchiveLock.
+     */
+    XLogSegNo    lastReadySegNo;
 } PgArchData;
 
+/*
+ * Segment number and timeline ID to identify the next file in a WAL sequence
+ */
+typedef struct readyXLogState
+{
+    XLogSegNo    lastSegNo;
+    TimeLineID    lastTLI;
+} readyXLogState;

lastSegNo and lastReadySegNo appear to be the same thing.  Couldn't we
just use the value in PgArchData?

+    return (curSegNo < lastSegNo) ? true : false;

I think this needs to be <=.  If the two values are equal,
pgarch_readyXlog() may have just completed a directory scan and might
be just about to set PgArch->lastSegNo to a greater value.

+    LWLockAcquire(WALArchiveLock, LW_EXCLUSIVE);
+    PgArch->lastReadySegNo = segNo;
+    LWLockRelease(WALArchiveLock);

IMO we should just use a spinlock instead of introducing a new LWLock.
It looks like you really only need the lock for a couple of simple
functions.  I still think protecting PgArch->dirScan with a spinlock
is a good idea, if for no other reason than it makes it easier to
reason about this logic.

+        if (stat(xlogready, &st) == 0)

I think we should ERROR if stat() fails for any other reason than
ENOENT.

+        ishistory = IsTLHistoryFileName(basename) ||
+            IsBackupHistoryFileName(basename);

I suspect we still want to prioritize timeline history files over
backup history files.  TBH I find the logic below this point for
prioritizing history files to be difficult to follow, and I think we
should refactor it into some kind of archive priority comparator
function.

+            /*
+             * Reset the flag only when we found a regular WAL file to make
+             * sure that we are done with processing history files.
+             */
+            PgArch->dirScan = false;

I think we really want to unset dirScan before we start the directory
scan, and then we set it to true afterwards if we didn't find a
regular WAL file.  If someone requests a directory scan in the middle
of an ongoing directory scan, we don't want to lose that request.

I attached two patches that demonstrate what I'm thinking this change
should look like.  One is my take on the keep-trying-the-next-file
approach, and the other is a new version of the multiple-files-per-
readdir approach (with handling for "cheating" archive commands).  I
personally feel that the multiple-files-per-readdir approach winds up
being a bit cleaner and more resilient than the keep-trying-the-next-
file approach.  However, the keep-trying-the-next-file approach will
certainly be more efficient (especially for the extreme cases
discussed in this thread), and I don't have any concrete concerns with
this approach that seem impossible to handle.

Nathan


Attachment

pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: Numeric x^y for negative x
Next
From: Michael Paquier
Date:
Subject: Re: Postgres Win32 build broken?