Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Robert Haas
Subject Re: .ready and .done files considered harmful
Date
Msg-id CA+Tgmoaq5eBFoj--Xn92Mz5H+YomkE5xyv-_ETB1L-P7BjRCDA@mail.gmail.com
Whole thread Raw
In response to Re: .ready and .done files considered harmful  (Dipesh Pandit <dipesh.pandit@gmail.com>)
Responses Re: .ready and .done files considered harmful  (Dipesh Pandit <dipesh.pandit@gmail.com>)
List pgsql-hackers
On Tue, Jul 27, 2021 at 3:43 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> and updated a new patch. Please find the attached patch v4.

Some review:

        /*
+        * If archiver is active, send notification that timeline has switched.
+        */
+       if (XLogArchivingActive() && ArchiveRecoveryRequested &&
+               IsUnderPostmaster)
+               PgArchNotifyTLISwitch();

There are a few other places in xlog.c that are conditional on
XLogArchivingActive(), but none of them test ArchiveRecoveryRequested
or IsUnderPostmaster. It appears to me that PgArchStartupAllowed()
controls whether the archiver runs, and that's not contingent on
ArchiveRecoveryRequested and indeed couldn't be, since it's running in
the postmaster where that variable wouldn't be initialized. So why do
we care about ArchiveRecoveryRequested here? This is not entirely a
rhetorical question; maybe there's some reason we should care. If so,
the comment ought to mention it. If not, the test should go away.

IsUnderPostmaster does make a difference, but I think that test could
be placed inside PgArchNotifyTLISwitch() rather than putting it here
in StartupXLOG(). In fact, I think the test could be removed entirely,
since if PgArchNotifyTLISwitch() is called in single-user mode, it
will presumably just discover that arch_pgprocno == INVALID_PGPROCNO,
so it will simply do nothing even without the special-case code.

+       pqsignal(SIGINT, pgarch_timeline_switch);

I don't think it's great that we're using up SIGINT for this purpose.
There aren't that many signals available at the O/S level that we can
use for our purposes, and we generally try to multiplex them at the
application layer, e.g. by setting a latch or a flag in shared memory,
rather than using a separate signal. Can we do something of that sort
here? Or maybe we don't even need a signal. ThisTimeLineID is already
visible in shared memory, so why not just have the archiver just check
and see whether it's changed, say via a new accessor function
GetCurrentTimeLineID()? I guess there could be a concern about the
expensive of that, because we'd probably be taking a spinlock or an
lwlock for every cycle, but I don't think it's probably that bad,
because I doubt we can archive much more than a double-digit number of
files per second even with a very fast archive_command, and contention
on a lock generally requires a five digit number of acquisitions per
second. It would be worth testing to see if we can see a problem here,
but I'm fairly hopeful that it's not an issue. If we do feel that it's
important to avoid repeatedly taking a lock, let's see if we can find
a way to do it without dedicating a signal to this purpose.

+        *
+        * "nextLogSegNo" identifies the next log file to be archived in a log
+        * sequence and the flag "dirScan" specifies a full directory
scan to find
+        * the next log file.
         */
-       while (pgarch_readyXlog(xlog))
+       while (pgarch_readyXlog(xlog, &dirScan, &nextLogSegNo))

I do not like this very much. dirScan and nextLogSegNo aren't clearly
owned either by pgarch_ArchiverCopyLoop() or by pgarch_readyXlog(),
since both functions modify both variables, in each case
conditionally, while also relying on the way that the other function
manipulates them. Essentially these are global variables in disguise.
There's a third, related variable too, which is handled differently:

+       static TimeLineID curFileTLI = 0;

This is really the same kind of thing as the other two, but because
pgarch_readyXlog() happens not to need this one, you just made it
static inside pgarch_readyXlog() instead of passing it back and forth.

The problem with all this is that you can't understand either function
in isolation. Unless you read them both together and look at all of
the ways these three variables are manipulated, you can't really
understand the logic. And there's really no reason why that needs to
be true. The job of cleaning timeline_switch and setting dirScan could
be done entirely within pgarch_readyXlog(), and so could the job of
incrementing nextLogSegNo, because we're not going to again call
pgarch_readyXlog() unless archiving succeeded.

Also note that the TLI which is stored in curFileTLI corresponds to
the segment number stored in nextLogSegNo, yet one of them has "cur"
for "current" in the name and the other has "next". It would be easier
to read the code if the names were chosen more consistently.

My tentative idea as to how to clean this up is: declare a new struct
with a name like readyXlogState and members lastTLI and lastSegNo.
Have pgarch_ArchiverCopyLoop() declare a variable of this type, zero
it, pass it as a parameter to pgarch_readyXlog(), and otherwise leave
it alone. Then let pgarch_readyXlog() do all of the manipulation of
the values stored therein.

+       /*
+        * Fall-back to directory scan
+        *
+        * open xlog status directory and read through list of xlogs
that have the
+        * .ready suffix, looking for earliest file. It is possible to optimise
+        * this code, though only a single file is expected on the vast majority
+        * of calls, so....
+        */

You've moved this comment from its original location, but the trouble
is that the comment is 100% false. In fact, the whole reason why you
wrote this patch is *because* this comment is 100% false. In fact it
is not difficult to create cases where each scan finds many files, and
the purpose of the patch is precisely to optimize the code that the
person who wrote this thought didn't need optimizing. Now it may take
some work to figure out what we want to say here exactly, but
preserving the comment as it's written here is certainly misleading.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep
Next
From: Mark Dilger
Date:
Subject: Why does the owner of a publication need CREATE privileges on the database?