Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CA+TgmoaScsuDcTYa_-pkBnVP5Xcb_vx0BX2qn6grnTt=+mmH6g@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY
List pgsql-hackers
On Mon, Sep 20, 2021 at 11:20 AM Amul Sul <sulamul@gmail.com> wrote:
> Ok, understood, I have separated my changes into 0001 and 0002 patch,
> and the refactoring patches start from 0003.

I think it would be better in the other order, with the refactoring
patches at the beginning of the series.

> In the 0001 patch, I have copied ArchiveRecoveryRequested to shared
> memory as said previously. Coping ArchiveRecoveryRequested value to
> shared memory is not really interesting, and I think somehow we should
> reuse existing variable, (perhaps, with some modification of the
> information it can store, e.g. adding one more enum value for
> SharedRecoveryState or something else), thinking on the same.
>
> In addition to that, I tried to turn down the scope of
> ArchiveRecoveryRequested global variable. Now, this is a static
> variable, and the scope is limited to xlog.c file like
> LocalXLogInsertAllowed and can be accessed through the newly added
> function ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let
> me know what you think about the approach.

I'm not sure yet whether I like this or not, but it doesn't seem like
a terrible idea. You spelled UNKNOWN wrong, though, which does seem
like a terrible idea. :-) "acccsed" is not correct either.

Also, the new comments for ArchiveRecoveryRequested /
ARCHIVE_RECOVERY_REQUEST_* are really not very clear. All you did is
substitute the new terminology into the existing comment, but that
means that the purpose of the new "unknown" value is not at all clear.

Consider the following two patch fragments:

+ * SharedArchiveRecoveryRequested indicates whether an archive recovery is
+ * requested. Protected by info_lck.
...
+ * Remember archive recovery request in shared memory state.  A lock is not
+ * needed since we are the only ones who updating this.

These two comments directly contradict each other.

+ SpinLockAcquire(&XLogCtl->info_lck);
+ XLogCtl->SharedArchiveRecoveryRequested = false;
+ ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN;
+ SpinLockRelease(&XLogCtl->info_lck);

This seems odd to me. In the first place, there doesn't seem to be any
value in clearing this -- we're just expending extra CPU cycles to get
rid of a value that wouldn't be used anyway. In the second place, if
somehow someone checked the value after this point, with this code,
they might get the wrong answer, whereas if you just deleted this,
they would get the right answer.

> In 0002 patch is a mixed one where I tried to remove the dependencies
> on global variables and local variables belonging to StartupXLOG(). I
> am still worried about the InRecovery value that needs to be deduced
> afterward inside XLogAcceptWrites().  Currently, relying on
> ControlFile->state != DB_SHUTDOWNED check but I think that will not be
> good for ASRO where we plan to skip XLogAcceptWrites() work only and
> let the StartupXLOG() do the rest of the work as it is where it will
> going to update ControlFile's DBState to DB_IN_PRODUCTION, then we
> might need some ugly kludge to call PerformRecoveryXLogAction() in
> checkpointer irrespective of DBState, which makes me a bit
> uncomfortable.

I think that replacing the if (InRecovery) test with if
(ControlFile->state != DB_SHUTDOWNED) is probably just plain wrong. I
mean, there are three separate places where we set InRecovery = true.
One of those executes if ControlFile->state != DB_SHUTDOWNED, matching
what you have here, but it also can happen if checkPoint.redo <
RecPtr, or if read_backup_label is true and ReadCheckpointRecord
returns non-NULL. Now maybe you're going to tell me that in those
other two cases we can't reach here anyway, but I don't see off-hand
why that should be true, and even if it is true, it seems like kind of
a fragile thing to rely on. I think we need to rely on something in
shared memory that is more explicitly connected to the question of
whether we are in recovery.

The other part of this patch has to do with whether we can use the
return value of GetLastSegSwitchData as a substitute for relying on
EndOfLog. Now as you have it, you end up creating a local variable
called EndOfLog that shadows another such variable in an outer scope,
which probably would not make anyone who noticed things in such a
state very happy. However, that will naturally get fixed if you
reorder the patches as per above, so let's turn to the central
question: is this a good way of getting EndOfLog? The value that would
be in effect at the time this code is executed is set here:

    XLogBeginRead(xlogreader, LastRec);
    record = ReadRecord(xlogreader, PANIC, false);
    EndOfLog = EndRecPtr;

Subsequently we do this:

    /* start the archive_timeout timer and LSN running */
    XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
    XLogCtl->lastSegSwitchLSN = EndOfLog;

So at that point the value that GetLastSegSwitchData() would return
has to match what's in the existing variable. But later XLogWrite()
will change the value. So the question boils down to whether
XLogWrite() could have been called between the assignment just above
and when this code runs. And the answer seems to pretty clear be yes,
because just above this code, we might have done
CreateEndOfRecoveryRecord() or RequestCheckpoint(), and just above
that, we did UpdateFullPageWrites(). So I don't think this is right.

> > (3) CheckpointStats, which is called from RemoveXlogFile which is
> > called from RemoveNonParentXlogFiles which is called from
> > CleanupAfterArchiveRecovery which is called from XLogAcceptWrites.
> > This last one is actually pretty weird already in the existing code.
> > It sort of looks like RemoveXlogFile() only expects to be called from
> > the checkpointer (or a standalone backend) so that it can update
> > CheckpointStats and have that just work, but actually it's also called
> > from the startup process when a timeline switch happens. I don't know
> > whether the fact that the increments to ckpt_segs_recycled get lost in
> > that case should be considered an intentional behavior that should be
> > preserved or an inadvertent mistake. >
>
> Maybe I could be wrong, but I think that is intentional.  It removes
> pre-allocated or bogus files of the old timeline which are not
> supposed to be considered in stats. The comments for
> CheckpointStatsData might not be clear but comment at the calling
> RemoveNonParentXlogFiles() place inside StartupXLOG hints the same:

Sure, I'm not saying the files are being removed by accident. I'm
saying it may be accidental that the removals are (I think) not going
to make it into the stats.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: relation OID in ReorderBufferToastReplace error message
Next
From: Tom Lane
Date:
Subject: Recent cpluspluscheck failures