Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CAGPqQf2n0aeeKMv+mA0zpgkqkamEVMCZE_VL74hxCe9hSNOQNg@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
List pgsql-hackers


On Fri, Oct 1, 2021 at 2:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 30, 2021 at 7:59 AM Amul Sul <sulamul@gmail.com> wrote:
> To find the value of InRecovery after we clear it, patch still uses
> ControlFile's DBState, but now the check condition changed to a more
> specific one which is less confusing.
>
> In casual off-list discussion, the point was made to check
> SharedRecoveryState to find out the InRecovery value afterward, and
> check that using RecoveryInProgress().  But we can't depend on
> SharedRecoveryState because at the start it gets initialized to
> RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
> Therefore, we can't use RecoveryInProgress() which always returns
> true if SharedRecoveryState != RECOVERY_STATE_DONE.

Uh, this change has crept into 0002, but it should be in 0004 with the
rest of the changes to remove dependencies on variables specific to
the startup process. Like I said before, we should really be trying to
separate code movement from functional changes. Also, 0002 doesn't
actually apply for me. Did you generate these patches with 'git
format-patch'?

[rhaas pgsql]$ patch -p1 <
~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
patching file src/backend/access/transam/xlog.c
Hunk #1 succeeded at 889 (offset 9 lines).
Hunk #2 succeeded at 939 (offset 12 lines).
Hunk #3 succeeded at 5734 (offset 37 lines).
Hunk #4 succeeded at 8038 (offset 70 lines).
Hunk #5 succeeded at 8248 (offset 70 lines).
[rhaas pgsql]$ patch -p1 <
~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
patching file src/backend/access/transam/xlog.c
Reversed (or previously applied) patch detected!  Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 7954.
Hunk #2 succeeded at 8079 (offset 70 lines).
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/access/transam/xlog.c.rej
[rhaas pgsql]$ git reset --hard
HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a
non-recoverable connection failure.
[rhaas pgsql]$ patch -p1 <
~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
patching file src/backend/access/transam/xlog.c
Reversed (or previously applied) patch detected!  Assume -R? [n]
Apply anyway? [n]
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file
src/backend/access/transam/xlog.c.rej


I tried to apply the patch on the master branch head and it's failing
with conflicts.

Later applied patch on below commit and it got applied cleanly:

commit 7d1aa6bf1c27bf7438179db446f7d1e72ae093d0
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Sep 27 18:48:01 2021 -0400

    Re-enable contrib/bloom's TAP tests.
   
rushabh@rushabh:postgresql$ git apply v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
rushabh@rushabh:postgresql$ git apply v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
rushabh@rushabh:postgresql$ git apply v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:34: space before tab in indent.
  /*
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:38: space before tab in indent.
  */
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:39: space before tab in indent.
  Insert->fullPageWrites = lastFullPageWrites;
warning: 3 lines add whitespace errors.
rushabh@rushabh:postgresql$ git apply v36-0004-Remove-dependencies-on-startup-process-specifica.patch
 
There are whitespace errors on patch 0003.
 
It seems to me that the approach you're pursuing here can't work,
because the long-term goal is to get to a place where, if the system
starts up read-only, XLogAcceptWrites() might not be called until
later, after StartupXLOG() has exited. But in that case the control
file state would be DB_IN_PRODUCTION. But my idea of using
RecoveryInProgress() won't work either, because we set
RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION. Put
differently, the question we want to answer is not "are we in recovery
now?" but "did we perform recovery?". After studying the code a bit, I
think a good test might be
!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr). If InRecovery
gets set to true, then we're certain to enter the if (InRecovery)
block that contains the main redo loop. And that block unconditionally
does XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr. I
think that replayEndRecPtr can't be 0 because it's supposed to
represent the record we're pretending to have last replayed, as
explained by the comments. And while lastReplayedEndRecPtr will get
updated later as we replay more records, I think it will never be set
back to 0. It's only going to increase, as we replay more records. On
the other hand if InRecovery = false then we'll never change it, and
it seems that it starts out as 0.

I was hoping to have more time today to comment on 0004, but the day
seems to have gotten away from me. One quick thought is that it looks
a bit strange to be getting EndOfLog from GetLastSegSwitchData() which
returns lastSegSwitchLSN while getting EndOfLogTLI from replayEndTLI
... because there's also replayEndRecPtr, which seems to go with
replayEndTLI. It feels like we should use a source for the TLI that
clearly matches the source for the corresponding LSN, unless there's
some super-good reason to do otherwise.

--
Robert Haas
EDB: http://www.enterprisedb.com




--
Rushabh Lathia

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: replace InvalidXid(a macro that doesn't exist) with InvalidTransactionId(a macro that exists) in code comments
Next
From: Michael Paquier
Date:
Subject: Re: Incorrect snapshots while promoting hot standby node when 2PC is used