On Mon, Oct 30, 2023 at 3:09 AM Michael Paquier <michael@paquier.xyz> wrote:
> I have been reviewing the patch, and applied portions of it as of
> dc5bd388 and 1ffdc03c and they're quite independent pieces. After
> that, the remaining bits of the patch to change the behavior is now
> straight-forward. I have written a commit message for it, while on
> it, as per the attached.
I would encourage some caution here.
In a vacuum, I'm in favor of this, and for the same reasons as you,
namely, that the huge pile of Booleans that we use to control recovery
is confusing, and it's difficult to make sure that all the code paths
are adequately tested, and I think some of the things that actually
work here are not documented.
But in practice, I think there is a possibility of something like this
backfiring very hard. Notice that the first two people who commented
on the thread saw the error and immediately removed backup_label even
though that's 100% wrong. It shows how utterly willing users are to
remove backup_label for any reason or no reason at all. If we convert
cases where things would have worked into cases where people nuke
backup_label and then it appears to work, we're going to be worse off
in the long run, no matter how crazy the idea of removing backup_label
may seem to us.
Also, Andres just recently mentioned to me that he uses this procedure
of starting a server with a backup_label but no recovery.signal or
standby.signal file regularly, and thinks other people do too. I was
surprised, since I've never done that, except maybe when I was a noob
and didn't have a clue. But Andres is far from a noob.
--
Robert Haas
EDB: http://www.enterprisedb.com