Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CA+TgmobHeJ0Bg0QmgTtpO2AvmJOQOgafHf__UNcCQKA2ybyYQw@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY
Re: [Patch] ALTER SYSTEM READ ONLY
Re: [Patch] ALTER SYSTEM READ ONLY
Re: [Patch] ALTER SYSTEM READ ONLY
List pgsql-hackers
On Wed, Jun 17, 2020 at 10:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aside from the points you mention, such a switch would break autovacuum.
> It would break the ability for scans to do HOT-chain cleanup, which would
> likely lead to some odd behaviors (if, eg, somebody flips the switch
> between where that's supposed to happen and where an update needs to
> happen on the same page).  It would break the ability for indexscans to do
> killed-tuple marking, which is critical for performance in some scenarios.
> It would break the ability to set tuple hint bits, which is even more
> critical for performance.  It'd possibly break, or at least complicate,
> logic in index AMs to deal with index format updates --- I'm fairly sure
> there are places that will try to update out-of-date data structures
> rather than cope with the old structure, even in nominally read-only
> searches.

This seems like pretty dubious hand-waving. Of course, things that
write WAL are going to be broken by a switch that prevents writing
WAL; but if they were not, there would be no purpose in having such a
switch, so that's not really an argument. But you seem to have mixed
in some things that don't require writing WAL, and claimed without
evidence that those would somehow also be broken. I don't think that's
the case, but even if it were, so what? We live with all of these
restrictions on standbys anyway.

> I also think that putting such a thing into ALTER SYSTEM has got big
> logical problems.  Someday we will probably want to have ALTER SYSTEM
> write WAL so that standby servers can absorb the settings changes.
> But if writing WAL is disabled, how can you ever turn the thing off again?

I mean, the syntax that we use for a feature like this is arbitrary. I
picked this one, so I like it, but it can easily be changed if other
people want something else. The rest of this argument doesn't seem to
me to make very much sense. The existing ALTER SYSTEM functionality to
modify a text configuration file isn't replicated today and I'm not
sure why we should make it so, considering that replication generally
only considers things that are guaranteed to be the same on the master
and the standby, which this is not. But even if we did, that has
nothing to do with whether some functionality that changes the system
state without changing a text file ought to also be replicated. This
is a piece of cluster management functionality and it makes no sense
to replicate it. And no right-thinking person would ever propose to
change a feature that renders the system read-only in such a way that
it was impossible to deactivate it. That would be nuts.

> Lastly, the arguments in favor seem pretty bogus.  HA switchover normally
> involves just killing the primary server, not expecting that you can
> leisurely issue some commands to it first.

Yeah, that's exactly the problem I want to fix. If you kill the master
server, then you have interrupted service, even for read-only queries.
That sucks. Also, even if you don't care about interrupting service on
the master, it's actually sorta hard to guarantee a clean switchover.
The walsenders are supposed to send all the WAL from the master before
exiting, but if the connection is broken for some reason, then the
master is down and the standbys can't stream the rest of the WAL. You
can start it up again, but then you might generate more WAL. You can
try to copy the WAL around manually from one pg_wal directory to
another, but that's not a very nice thing for users to need to do
manually, and seems buggy and error-prone.

And how do you figure out where the WAL ends on the master and make
sure that the standby replayed it all? If the master is up, it's easy:
you just use the same queries you use all the time. If the master is
down, you have to use some different technique that involves manually
examining files or scrutinizing pg_controldata output. It's actually
very difficult to get this right.

> Commands that involve a whole
> bunch of subtle interlocking --- and, therefore, aren't going to work if
> anything has gone wrong already anywhere in the server --- seem like a
> particularly poor thing to be hanging your HA strategy on.

It's important not to conflate controlled switchover with failover.
When there's a failover, you have to accept some risk of data loss or
service interruption; but a controlled switchover does not need to
carry the same risks and there are plenty of systems out there where
it doesn't.

> I also wonder
> what this accomplishes that couldn't be done much more simply by killing
> the walsenders.

Killing the walsenders does nothing ... the clients immediately reconnect.

> In short, I see a huge amount of complexity here, an ongoing source of
> hard-to-identify, hard-to-fix bugs, and not very much real usefulness.

I do think this is complex and the risk of bugs that are hard to
identify or hard to fix certainly needs to be considered. I
strenuously disagree with the idea that there is not very much real
usefulness. Getting failover set up in a way that actually works
robustly is, in my experience, one of the two or three most serious
challenges my employer's customers face today. The core server support
we provide for that is breathtakingly primitive, and it's urgent that
we do better. Cloud providers are moving users from PostgreSQL to
their own forks of PostgreSQL in vast numbers in large part because
users don't want to deal with this crap, and the cloud providers have
made it so they don't have to. People running PostgreSQL themselves
need complex third-party tools and even then the experience isn't as
good as what a major cloud provider would offer. This patch is not
going to fix that, but I think it's a step in the right direction, and
I hope others will agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: improving basebackup.c's file-reading code
Next
From: Andrew Dunstan
Date:
Subject: Re: language cleanups in code and docs