Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Msg-id: CA+TgmoYz9Cx=hFsbG1V5P6-UmF6hcJM5HLd9EW+4HRro5kWRwg@mail.gmail.com
In response to: Re: [Patch] ALTER SYSTEM READ ONLY (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers

On Thu, Jun 18, 2020 at 5:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> For buffer replacement, we often also have to perform XLogFlush;
> what do we do about that?  We can't proceed without doing that, and
> erroring out from there means failing a read-only query from the
> user's perspective.

I think we should stop WAL writes, then XLogFlush() once, then declare
the system R/O. After that there might be more XLogFlush() calls but
there won't be any new WAL, so they won't do anything.
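
To illustrate the ordering, here is a toy model in Python. It is not
PostgreSQL code: ToyWal, its fields, and make_read_only() are all
hypothetical names, and the only property modeled is that a flush request
for a position that is already durable does nothing.

```python
class ToyWal:
    """Toy model of a WAL with an insert position and a flush position."""

    def __init__(self):
        self.insert_lsn = 0        # end of WAL generated so far
        self.flush_lsn = 0         # end of WAL known to be durable
        self.wal_prohibited = False
        self.read_only = False
        self.physical_flushes = 0  # how many flushes actually did work

    def insert(self, nbytes):
        """Append nbytes of WAL and return the new end position."""
        if self.wal_prohibited:
            raise RuntimeError("cannot write WAL: system is read only")
        self.insert_lsn += nbytes
        return self.insert_lsn

    def flush(self, upto):
        """Make WAL durable up to `upto`; return False if nothing to do."""
        target = min(upto, self.insert_lsn)
        if target <= self.flush_lsn:
            return False
        self.flush_lsn = target
        self.physical_flushes += 1
        return True

    def make_read_only(self):
        self.wal_prohibited = True   # 1. stop WAL writes
        self.flush(self.insert_lsn)  # 2. flush everything once
        self.read_only = True        # 3. declare the system R/O


wal = ToyWal()
page_lsn = wal.insert(100)  # WAL record covering some dirty buffer
wal.insert(50)
wal.make_read_only()

# Evicting the dirty buffer later still asks for a flush up to the page's
# LSN, but that WAL is already durable, so the call does no work.
did_work = wal.flush(page_lsn)
```

In this model a read-only query that has to evict a dirty buffer never
hits an error, because its flush request is always already satisfied.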

> > But there's no reason for the checkpointer to do it: it shouldn't try
> > to checkpoint, and therefore it shouldn't write dirty pages either.
>
> What is the harm in doing the checkpoint before we put the system into
> READ ONLY state?  The advantage is that we can at least reduce the
> recovery time if we allow writing a checkpoint record.

Well, as Andres says in
http://postgr.es/m/20200617180546.yucxtiupvxghxss6@alap3.anarazel.de
it can take a really long time.

> > Interesting question. I was thinking that we should probably teach the
> > autovacuum launcher to stop launching workers while the system is in a
> > READ ONLY state, but what about existing workers? Anything that
> > generates invalidation messages, acquires an XID, or writes WAL has to
> > be blocked in a read-only state; but I'm not sure to what extent the
> > first two of those things would be a problem for vacuuming an unlogged
> > table. I think you couldn't truncate it, at least, because that
> > acquires an XID.
> >
>
> If the truncate operation errors out, then won't the system again
> trigger a new autovacuum worker for the same relation, as we update
> stats at the end?

Not if we do what I said in that paragraph. If we're not launching new
workers we can't again trigger a worker for the same relation.

> Also, in general for regular tables, if there is an
> error while it tries to write WAL, it could again trigger the
> autovacuum worker for the same relation.  If this is true, then it
> will unnecessarily generate a lot of dirty pages, and I don't think it
> would be good for the system to behave that way.

I don't see how this would happen. VACUUM can't really dirty pages
without writing WAL, can it? And, anyway, if there's an error, we're
not going to try again for the same relation unless we launch new
workers.
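
A toy Python sketch of that argument follows. It is not how autovacuum is
implemented: ToySystem, worker(), and launcher_cycle() are hypothetical
names, and the model assumes the launcher checks a read-only flag before
starting any worker.

```python
class ReadOnlyError(Exception):
    pass


class ToySystem:
    def __init__(self):
        self.read_only = False
        self.needs_vacuum = {"t1": True}
        self.launches = 0

    def truncate(self, table):
        # Truncation needs an XID, which a read-only system cannot hand out.
        if self.read_only:
            raise ReadOnlyError("cannot truncate " + table)

    def worker(self, table):
        self.truncate(table)
        self.needs_vacuum[table] = False  # stats updated only on success

    def launcher_cycle(self):
        """Launch workers for tables that need one; return how many."""
        if self.read_only:
            return 0  # launcher stays idle while the system is read only
        started = 0
        for table, needed in list(self.needs_vacuum.items()):
            if needed:
                started += 1
                self.launches += 1
                self.worker(table)
        return started


system = ToySystem()
system.read_only = True

# A worker that was already running fails at the truncate step.
try:
    system.worker("t1")
except ReadOnlyError:
    pass

# The table still looks like it needs vacuuming, but nothing is relaunched.
launched = [system.launcher_cycle() for _ in range(3)]
```

The failed worker leaves the table marked as needing vacuum, yet no retry
loop forms, because the launcher does nothing until the system is
read-write again.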

> > What I think should happen is that the end-of-recovery checkpoint
> > should be skipped, and then if the system is put back into read-write
> > mode later we should do it then.
>
> But then if we have to perform recovery again, it will start from the
> previous checkpoint.  I think we have to live with it.

Yeah. I don't think it's that bad. The case where you shut down the
system while it's read-only should be a somewhat unusual one. Normally
you would mark it read only and then promote a standby and shut the
old master down (or demote it). But what you want is that if it does
happen to go down for some reason before all the WAL is streamed, you
can bring it back up and finish streaming the WAL without generating
any new WAL.
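
The deferred end-of-recovery checkpoint discussed above can be sketched
the same way. This is a toy Python model under stated assumptions, not
PostgreSQL's startup code: ToyServer, finish_recovery(), and
set_read_write() are hypothetical names.

```python
class ToyServer:
    def __init__(self, read_only):
        self.read_only = read_only
        self.checkpoint_pending = False
        self.checkpoints_done = 0

    def _checkpoint(self):
        # A real checkpoint writes WAL, so it must not run while read only.
        assert not self.read_only
        self.checkpoints_done += 1

    def finish_recovery(self):
        if self.read_only:
            self.checkpoint_pending = True  # skip it now, remember it
        else:
            self._checkpoint()

    def set_read_write(self):
        self.read_only = False
        if self.checkpoint_pending:
            self._checkpoint()              # do the deferred checkpoint
            self.checkpoint_pending = False


server = ToyServer(read_only=True)
server.finish_recovery()             # restart while read only
pending_after_recovery = server.checkpoint_pending
done_while_read_only = server.checkpoints_done
server.set_read_write()              # later switch back to read-write
```

The cost of the deferral is the one noted in the quoted text: a second
crash before the switch back to read-write replays from the previous
checkpoint again.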

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


