On Wed, Sep 19, 2018 at 02:20:34PM +0900, Michael Paquier wrote:
> On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote:
>> My latest patch tries to remove the window by imposing all
>> responsibility to apply config file changes to the shared FPW
>> flag on the checkpointer. RecoveryInProgress() is changed to be
>> called prior to UpdateFullPageWrites on the way doing that.
>
> I still need to look at that in details. That may be better than what I
> am proposing. At quick glance what I propose is more simple, and does
> not need enforcing a checkpoint using SIGINT.
I have finally looked at the patch set from Horiguchi-san here:
https://www.postgresql.org/message-id/20180828.193436.253621888.horiguchi.kyotaro@lab.ntt.co.jp
And actually this is very close to what my proposal does, except for a
couple of caveats. Here are my notes:
1) The issue with palloc() happening in critical sections is able to go
away, by making the logic behind UpdateFullPageWrites() more complicated
than necessary. With the proposed patch, UpdateFullPageWrites() never
gets called by the checkpointer until the startup process has done its
business. I would have believed that keeping the check to
RecoveryInProgress() directly in UpdateFullPageWrites() makes the logic
more simple. That's actually what my proposal does. With your patch,
it would be even possible to add an assertion so as this never gets
called while in recovery.
2) At the end of recovery, if there is a crash before the checkpointer
is able to update the shared parameters it needs to work on
away, then no XLOG_CHANGE_PARAMETER record is generated. This is not a
problem for normal cases, but there is one scenario where this is a
problem: at the end of recovery if the bgwriter is not started, then the
startup process creates a checkpoint by itself, and it would miss the
record generation.
3) SIGINT is abused of, in such a way that the checkpointer may generate
two checkpoints where only one is needed post-recovery.
4) SyncRepUpdateSyncStandbysDefined() would get called even without
SIGHUP being reached. This feels also like a future trap waiting to
bite as well.
When I see your patch, I actually see the same kind of logic as what I
propose which is summarized in two points, and that's a good thing:
a) Avoid the allocation in the critical section.
b) Avoid two processes to call UpdateFullPageWrites at the same time.
Now the points mentioned above make what you are proposing weaker in my
opinion, and 2) is an actual bug, side effect of the proposed patch.
--
Michael