Re: Problem while setting the fpw with SIGHUP - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Problem while setting the fpw with SIGHUP
Date
Msg-id 20180919052034.GI1650@paquier.xyz
Whole thread Raw
In response to Re: Problem while setting the fpw with SIGHUP  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Problem while setting the fpw with SIGHUP
Re: Problem while setting the fpw with SIGHUP
List pgsql-hackers
On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote:
> My latest patch tries to remove the window by imposing all
> responsibility to apply config file changes to the shared FPW
> flag on the checkpointer. RecoveryInProgress() is changed to be
> called prior to UpdateFullPageWrites on the way doing that.

I still need to look at that in details.  That may be better than what I
am proposing.  At quick glance what I propose is more simple, and does
not need enforcing a checkpoint using SIGINT.

> InRecoery is turned off after the last UpdateFullPageWrite() and
> before SharedRecoveryInProgress is turned off. So it is still
> leaving the window. Thus, checkpointer stops flipping the value
> before SharedRecoveryInProgress's turning off. (I don't
> understand InRecovery condition..) However, this lets
> checkpointer omit reloading after UpdateFullPagesWrite() in
> startup until SharedRecoveryInProgress is tunred off.

That won't matter in this case, as RecoveryInProgress() gets called out
of the critical section in the previous patch I sent, so there is no
failure.  Let's not forget that the first issue is the allocation in the
critical section.  The second issue is that UpdateFullPageWrites may be
called concurrently across multiple processes, which is not something it
is designed for.  The second issue gets addressed in my proposal my
making sure that the checkpointer never tries to update full_page_writes
by himself until recovery has finished, and that the startup process is
the only updater once recovery ends.

> Agreed. "If we need to do that in the start process," we need to
> change the shared flag and issue FPW_CHANGE always when the
> database state is different from configuration file, regardless
> of what StartXLOG() did until the point.

Definitely my mistake here.  Attached is a patch to show what I have in
mind by making sure that the startup process generates a record even
after switching full_page_writes when started normally.  This removes
the condition based on InRecovery, and uses a new argument for
UpdateFullPageWrites() instead.  Your test case,as well as what I do
manually for testing, pass without triggering the assertion.

+   /* DEBUG: cause a reload */
+   {
+       struct stat b;
+       if (stat("/tmp/hoge", &b) == 0)
+       {
+           elog(LOG, "STARTUP SLEEP FOR 1s");
+           sleep(1);
+           elog(LOG, "DONE.");
+           DirectFunctionCall1(pg_reload_conf, 0);
+       }
+   }
The way you patch the backend this way is always nice to see so as it is
easy to reproduce bugs, and it actually helps in reproducing the
assertion failure correctly ;)
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Next
From: Gavin Flower
Date:
Subject: Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra