Re: Problem while setting the fpw with SIGHUP - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Problem while setting the fpw with SIGHUP
Date
Msg-id 20180412050753.GA19289@paquier.xyz
Whole thread Raw
In response to Re: Problem while setting the fpw with SIGHUP  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Problem while setting the fpw with SIGHUP
List pgsql-hackers
On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote:
> Checkpointer never calls CreateCheckPoint while
> RecoveryInProgress() == true. In other words, checkpointer is not
> an updator of shared FPW at the time StartupXLOG calls
> CreateCheckPoint for fallback_promote.

I have been able to spend a couple of hours on your patch, wrapping my
mind on your stuff.  So what I had in mind was something like this type
of scenario:
1) The startup process requires a restart point.
2) The checkpointer receives the request, and blocks before reading
RecoveryInProgress().
3) A fallback_promote is triggered, making the startup process call
CreateCheckpoint().
4) Startup process finishes checkpoint, updates Insert->fullPageWrites.
5) Checkpoint reads RecoveryInProgress to false, moves on with its
checkpoint.

> The comment may be somewhat confusing that it is written
> there. The point is that checkpointer and StartupXLOG are
> mutually excluded on updating shared FPW by
> SharedRecoveryInProgress flag.

Indeed.  I can see that it is the main key point of the patch.

> | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
> | * the flag actually takes effect. Checkpointer never calls this function
> | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no
> | * window where checkpointer and startup processes - the only updators of
> | * the flag - can update shared FPW simultaneously. Thus no lock is
> | * required here. Both shared and local fullPageWrites do not change
> | * before the next reading below.

Yeah, this reduces the confusion.

(The latest patch is a mix of two patches.)

+        The default is <literal>on</literal>. The change of the parmeter takes
+        effect at the next checkpoint time.
s/parmeter/parameter/

By the way, I would vote for keeping track in WAL of full_page_writes
switched from off to on.  This is not used in the backend, but that's
still useful for debugging end-user issues.

Actually, I was wondering why a spin lock is not taken in
RecoveryInProgress when reading SharedRecoveryInProgress, but that's
from 1a3d1044 which added a memory barrier instead as guarantee...
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [HACKERS] Runtime Partition Pruning
Next
From: Ashutosh Bapat
Date:
Subject: Re: Creation of wiki page for open items of v11