Re: [PATCHES] Infrastructure changes for recovery (v8) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [PATCHES] Infrastructure changes for recovery (v8)
Date
Msg-id 49217D3F.9050603@enterprisedb.com
Whole thread Raw
In response to Re: [PATCHES] Infrastructure changes for recovery (v8)  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: [PATCHES] Infrastructure changes for recovery (v8)
Re: [PATCHES] Infrastructure changes for recovery (v8)
List pgsql-hackers
Simon Riggs wrote:
> @@ -3845,6 +3850,52 @@ sigusr1_handler(SIGNAL_ARGS)
>  
>         PG_SETMASK(&BlockSig);
>  
> +       if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
> +       {
> +               Assert(pmState == PM_STARTUP);
> +
> +               /*
> +                * Go to shutdown mode if a shutdown request was pending.
> +                */
> +               if (Shutdown > NoShutdown)
> +               {
> +                       pmState = PM_WAIT_BACKENDS;
> +                       /* PostmasterStateMachine logic does the rest */
> +               }
> +               else
> +               {
> +                       /*
> +                        * Startup process has entered recovery
> +                        */
> +                       pmState = PM_RECOVERY;


Hmm, I smell a race condition here:

1. Startup process goes into consistent state, and signals postmaster
2. Startup process finishes WAL replay and dies
3. Postmaster wakes up in reaper(), noting that the startup process 
dies, and goes into PM_RUN mode.
4. The above signal handler for postmaster is run, causing an assertion 
failure, or putting postmaster back into PM_RECOVERY mode if assertions 
are disabled.

Highly unlikely in practice, given how much code needs to run in the 
startup process between signaling the postmaster and exiting, but it 
seems theoretically possible. Do we care, and if we do, how can we fix it?

> +
> +                       /*
> +                        * Load the flat authorization file into postmaster's ca
> +                        * startup process won't have recomputed this from the d
> +                        * yet, so it may change following recovery.
> +                        */
> +                       load_role();

Is there a race condition here too, if the startup process is writing 
the auth file at the same time? I guess we'd have the same problem with 
flat files in general, so maybe there's something else that mitigates 
the problem?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: PG_PAGE_LAYOUT_VERSION 5 - time for change
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Block-level CRC checks