Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
Date
Msg-id 20140925204118.GE16581@awork2.anarazel.de
Whole thread Raw
In response to Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Responses Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
List pgsql-hackers
Hi,

On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
> Hi Andres, Robert.
> 
> I've attached four patches here.

Cool. Will review.

> 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
>    earlier in StartupXLOG.
> 
> 2. Inside that function, issue fsync()s for the main forks we create by
>    copying the _init fork.

These two imo should definitely be backpatched.

> 3. A small fixup to add a const to "typedef char *FileName", because the
>    earlier patch gave me warnings about discarding const-ness. This is
>    consistent with many other functions in fd.c that take const char *.

I'm happy with consider that one for master (although I seem to recall having had
a patch for it rejected?), but I don't think we want to do that in the
backbranches.

> 4. Issue an fsync() on the data directory at startup if we need to
>    perform crash recovery.

I personally think this one should be backpatched too. Does anyone
believe differently?

> From d8726c06cdf11674661eac1d091cf7edd05c2a0c Mon Sep 17 00:00:00 2001
> From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
> Date: Wed, 24 Sep 2014 14:43:18 +0530
> Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier
> 
> We need to call this after recovery, but not after the END_OF_RECOVERY
> checkpoint is written. If we crash after that checkpoint, for example
> because of ENOSPC in PreallocXlogFiles, the checkpoint has already set
> the ControlFile->state to DB_SHUTDOWNED, so we don't enter recovery
> again at startup.
> 
> Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP)
> in the first cleanup, this leaves the database with a bunch of _init
> forks for unlogged relations, but no main forks, leading to scary
> errors.
> 
> See thread from 20140912112246.GA4984@alap3.anarazel.de for details.

With a explanation. Shiny!

> ---
>  src/backend/access/transam/xlog.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 46eef5f..218f7fb 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6863,6 +6863,14 @@ StartupXLOG(void)
>      ShutdownWalRcv();
>  
>      /*
> +     * Reset initial contents of unlogged relations.  This has to be done
> +     * AFTER recovery is complete so that any unlogged relations created
> +     * during recovery also get picked up.
> +     */
> +    if (InRecovery)
> +        ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
> +

+
* Also recovery shouldn't be regarded successful if the reset fails -* e.g. because of ENOSPC.

> +
> +        /*
> +         * copy_file() above has already called pg_flush_data() on the
> +         * files it created. Now we need to fsync those files, because
> +         * a checkpoint won't do it for us while we're in recovery.
> +         */

+

* Doing this in a separate pass is advantageous for performance reasons
* because it allows the kernel to perform all the flushes at once.

> +    /*
> +     * If we need to perform crash recovery, we issue an fsync on the
> +     * data directory to try to ensure that any data written before the
> +     * crash are flushed to disk. Otherwise a power failure in the near
> +     * future might mean that earlier unflushed writes are lost, but the
> +     * more recent data written to disk from here on are persisted.
> +     */
> +
> +    if (ControlFile->state != DB_SHUTDOWNED &&
> +        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> +        fsync_fname(data_directory, true);
> +
>      if (ControlFile->state == DB_SHUTDOWNED)
>      {
>          /* This is the expected case, so don't be chatty in standalone mode */
> -- 

Unless I miss something this isn't sufficient. We need to fsync the
files in the data directory, not just the toplevel directory?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Next
From: Simon Riggs
Date:
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}