Re: checkpointer: PANIC: could not fsync file: No such file or directory - Mailing list pgsql-hackers

From Andres Freund
Subject Re: checkpointer: PANIC: could not fsync file: No such file or directory
Date
Msg-id nyj4k7yur5t27rtygvx2i2lrlp6rqfvvhoiiwx4fznynksf2et@4hj2sp42alpe
Whole thread Raw
In response to Re: checkpointer: PANIC: could not fsync file: No such file or directory  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: checkpointer: PANIC: could not fsync file: No such file or directory
List pgsql-hackers
Hi,

On 2019-12-13 17:41:56 +1300, Thomas Munro wrote:
> From 9609c9a153232fb2de169bf76158781d354c633b Mon Sep 17 00:00:00 2001
> From: Thomas Munro <tmunro@postgresql.org>
> Date: Fri, 13 Dec 2019 17:12:42 +1300
> Subject: [PATCH] Don't use _mdfd_getseg() in mdsyncfiletag().
> 
> _mdfd_getseg() opens all segments up to the requested one.  That
> causes problems for mdsyncfiletag(), if mdunlinkfork() has
> already unlinked other segment files.  Open the file we want
> directly by name instead, if we don't have it already.
> 
> The consequence of this bug was a rare panic in the checkpointer,
> made more likely if you saturated the sync request queue so that
> the SYNC_FORGET_REQUEST messages for a given relation were more
> likely to be absorbed in separate cycles by the checkpointer.
> 
> Back-patch to 12.  Defect in commit 3eb77eba.
> 
> Author: Thomas Munro
> Reported-by: Justin Pryzby
> Discussion: https://postgr.es/m/20191119115759.GI30362%40telsasoft.com
> ---
>  src/backend/storage/smgr/md.c | 56 ++++++++++++++++++++++++-----------
>  1 file changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
> index 8a9eaf6430..1a7b91b523 100644
> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -1280,25 +1280,47 @@ int
>  mdsyncfiletag(const FileTag *ftag, char *path)
...
> -    /* Try to open the requested segment. */
> -    v = _mdfd_getseg(reln,
> -                     ftag->forknum,
> -                     ftag->segno * (BlockNumber) RELSEG_SIZE,
> -                     false,
> -                     EXTENSION_RETURN_NULL | EXTENSION_DONT_CHECK_SIZE);
> -    if (v == NULL)
> -        return -1;
> -
> -    /* Try to fsync the file. */
> -    return FileSync(v->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC);
> +    return result;
>  }

This was the only user of EXTENSION_DONT_CHECK_SIZE. It's not generally safe,
as its comment says:

/*
 * Allow opening segments which are preceded by segments smaller than
 * RELSEG_SIZE, e.g. inactive segments (see above). Note that this breaks
 * mdnblocks() and related functionality henceforth - which currently is ok,
 * because this is only required in the checkpointer which never uses
 * mdnblocks().
 */

But since the above change checkpointer doesn't use EXTENSION_DONT_CHECK_SIZE
anymore.

Seems we should remove this code?

Greetings,

Andres



pgsql-hackers by date:

Previous
From: Vladlen Popolitov
Date:
Subject: Re: COPY performance on Windows
Next
From: Sergei Kornilov
Date:
Subject: Re: Track the amount of time waiting due to cost_delay