Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date
Msg-id 21F86990-CC52-42B2-8A85-A3DA110F57F1@amazon.com
Whole thread Raw
In response to Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
List pgsql-hackers
I took the liberty of adjusting Bharath's patch based on the latest
feedback.

On 1/19/22, 10:35 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> We should be fsync'ing the directory not the files
> themselves, no?

I added a directory sync at the end of CheckPointLogicalRewriteHeap(),
which IIUC is enough.

> * Why is the check for "map-" prefix after, rather than before,
> the lstat?

I swapped these checks.  I stopped short of moving the sscanf() before
the lstat(), though, as 1) I don't think it will help very much and 2)
it seemed weird to start emitting "could not parse filename" logs for
non-regular files we presently skip silently.

> * Why is it okay to ignore lstat failure?  Seems like we might
> as well not even have the lstat.

I added error checking for lstat().

> * The sscanf on the file name would not notice trailing junk,
> such as an editor backup marker.  Is that okay?

I think this is okay.  The absolute worst that would happen would be
that the extra file would be deleted.  This might eventually become a
problem if files with the same prefix format were created by the
server.  However, CheckPointSnapBuild() already has this problem with
temporary files, and it claims not to need any extra handling:

         * temporary filenames from SnapBuildSerialize() include the LSN and
         * everything but are postfixed by .$pid.tmp. We can just remove them
         * the same as other files because there can be none that are
         * currently being written that are older than cutoff.

> (Actually, isn't the
> separate check for "map-" useless, given that sscanf will make
> the equivalent check?)

The only benefit I see from the extra "map-" check is that it'll avoid
"could not parse filename" logs for files that clearly aren't related
to the task at hand.  I don't know if this is expected during normal
operation at the moment.  I've left the "map-" check for now.

> I started out wondering why the patch didn't also change the loop's
> other ERROR conditions to LOG.  But we do want to ERROR if we're
> unable to sync transient state down to disk, and that is what
> the other steps (think they) are doing.  It might be worth a
> comment to point that out though, before someone decides that
> if these errors are just LOG level then the others can be too.

I added such a comment.

I also updated CheckPointSnapBuild() and UpdateLogicalMappings() with
similar adjustments.

Nathan


Attachment

pgsql-hackers by date:

Previous
From: Nikita Malakhov
Date:
Subject: Re: Pluggable toaster
Next
From: Robert Haas
Date:
Subject: autovacuum prioritization