On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote:
> > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> However, I'm not sure about the change to ReadDirExtended(). That might be
> >> okay for CheckPointSnapBuild(), which is just trying to remove old files,
> >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
> >> are flushed to disk for the checkpoint. If we stop reading the directory
> >> after an error and let the checkpoint continue, isn't it possible that some
> >> mappings files won't be persisted to disk?
> >
> > Unless I mis-read your above statement, with LOG level in
> > ReadDirExtended, I don't think we stop reading the files in
> > CheckPointLogicalRewriteHeap. Am I missing something here?
>
> ReadDirExtended() has the following comment:
>
> * If elevel < ERROR, returns NULL after any error. With the normal coding
> * pattern, this will result in falling out of the loop immediately as
> * though the directory contained no (more) entries.
>
> If there is a problem reading the directory, we will LOG and then exit the
> loop. If we didn't scan through all the entries in the directory, there is
> a chance that we didn't fsync() all the files that need it.
Thanks. I get it. For syncing map files, we don't want to tolerate any
errors, whereas removal of the old map files (lesser than cutoff LSN)
can be tolerated in CheckPointLogicalRewriteHeap.
Here's the v7 version using ReadDir for CheckPointLogicalRewriteHeap.
Regards,
Bharath Rupireddy.