On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote:
> After an off-list discussion with Andreas, I'm proposing a patch here that
> basically replaces the ReadDir call with ReadDirExtended and gets rid of
> lstat entirely. With this change, the checkpoint will only care about
> the snapshot and mapping files and will not fail if it finds other files
> in the directories. Removing lstat also makes things faster, as we
> avoid a bunch of extra system calls: one lstat call per mapping or
> snapshot file.

I think removing the lstat() is probably reasonable. We currently aren't
doing proper error checking on its result anyway, and the chances of a
non-regular file matching the prefix are likely pretty low. In the worst
case, we'll LOG or ERROR when unlinking or fsyncing fails.
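To illustrate that worst case, here is a minimal plain-POSIX sketch (not PostgreSQL code; remove_prefixed_files() and the "snap-" prefix used with it are hypothetical) that removes every entry matching a prefix without calling lstat() first. Unrelated names are skipped silently. A non-regular entry that happens to match, such as a directory, is not filtered out up front; unlink() simply fails on it and the failure is reported and counted, roughly where the real code would LOG or ERROR.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: unlink every entry in `dir` whose name starts with
 * `prefix`, with no lstat() call per entry.  Returns the number of entries
 * that could not be removed, or -1 if the directory could not be opened.
 */
static int
remove_prefixed_files(const char *dir, const char *prefix)
{
    DIR        *d;
    struct dirent *de;
    char        path[4096];
    size_t      plen = strlen(prefix);
    int         failures = 0;

    d = opendir(dir);
    if (d == NULL)
    {
        perror(dir);
        return -1;
    }
    while ((de = readdir(d)) != NULL)
    {
        if (strncmp(de->d_name, prefix, plen) != 0)
            continue;           /* unrelated file: ignore it, don't fail */

        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        /* no lstat() here: a non-regular match just makes unlink() fail */
        if (unlink(path) != 0)
        {
            fprintf(stderr, "could not remove \"%s\": %s\n",
                    path, strerror(errno));
            failures++;
        }
    }
    closedir(d);
    return failures;
}
```

Skipping the per-entry lstat() trades one extra system call per matching file for reporting the rare mismatch at unlink() time.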

However, I'm not sure about the change to ReadDirExtended(). That might be
okay for CheckPointSnapBuild(), which is just trying to remove old files,
but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
are flushed to disk for the checkpoint. If we stop reading the directory
after an error and let the checkpoint continue, isn't it possible that some
mapping files won't be persisted to disk?
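For context, PostgreSQL's ReadDirExtended() with an elevel below ERROR logs a failure and returns NULL, which the usual loop treats as end-of-directory. The plain-POSIX sketch below (count_prefixed_files() is hypothetical, and counting stands in for fsyncing each mapping file) shows the distinction a checkpoint caller would need: readdir() returns NULL both at end-of-directory and on error, so errno is cleared before each call and checked afterwards, and a scan that was cut short is reported as -1 so the caller could fail instead of proceeding as if every file had been handled.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count entries in `dir` whose names start with
 * `prefix`.  Returns -1 if the directory could not be opened or if
 * readdir() failed partway through, i.e. if the scan is incomplete.
 */
static long
count_prefixed_files(const char *dir, const char *prefix)
{
    DIR        *d = opendir(dir);
    struct dirent *de;
    size_t      plen = strlen(prefix);
    long        count = 0;
    int         failed;

    if (d == NULL)
        return -1;              /* could not even open the directory */

    for (;;)
    {
        errno = 0;              /* so end-of-directory leaves errno at 0 */
        de = readdir(d);
        if (de == NULL)
            break;
        if (strncmp(de->d_name, prefix, plen) == 0)
            count++;            /* a real caller would fsync the file here */
    }

    /* NULL with errno set means an error: later entries were never seen */
    failed = (errno != 0);
    closedir(d);
    return failed ? -1 : count;
}
```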
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com