On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I have not tried to analyze the error-handling properties of 0001,
> >> but if it's being equally cavalier then it shouldn't be committed
> >> either.
>
> > 0001 does introduce new errors, as mentioned in the commit message, in
> > the form of elevel ERROR passed into get_dirent_type(), which might be
> > thrown if your OS has no d_type and lstat() fails (also if you asked
> > to follow symlinks, but in those cases errors were already thrown).
> > But in those cases, it seems at least a little fishy that we ignored
> > errors from the existing lstat().
>
> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great.
> But should the replacement behavior be elog-LOG-and-press-on,
> or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any
> hurry to believe that the latter is more appropriate without some
> analysis of what the callers are doing.
>
> The bottom line here is that I'm distrustful of behavioral changes
> introduced to simplify refactoring rather than to solve a live
> problem.
+1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the
checkpoint operation. Because the checkpoint is more important than
why a single snapshot file (out thousands or even million files) isn't
removed at that moment. Also, I originally proposed to change
elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink()
failures for the same reason.
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/