Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date
Msg-id 20220708171439.GB2356733@nathanxps13
Whole thread Raw
In response to Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
List pgsql-hackers
On Fri, Jul 08, 2022 at 09:39:10PM +0530, Bharath Rupireddy wrote:
> 0001 - there are many places where lstat/stat is being used - don't we
> need to replace all or most of them with get_dirent_type?

It's been a while since I wrote this one, but I believe my intent was to
replace as many [l]stat() calls in ReadDir()-style loops as possible with
get_dirent_type().  Are there any that I've missed?

> 0002 - I'm not quite happy with this patch, with the change,
> checkpoint errors out, if it can't remove just a file - the comments
> there says it all. Is there any strong reason for this change?

Andres noted several concerns upthread.  In short, ignoring unexpected
errors makes them harder to debug and likely masks bugs.

FWIW I agree that it is unfortunate that a relatively non-critical error
here leads to checkpoint failures, which can cause much worse problems down
the road.  I think this is one of the reasons for moving tasks like this
out of the checkpointer, as I'm trying to do with the proposed custodian
process [0].

[0] https://commitfest.postgresql.org/38/3448/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pg15b2: large objects lost on upgrade
Next
From: Nathan Bossart
Date:
Subject: Re: remove more archiving overhead