Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date
Msg-id CALj2ACWSjE8AiJ4xaBS4ULQ8T6RevwsNhTD6yPBLugJBTD5LCw@mail.gmail.com
Whole thread Raw
In response to Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
List pgsql-hackers
On Wed, Aug 10, 2022 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Aug 10, 2022 at 7:15 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > The main idea of using get_dirent_type() instead of lstat or stat is
> > to benefit from avoiding system calls on platforms where the directory
> > entry type is stored in dirent structure, but not to cause errors for
> > lstat or stat system calls failures where we previously didn't.
>
> Yeah, I get it... I'm OK with separating mechanical changes like
> lstat() -> get_dirent_type() from policy changes on error handling.  I
> initially thought the ERROR was surely better than what we're doing
> now (making extra system calls to avoid unlinking unexpected things,
> for which our only strategy on failure is to try to unlink the thing
> anyway), but it's better to discuss that separately.
>
> > I
> > think 0001 must be just replacing lstat or stat with get_dirent_type()
> > with elevel ERROR if lstat or stat failure is treated as ERROR
> > previously, otherwise with elevel LOG. I modified it as attached. Once
> > this patch gets committed, then we can continue the discussion as to
> > whether we make elog-LOG to elog-ERROR or vice-versa.
>
> Agreed, will look at this version tomorrow.

Thanks.

> > I'm tempted to use get_dirent_type() in RemoveXlogFile() but that
> > requires us to pass struct dirent * from RemoveOldXlogFiles() (as
> > attached 0002 for now), If done, this avoids one lstat() system call
> > per WAL file. IMO, that's okay to pass an extra function parameter to
> > RemoveXlogFile() as it is a static function and there can be many
> > (thousands of) WAL files at times, so saving system calls (lstat() *
> > number of WAL files) is definitely an advantage.
>
> -        lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
> +        get_dirent_type(path, xlde, false, LOG) != PGFILETYPE_REG &&
>
> I think you wanted ==, but maybe it'd be nicer to pass in a
> "recyclable" boolean to RemoveXlogFile()?

Ah, my bad, I corrected it now.

If you mean to do "bool recyclable = (get_dirent_type(path, xlde,
false, LOG) == PGFILETYPE_REG)"" and pass it as parameter to
RemoveXlogFile(), then we have to do get_dirent_type calls for every
WAL file even when any of (wal_recycle && *endlogSegNo <= recycleSegNo
&& XLogCtl->InstallXLogFileSegmentActive) is false which is
unnecessary and it's better to pass dirent structure as a parameter.

> If you're looking for more, rmtree() looks like another.

Are you suggesting to not use pgfnames() in rmtree() and do
opendir()-readdir()-get_dirent_type() directly? If yes, I don't think
it's a great idea because rmtree() has recursive calls and holding
opendir()-readdir() for all rmtree() recursive calls without
closedir() may eat up dynamic memory if there are deeper level
directories. I would like to leave it that way. Do you have any other
ideas in mind?

Please find the v15 patch, I merged the RemoveXlogFile() changes into 0001.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: John Naylor
Date:
Subject: Re: NAMEDATALEN increase because of non-latin languages