Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date
Msg-id CA+hUKG+ge5j_r4bDxkzL-S2zmUWrgvnG6GiN3OeVjhvofcWXVg@mail.gmail.com
In response to Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
List pgsql-hackers
On Wed, May 4, 2022 at 8:53 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Got some off-list clues: that's just distracting Perl cleanup noise
> after something else went wrong (thanks Robert), and now I'm testing a
> theory from Andres that we're missing a barrier on the redo side when
> replaying XLOG_DBASE_CREATE_FILE_COPY.  More soon.

Yeah, looks like that was the explanation.  Presumably in older
releases, recovery can fail with EACCES here, and since commit
e2f0f8ed we get ENOENT instead, because some process still has an
unlinked file open and ReadDir() can still see its directory entry.
(I've wondered before if ReadDir() should also hide zombie Windows
directory entries, but that's kinda independent and would only get us
one step further: a later rmdir() would still fail.)  Adding the
barrier on the redo side fixes the problem.  Assuming no objections or
CI failures show up, I'll consider pushing the first two patches
tomorrow.
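
To illustrate the ReadDir() aside, here is a minimal standalone sketch
of what "hiding" such entries could look like: a scan that treats ENOENT
on a name it was just handed as "skip it" rather than as an error.  This
is not PostgreSQL code.  count_live_entries(), its "vanish" test hook and
demo() are hypothetical names, and since POSIX has no zombie entries the
hook fakes one by unlinking the file between readdir() and lstat().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count directory entries that still exist, skipping any name that
 * readdir() returned but lstat() reports as ENOENT.  "vanish" is a
 * hypothetical test hook: if non-NULL, that entry is unlinked right after
 * readdir() returns it, to imitate a name that is listed but already gone.
 * Returns -1 on any other error.
 */
static int
count_live_entries(const char *dir, const char *vanish)
{
    DIR        *d;
    struct dirent *de;
    int         live = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;

    while ((de = readdir(d)) != NULL)
    {
        char        path[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);

        if (vanish != NULL && strcmp(de->d_name, vanish) == 0)
            unlink(path);       /* simulate the entry disappearing */

        if (lstat(path, &st) < 0)
        {
            if (errno == ENOENT)
                continue;       /* listed but gone: skip, don't fail */
            closedir(d);
            return -1;
        }
        live++;
    }
    closedir(d);
    return live;
}

/*
 * Build a scratch directory holding files "a" and "b", let "b" vanish
 * mid-scan, and return how many entries the scan considered live.
 */
int
demo(void)
{
    char        dir[] = "/tmp/zombie_demo_XXXXXX";
    char        path[4096];
    const char *names[] = {"a", "b"};
    int         n;

    if (mkdtemp(dir) == NULL)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        int         fd;

        snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
        fd = open(path, O_CREAT | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);
    }

    n = count_live_entries(dir, "b");

    /* Clean up; "b" is normally gone already, so ignore failures. */
    for (int i = 0; i < 2; i++)
    {
        snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
        unlink(path);
    }
    rmdir(dir);
    return n;
}
```

Calling demo() should report one live entry, since "b" is skipped
rather than turning into an ENOENT failure.  As noted above, this only
gets the scan itself past the stale name; it does nothing for a later
rmdir() on the containing directory.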
