Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

From Robert Haas
Subject Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date
Msg-id CA+TgmoY5nLFZazkYUHd66D_zjDPE5c5sWjJujjb4x4vu74wV5g@mail.gmail.com
In response to Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Mar 2, 2022 at 3:00 PM Andres Freund <andres@anarazel.de> wrote:
> What I am stuck on is what we can do for the released branches. Data
> corruption after two consecutive ALTER DATABASE SET TABLESPACEs seems like
> something we need to address.

I think we should consider back-porting the ProcSignalBarrier stuff
eventually. I realize that it's not really battle-tested yet and I am
not saying we should do it right now, but I think that if we get these
changes into v15, back-porting it in let's say May of next year could
be a reasonable thing to do. Sure, there is some risk there, but on
the other hand, coming up with completely different fixes for the
back-branches is not risk-free either, nor is it clear that there is
any alternative fix that is nearly as good. In the long run, I am
fairly convinced that ProcSignalBarrier is the way forward not only
for this purpose but for other things as well, and everybody's got to
get on the train or be left behind.

Also, I am aware of multiple instances where the project waited a
long, long time to fix bugs because we didn't have a back-patchable
fix. I disagree with that on principle. A master-only fix now is
better than a back-patchable fix two or three years from now. Of
course a back-patchable fix now is better still, but we have to pick
from the options we have, not the ones we'd like to have.

<digressing a bit>

It seems to me that if we were going to try to construct an
alternative fix for the back-branches, it would have to be something
that didn't involve a new invalidation mechanism -- because the
ProcSignalBarrier stuff is an invalidation mechanism in effect, and I
feel that it can't be better to invent two new invalidation mechanisms
rather than one. And the only idea I have is trying to detect a
dangerous sequence of operations and just outright block it. We have
some cases sort of like that already - e.g. you can't prepare a
transaction if it's done certain things. But, the existing precedents
that occur to me are, I think, all cases where all of the related
actions are being performed in the same backend. It doesn't sound
crazy to me to have some rule like "you can't ALTER DATABASE ... SET
TABLESPACE on the same database in the same backend twice in a row
without an intervening checkpoint", or whatever, and install the
book-keeping to enforce that. But I don't think anything like that can
work, both because the two ALTER DATABASE ... SET TABLESPACE commands
could be performed in
different sessions, and also because an intervening checkpoint is no
guarantee of safety anyway, IIUC. So I'm just not really seeing a
reasonable strategy that isn't basically the barrier stuff.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


