Home > mailing lists

Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date	April 1, 2022 21:03:05
Msg-id	CA+hUKGJ0r1m76hHa7efQtD722SW64XSNfD+JqF3SrjtPgk9GaQ@mail.gmail.com Whole thread Raw
In response to	Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
List	pgsql-hackers

Tree view

On Sat, Apr 2, 2022 at 2:52 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 1, 2022 at 2:04 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > The v1-0003 patch introduced smgropen_cond() to avoid the problem of
> > IssuePendingWritebacks(), which does desynchronised smgropen() calls
> > and could open files after the barrier but just before they are
> > unlinked.  Makes sense, but...
> >
> > 1.  For that to actually work, we'd better call smgrcloseall() when
> > PROCSIGNAL_BARRIER_SMGRRELEASE is handled; currently it only calls
> > closeAllVds().  Here's a patch for that.  But...
>
> What if we instead changed our approach to fixing
> IssuePendingWritebacks()?  Design sketch:
>
> 1. We add a new flag, bool smgr_zonk, to SmgrRelationData. Initially it's false.
> 2. Receipt of PROCSIGNAL_BARRIER_SMGRRELEASE sets smgr_zonk to true.
> 3. If you want, you can set it back to false any time you access the
> smgr while holding a relation lock.
> 4. IssuePendingWritebacks() uses smgropen_cond(), as today, but that
> function now refuses either to create a new smgr or to return one
> where smgr_zonk is true.
>
> I think this would be sufficient to guarantee that we never try to
> write back to a relfilenode if we haven't had a lock on it since the
> last PROCSIGNAL_BARRIER_SMGRRELEASE, which I think is the invariant we
> need here?
>
> My thinking here is that it's a lot easier to get away with marking
> the content of a data structure invalid than it is to get away with
> invalidating pointers to that data structure. If we can tinker with
> the design so that the SMgrRelationData doesn't actually go away but
> just gets a little bit more thoroughly invalidated, we could avoid
> having to audit the entire code base for code that keeps smgr handles
> around longer than would be possible with the design you demonstrate
> here.

Another idea would be to call a new function DropPendingWritebacks(),
and also tell all the SMgrRelation objects to close all their internal
state (ie the fds + per-segment objects) but not free the main
SMgrRelationData object, and for good measure also invalidate the
small amount of cached state (which hasn't been mentioned in this
thread, but it kinda bothers me that that state currently survives, so
it was one unspoken reason I liked the smgrcloseall() idea).  Since
DropPendingWritebacks() is in a rather different module, perhaps if we
were to do that we'd want to rename PROCSIGNAL_BARRIER_SMGRRELEASE to
something else, because it's not really a SMGR-only thing anymore.

pgsql-hackers by date:

From: "Euler Taveira"
Date: 01 April 2022, 20:59:51
Subject: merge documentation fix

From: Nathan Bossart
Date: 01 April 2022, 22:06:48
Subject: use has_privs_of_role() for pg_hba.conf

Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

Previous

Next