Re: md.c vs elog.c vs smgrreleaseall() in barrier - Mailing list pgsql-hackers

From Noah Misch
Subject Re: md.c vs elog.c vs smgrreleaseall() in barrier
Date
Msg-id 20250320004514.95.nmisch@google.com
Whole thread Raw
In response to Re: md.c vs elog.c vs smgrreleaseall() in barrier  (Andres Freund <andres@anarazel.de>)
Responses Re: md.c vs elog.c vs smgrreleaseall() in barrier
List pgsql-hackers
On Wed, Mar 19, 2025 at 06:45:20PM -0400, Andres Freund wrote:
> On 2025-03-19 12:55:53 -0700, Noah Misch wrote:
> > On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote:
> > > @@ -362,12 +397,16 @@ smgrreleaseall(void)
> > >      if (SMgrRelationHash == NULL)
> > >          return;
> > >  
> > > +    HOLD_INTERRUPTS();
> > 
> > Likely not important, but it's not clear to me why smgrdestroyall() and
> > smgrreleaseall() get HOLD_INTERRUPTS(), as opposed to relying on the holds in
> > smgrdestroy() and smgrrelease().  In contrast, smgrreleaserellocator() does
> > rely on smgrrelease() for the hold.
> 
> It didn't seem particularly safe to allow interrupts, which in turn could
> change the list of open relations, while iterating over a linked list / a
> hashtable.

Fair.

> > > @@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
> > >      if (nrels == 0)
> > >          return;
> > >  
> > > +    HOLD_INTERRUPTS();
> > > +
> > >      FlushRelationsAllBuffers(rels, nrels);
> > 
> > FlushRelationsAllBuffers() isn't part of smgr or md.c, so it's unlikely to
> > become sensitive to smgrrelease().  It may do a ton of I/O.  Hence, I'd
> > HOLD_INTERRUPTS() after FlushRelationsAllBuffers(), not before.
> 
> Hm - we never would want to process interrupts while in
> FlushRelationsAllBuffers() or such, would we?  I'm ok with changing it, I
> guess I just didn't see a reason not to use a wider scope.

If we get a query cancel or fast shutdown, it's better for the user to abort
the transaction rather than keep flushing.  smgrDoPendingSyncs() calls here
rather late in the pre-commit actions, so failing is still supposed to be
fine.  I think the code succeeds at making it fine to fail here.

> I guess I am a bit paranoid because gave me flashbacks to issues around
> smgrtruncate() failing after doing DropRelationBuffers(), that Thomas recently
> fixed (and I had worked on a few years before). But of course
> DropRelationBuffers() is way more dangerous than FlushRelationsAllBuffers().

Fair.

> > >          }
> > >      }
> > > +
> > > +    RESUME_INTERRUPTS();
> > >  }
> > >  
> > >  /*
> > > @@ -471,6 +522,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
> > >      if (nrels == 0)
> > >          return;
> > >  
> > > +    HOLD_INTERRUPTS();
> > 
> > I would move this below DropRelationsAllBuffers(), for reasons like
> > FlushRelationsAllBuffers() above.
> 
> I think that'd be unsafe. Once we dropped buffers from the buffer pool we
> can't just continue without also unlinking the underlying relation, otherwise
> an older view of the data later can be "revived from the dead" after an error,
> causing all manner of corruption.
> 
> I suspect it's always called with interrupts held already though.

Ah, confirmed.  If I put this assert at the top of smgrdounlinkall(),
check-world passes:

    Assert(IsBinaryUpgrade || InRecovery || !INTERRUPTS_CAN_BE_PROCESSED());

> > > Subject: [PATCH v2 2/2] smgr: Make SMgrRelation initialization safer against
> > >  errors
> > > 
> > > In case the smgr_open callback failed, the ->pincount field would not be
> > > initialized and the relation would not be put onto the unpinned_relns list.
> > > 
> > > This buglet was introduced in 21d9c3ee4ef7. As that commit is only in HEAD, no
> > > need to backpatch.
> > > 
> > > Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
> > > ---
> > >  src/backend/storage/smgr/smgr.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
> > > index 53a09fe4aaa..24971304b85 100644
> > > --- a/src/backend/storage/smgr/smgr.c
> > > +++ b/src/backend/storage/smgr/smgr.c
> > > @@ -255,12 +255,12 @@ smgropen(RelFileLocator rlocator, ProcNumber backend)
> > >              reln->smgr_cached_nblocks[i] = InvalidBlockNumber;
> > >          reln->smgr_which = 0;    /* we only have md.c at present */
> > >  
> > > -        /* implementation-specific initialization */
> > > -        smgrsw[reln->smgr_which].smgr_open(reln);
> > > -
> > >          /* it is not pinned yet */
> > >          reln->pincount = 0;
> > >          dlist_push_tail(&unpinned_relns, &reln->node);
> > > +
> > > +        /* implementation-specific initialization */
> > > +        smgrsw[reln->smgr_which].smgr_open(reln);
> > >      }
> > 
> > I struggle to speculate about the merits of this, because mdopen() can't fail.
> > If mdopen() started to do things that could fail, mdnblocks() would be
> > reasonable in assuming those things are done.  Hence, the long-term direction
> > should be more like destroying the new smgr entry in the event of error.
> > 
> > I would not make this change.  I'd maybe add a comment that smgr_open
> > callbacks currently aren't allowed to fail, since smgropen() isn't ready to
> > clean up smgr-level state from a failed open.  How do you see it?
> 
> I see no disadvantage in the change - it seems strictly better to initialize
> pincount earlier. I agree that it might be a good idea to explicitly handle
> errors eventually, but that'd not be made harder by this change...

Okay.  I suppose if mdopen() gained the ability to fail and mdnblocks() also
gained the ability to cure said failure, this change would make that okay.



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Next
From: "David G. Johnston"
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations