Re: md.c vs elog.c vs smgrreleaseall() in barrier - Mailing list pgsql-hackers

From Noah Misch
Subject Re: md.c vs elog.c vs smgrreleaseall() in barrier
Date
Msg-id 20250320201644.3d.nmisch@google.com
Whole thread Raw
In response to Re: md.c vs elog.c vs smgrreleaseall() in barrier  (Andres Freund <andres@anarazel.de>)
Responses Re: md.c vs elog.c vs smgrreleaseall() in barrier
List pgsql-hackers
On Thu, Mar 20, 2025 at 03:53:11PM -0400, Andres Freund wrote:
> I updated the patch with the following changes:
> 
> - Remove the assertion from smgrtruncate() - it would need to assert that it's
>   called in a critical section.
> 
>   Not sure why it's not already asserting that?
> 
>   The function header says:
>    * ... This function should normally
>    * be called in a critical section, but the current size must be checked
>    * outside the critical section, and no interrupts or smgr functions relating
>    * to this relation should be called in between.
> 
>   The "should normally" is bit weird imo, when would it be safe to *not* use
>   it in a critical section?

I expect it would be okay in recovery, which is a crypto-critical-section
IIRC.  All callers, including smgr_redo(), do have an explicit critical
section around the call.  Hence, I gather we're no longer relying on any
exceptions to this one's need for a critical section.

> - added comments about the reason for HOLD_INTERRUPTS to smgrdounlinkall(),
>   smgrdestroyall() and smgrreleaseall()

Perfect.

> I still am leaning against backpatching, but I'm not sure that's not just
> laziness.

It's also some risk reduction.  One of these smgr APIs might have a useful
interruptibility that we're now blocking.  (I'm not aware of one.)

> On 2025-03-19 17:45:14 -0700, Noah Misch wrote:
> > On Wed, Mar 19, 2025 at 06:45:20PM -0400, Andres Freund wrote:
> > > On 2025-03-19 12:55:53 -0700, Noah Misch wrote:
> > > > On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote:
> > > > > @@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
> > > > >      if (nrels == 0)
> > > > >          return;
> > > > >
> > > > > +    HOLD_INTERRUPTS();
> > > > > +
> > > > >      FlushRelationsAllBuffers(rels, nrels);
> > > >
> > > > FlushRelationsAllBuffers() isn't part of smgr or md.c, so it's unlikely to
> > > > become sensitive to smgrrelease().  It may do a ton of I/O.  Hence, I'd
> > > > HOLD_INTERRUPTS() after FlushRelationsAllBuffers(), not before.
> > >
> > > Hm - we never would want to process interrupts while in
> > > FlushRelationsAllBuffers() or such, would we?  I'm ok with changing it, I
> > > guess I just didn't see a reason not to use a wider scope.
> >
> > If we get a query cancel or fast shutdown, it's better for the user to abort
> > the transaction rather than keep flushing.  smgrDoPendingSyncs() calls here
> > rather late in the pre-commit actions, so failing is still supposed to be
> > fine.  I think the code succeeds at making it fine to fail here.
> 
> But we don't actually intentionally accept interrupts in
> FlushRelationsAllBuffers()?

Yes.  It would be reasonable for future work to add that.

> It would only happen as a side-effect of a
> non-error elog/ereport() processing interrupts, right?

Likely yes.

> It also looks like we couldn't accept interrupts when called by
> AbortTransaction(), because there we already are in a HOLD_INTERRUPTS()
> region. I'm pretty sure an error would trigger at least an assertion. But
> that's really an independent issue.

The only smgrdosyncall() caller is smgrDoPendingSyncs(), which doesn't call it
in the abort case.  So I think we're good.

> Moved.

Thanks.

> > > I suspect it's always called with interrupts held already though.
> >
> > Ah, confirmed.  If I put this assert at the top of smgrdounlinkall(),
> > check-world passes:
> >
> >     Assert(IsBinaryUpgrade || InRecovery || !INTERRUPTS_CAN_BE_PROCESSED());
> 
> I just made it hold interrupts for now, hope that makes sense?

Yep.


Patch looks perfect.



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Have postgres.bki self-identify
Next
From: Nathan Bossart
Date:
Subject: Re: optimize file transfer in pg_upgrade