Re: md.c vs elog.c vs smgrreleaseall() in barrier - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: md.c vs elog.c vs smgrreleaseall() in barrier |
Date | |
Msg-id | 20250320201644.3d.nmisch@google.com Whole thread Raw |
In response to | Re: md.c vs elog.c vs smgrreleaseall() in barrier (Andres Freund <andres@anarazel.de>) |
Responses |
Re: md.c vs elog.c vs smgrreleaseall() in barrier
|
List | pgsql-hackers |
On Thu, Mar 20, 2025 at 03:53:11PM -0400, Andres Freund wrote: > I updated the patch with the following changes: > > - Remove the assertion from smgrtruncate() - it would need to assert that it's > called in a critical section. > > Not sure why it's not already asserting that? > > The function header says: > * ... This function should normally > * be called in a critical section, but the current size must be checked > * outside the critical section, and no interrupts or smgr functions relating > * to this relation should be called in between. > > The "should normally" is bit weird imo, when would it be safe to *not* use > it in a critical section? I expect it would be okay in recovery, which is a crypto-critical-section IIRC. All callers, including smgr_redo(), do have an explicit critical section around the call. Hence, I gather we're no longer relying on any exceptions to this one's need for a critical section. > - added comments about the reason for HOLD_INTERRUPTS to smgrdounlinkall(), > smgrdestroyall() and smgrreleaseall() Perfect. > I still am leaning against backpatching, but I'm not sure that's not just > laziness. It's also some risk reduction. One of these smgr APIs might have a useful interruptibility that we're now blocking. (I'm not aware of one.) > On 2025-03-19 17:45:14 -0700, Noah Misch wrote: > > On Wed, Mar 19, 2025 at 06:45:20PM -0400, Andres Freund wrote: > > > On 2025-03-19 12:55:53 -0700, Noah Misch wrote: > > > > On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote: > > > > > @@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels) > > > > > if (nrels == 0) > > > > > return; > > > > > > > > > > + HOLD_INTERRUPTS(); > > > > > + > > > > > FlushRelationsAllBuffers(rels, nrels); > > > > > > > > FlushRelationsAllBuffers() isn't part of smgr or md.c, so it's unlikely to > > > > become sensitive to smgrrelease(). It may do a ton of I/O. Hence, I'd > > > > HOLD_INTERRUPTS() after FlushRelationsAllBuffers(), not before. > > > > > > Hm - we never would want to process interrupts while in > > > FlushRelationsAllBuffers() or such, would we? I'm ok with changing it, I > > > guess I just didn't see a reason not to use a wider scope. > > > > If we get a query cancel or fast shutdown, it's better for the user to abort > > the transaction rather than keep flushing. smgrDoPendingSyncs() calls here > > rather late in the pre-commit actions, so failing is still supposed to be > > fine. I think the code succeeds at making it fine to fail here. > > But we don't actually intentionally accept interrupts in > FlushRelationsAllBuffers()? Yes. It would be reasonable for future work to add that. > It would only happen as a side-effect of a > non-error elog/ereport() processing interrupts, right? Likely yes. > It also looks like we couldn't accept interrupts when called by > AbortTransaction(), because there we already are in a HOLD_INTERRUPTS() > region. I'm pretty sure an error would trigger at least an assertion. But > that's really an independent issue. The only smgrdosyncall() caller is smgrDoPendingSyncs(), which doesn't call it in the abort case. So I think we're good. > Moved. Thanks. > > > I suspect it's always called with interrupts held already though. > > > > Ah, confirmed. If I put this assert at the top of smgrdounlinkall(), > > check-world passes: > > > > Assert(IsBinaryUpgrade || InRecovery || !INTERRUPTS_CAN_BE_PROCESSED()); > > I just made it hold interrupts for now, hope that makes sense? Yep. Patch looks perfect.
pgsql-hackers by date: