Re: md.c vs elog.c vs smgrreleaseall() in barrier - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: md.c vs elog.c vs smgrreleaseall() in barrier |
Date | |
Msg-id | af3gdqg3guov3flyze2gjjphrrfbnusjesverlknn4tpxtafkh@vfpat6ffutek Whole thread Raw |
In response to | Re: md.c vs elog.c vs smgrreleaseall() in barrier (Noah Misch <noah@leadboat.com>) |
Responses |
Re: md.c vs elog.c vs smgrreleaseall() in barrier
Re: md.c vs elog.c vs smgrreleaseall() in barrier |
List | pgsql-hackers |
Hi, On 2025-03-19 12:55:53 -0700, Noah Misch wrote: > On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote: > > Here's a proposed patch for this. It turns out that the bug might already be > > reachable, even without defining FDDEBUG. There's a debug ereport() in > > register_dirty_segment() - but it's hard to reach in practice. > > > > I don't really know whether that means we ought to backpatch or not - which > > makes me want to be conservative and not backpatch. > > Non-backpatch sounds fine. Cool. > > @@ -362,12 +397,16 @@ smgrreleaseall(void) > > if (SMgrRelationHash == NULL) > > return; > > > > + HOLD_INTERRUPTS(); > > Likely not important, but it's not clear to me why smgrdestroyall() and > smgrreleaseall() get HOLD_INTERRUPTS(), as opposed to relying on the holds in > smgrdestroy() and smgrrelease(). In contrast, smgrreleaserellocator() does > rely on smgrrelease() for the hold. It didn't seem particularly safe to allow interrupts, which in turn could change the list of open relations, while iterating over a linked list / a hashtable. > > @@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels) > > if (nrels == 0) > > return; > > > > + HOLD_INTERRUPTS(); > > + > > FlushRelationsAllBuffers(rels, nrels); > > FlushRelationsAllBuffers() isn't part of smgr or md.c, so it's unlikely to > become sensitive to smgrrelease(). It may do a ton of I/O. Hence, I'd > HOLD_INTERRUPTS() after FlushRelationsAllBuffers(), not before. Hm - we never would want to process interrupts while in FlushRelationsAllBuffers() or such, would we? I'm ok with changing it, I guess I just didn't see a reason not to use a wider scope. I guess I am a bit paranoid because gave me flashbacks to issues around smgrtruncate() failing after doing DropRelationBuffers(), that Thomas recently fixed (and I had worked on a few years before). But of course DropRelationBuffers() is way more dangerous than FlushRelationsAllBuffers(). > > } > > } > > + > > + RESUME_INTERRUPTS(); > > } > > > > /* > > @@ -471,6 +522,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo) > > if (nrels == 0) > > return; > > > > + HOLD_INTERRUPTS(); > > I would move this below DropRelationsAllBuffers(), for reasons like > FlushRelationsAllBuffers() above. I think that'd be unsafe. Once we dropped buffers from the buffer pool we can't just continue without also unlinking the underlying relation, otherwise an older view of the data later can be "revived from the dead" after an error, causing all manner of corruption. I suspect it's always called with interrupts held already though. But unfortunately I think this probably needs to be done in a critical section, not just run with interrupts held. We really really shouldn't ever palloc() after doing something like DropRelationsAllBuffers(). Thomas just spent a lot of time fixing corruption issues arising for related issues around relation truncations... I think this may mean that an error during smgr_unlink() leads to a cluster in a corrupted state? > > Subject: [PATCH v2 2/2] smgr: Make SMgrRelation initialization safer against > > errors > > > > In case the smgr_open callback failed, the ->pincount field would not be > > initialized and the relation would not be put onto the unpinned_relns list. > > > > This buglet was introduced in 21d9c3ee4ef7. As that commit is only in HEAD, no > > need to backpatch. > > > > Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl > > --- > > src/backend/storage/smgr/smgr.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c > > index 53a09fe4aaa..24971304b85 100644 > > --- a/src/backend/storage/smgr/smgr.c > > +++ b/src/backend/storage/smgr/smgr.c > > @@ -255,12 +255,12 @@ smgropen(RelFileLocator rlocator, ProcNumber backend) > > reln->smgr_cached_nblocks[i] = InvalidBlockNumber; > > reln->smgr_which = 0; /* we only have md.c at present */ > > > > - /* implementation-specific initialization */ > > - smgrsw[reln->smgr_which].smgr_open(reln); > > - > > /* it is not pinned yet */ > > reln->pincount = 0; > > dlist_push_tail(&unpinned_relns, &reln->node); > > + > > + /* implementation-specific initialization */ > > + smgrsw[reln->smgr_which].smgr_open(reln); > > } > > I struggle to speculate about the merits of this, because mdopen() can't fail. > If mdopen() started to do things that could fail, mdnblocks() would be > reasonable in assuming those things are done. Hence, the long-term direction > should be more like destroying the new smgr entry in the event of error. > > I would not make this change. I'd maybe add a comment that smgr_open > callbacks currently aren't allowed to fail, since smgropen() isn't ready to > clean up smgr-level state from a failed open. How do you see it? I see no disadvantage in the change - it seems strictly better to initialize pincount earlier. I agree that it might be a good idea to explicitly handle errors eventually, but that'd not be made harder by this change... Greetings, Andres Freund
pgsql-hackers by date: