Re: md.c vs elog.c vs smgrreleaseall() in barrier - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: md.c vs elog.c vs smgrreleaseall() in barrier |
Date | |
Msg-id | CA+hUKG+HLL+Q6Q04R1UJc9fYo=z2ymZFHf0HJWat7-PdFCb+Gw@mail.gmail.com Whole thread Raw |
In response to | md.c vs elog.c vs smgrreleaseall() in barrier (Andres Freund <andres@anarazel.de>) |
Responses |
Re: md.c vs elog.c vs smgrreleaseall() in barrier
|
List | pgsql-hackers |
On Fri, Mar 14, 2025 at 5:03 AM Andres Freund <andres@anarazel.de> wrote: > If there is any LOG/DEBUG elog inside functions like mdreadv(), mdwritev(), > mdextend(), be it directly or indirectly, the functions become unsafe. The > problem is that at the end of errfinish() there is a CFI: > > /* > * Check for cancel/die interrupt first --- this is so that the user can > * stop a query emitting tons of notice or warning messages, even if it's > * in a loop that otherwise fails to check for interrupts. > */ > CHECK_FOR_INTERRUPTS(); > > that CFI in turn can call > ProcessProcSignalBarrier() -> > ProcessBarrierSmgrRelease() -> > smgrreleaseall() > > which will free the MdfdVec that was opened at the start of mdreadv() etc. > Right now I don't really see a solution other than putting > HOLD_INTERRUPTS()/RESUME_INTERRUPTS() into all those functions. Some other ideas: 1. An smgr-local smgr-release-hold flag, which would just cause ProcessBarrierSmgrReleaseAll() to return false. Main complication seems to be making sure you reset it on error, which is a bit annoying, elog.c probably needs to know how to clean it up. Yuck. 2. Somehow tell elog.c not to call CHECK_FOR_INTERRUPTS() instead. Also yuck, but at least elog.c is already the right place to clean state up on non-local exit... 3. Considering errfinish()'s stated goal, a sort of backstop to help you cancel looping message-spewing code only, I wonder if we should just restrict the kinds of interrupts it processes, so that it only calls CFI() if we're definitely throwing ERROR or FATAL? Something like CHECK_FOR_CANCEL_OR_DIE() which would enter the regular code only if it can already determine that, which would probably be defined as something like: if (INTERRUPTS_PENDING_CONDITION() && INTERRUPTS_CAN_BE_PROCESSED() && (QueryCancelPending || ProcDiePending)) ProcessInterrupts(). I realise that what I'm describing is essentially what CHECK_FOR_INTERRUPTS() used to be like, before a bunch of non-cancel, non-die stuff moved into CHECK_FOR_INTERRUPTS(), but that's probably the conditions under which that code was written. Could anything else be accidentally on elog() CFIs? Seems pretty weird? 4. Bigger refactoring of the interrupts system so you can express more precisely what kinds of stuff you want to handle. Well, we have some stuff like that coming in the nearby procsignal/interrupt refactoring thread. I don't feel enthusiastic about messing with it in the back branches, hence easier suggestion in #3. > If we go for that, I can see an argument for doing that in smgr.c instead of > md.c, afaict any plausible smgr implementation would run into this, as the > smgrcloseall() is triggered from the smgr level. Seems like maybe not a great idea to assume that future smgrs will be OK with being non-interruptible, if it can be avoided?
pgsql-hackers by date: