Re: md.c vs elog.c vs smgrreleaseall() in barrier - Mailing list pgsql-hackers

From Andres Freund
Subject Re: md.c vs elog.c vs smgrreleaseall() in barrier
Date
Msg-id 4qtmksxdbbp3pb7dqmn6lnzzdv7ujnizmbqtfbwm7c25waavtk@i6iyjrhk5eh5
Whole thread Raw
In response to Re: md.c vs elog.c vs smgrreleaseall() in barrier  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: md.c vs elog.c vs smgrreleaseall() in barrier
List pgsql-hackers
Hi,

On 2025-03-14 11:31:47 +1300, Thomas Munro wrote:
> 2.  Somehow tell elog.c not to call CHECK_FOR_INTERRUPTS() instead.
> Also yuck, but at least elog.c is already the right place to clean
> state up on non-local exit...

How would that differ from HOLD_INTERRUPTS?


> 3.  Considering errfinish()'s stated goal, a sort of backstop to help
> you cancel looping message-spewing code only, I wonder if we should
> just restrict the kinds of interrupts it processes, so that it only
> calls CFI() if we're definitely throwing ERROR or FATAL?

I'm not even sure it'd be safe to ERROR out in all the relevant places...

E.g.
        /* implementation-specific initialization */
        smgrsw[reln->smgr_which].smgr_open(reln);

        /* it is not pinned yet */
        reln->pincount = 0;
        dlist_push_tail(&unpinned_relns, &reln->node);

doesn't this mean that ->pincount is uninitialized in case smgr_open() errors
out and that we'd loose track of the reln because it wasn't yet added to
unpinned_rels?

We should probably make sure it's safe against ERROR, given that there are
several paths towards that...


> Could anything else be accidentally on elog() CFIs?  Seems pretty weird?

Hm, can't quite parse this...


> > If we go for that, I can see an argument for doing that in smgr.c instead of
> > md.c, afaict any plausible smgr implementation would run into this, as the
> > smgrcloseall() is triggered from the smgr level.
> 
> Seems like maybe not a great idea to assume that future smgrs will be
> OK with being non-interruptible, if it can be avoided?

I think you'd need a fairly large surgery of smgr.c to make that viable - I
rather doubt that smgr.c itself is actually safe against interrupts.

I can see some arguable uses of smgr handling interrupts, say an smgr
implementation based on a network backed store, but you'd need rather massive
changes to actually be sure that smgr.c is called while accepting interrupts -
e.g. today sgmrwritev() will largely be called with interrupts held. Plenty
reads too.  I doubt that undoing a few HOLD_INTERRUPTS() is a meaningful part
of the necessary work.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Changing the state of data checksums in a running cluster
Next
From: Jacob Champion
Date:
Subject: Re: ecdh support causes unnecessary roundtrips