Thread: Interrupts vs signals

Interrupts vs signals

From

Thomas Munro

Date:

20 October 2021, 18:55:54

Hi,

I wonder if we really need signals to implement interrupts.  Given
that they are non-preemptive/cooperative (work happens at the next
CFI()), why not just use shared memory flags and latches?  That skips
a bunch of code, global variables and scary warnings about programming
in signal handlers.

I sketched out some code to try that a few months back, while
speculating about bite-sized subproblems that would come up if each
backend is, one day, a thread.

There are several other conditions that are also handled by
CHECK_FOR_INTERRUPTS(), but are not triggered by other backends
sending  signals, or are set by other signal handlers (SIGALRM,
SIGQUIT).  One idea is to convert those into "procsignals" too, for
consistency.  In the attached, they can be set (ie by the same
backend) with ProcSignalRaise(), but it's possible that in future we
might have a reason for another backend to set them too, so it seems
like a good idea to have a single system, effectively merging the
concepts of "procsignals" and "interrupts".

There are still a few more ad hoc (non-ProcSignal) uses of SIGUSR1 in
the tree.  For one thing, we don't allow the postmaster to set
latches; if we gave up on that rule, we wouldn't need the bgworker
please-signal-me thing.  Also the logical replication launcher does
the same sort of thing for no apparent reason.  Changed in the
attached -- mainly so I could demonstrate that check-world passes with
SIGUSR1 ignored.

The attached is only experiment grade code: in particular, I didn't
quite untangle the recovery conflict flags properly.  It's also doing
function calls where some kind of fast inlined magic is probably
required, and I probably have a few other details wrong, but I figured
it was good enough to demonstrate the concept.

Attachment

0001-WIP-Refactor-procsignals-and-interrupts.patch

Re: Interrupts vs signals

From

Andres Freund

Date:

20 October 2021, 19:27:48

Hi,

On 2021-10-21 07:55:54 +1300, Thomas Munro wrote:
> I wonder if we really need signals to implement interrupts.  Given
> that they are non-preemptive/cooperative (work happens at the next
> CFI()), why not just use shared memory flags and latches?  That skips
> a bunch of code, global variables and scary warnings about programming
> in signal handlers.

Depending on how you implement them, one difference could be whether / when
"slow" system calls (recv, poll, etc) are interrupted.

Another is that that signal handling provides a memory barrier in the
receiving process. For things that rarely change (like most interrupts), it
can be more efficient to move the cost of that out-of-line, instead of
incurring them on every check.

One nice thing of putting the state variables into shared memory would be that
that'd allow to see the pending interrupts of other backends for debugging
purposes.

> One idea is to convert those into "procsignals" too, for
> consistency.  In the attached, they can be set (ie by the same
> backend) with ProcSignalRaise(), but it's possible that in future we
> might have a reason for another backend to set them too, so it seems
> like a good idea to have a single system, effectively merging the
> concepts of "procsignals" and "interrupts".

This seems a bit confusing to me. For one, we need to have interrupts working
before we have a proc, IIRC. But leaving details like that aside, it just
seems a bit backwards to me. I'm on board with other backends directly setting
interrupt flags, but it seems to me that the procsignal stuff should be
"client" of the process-local interrupt infrastructure, rather than the other
way round.

> +bool
> +ProcSignalAnyPending(void)
> +{
> +    volatile ProcSignalSlot *slot = MyProcSignalSlot;
>  
> -    if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
> -        RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
> +    /* XXX make this static inline? */
> +    /* XXX point to a dummy entry instead of using NULL to avoid a branch */
> +    return slot && slot->pss_signaled;
> +}

ISTM it might be easier to make this stuff efficiently race-free if we made
this a count of pending operations.

> @@ -3131,12 +3124,13 @@ ProcessInterrupts(void)
>      /* OK to accept any interrupts now? */
>      if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
>          return;
> -    InterruptPending = false;
> +    ProcSignalClearAnyPending();
> +
> +    pg_read_barrier();
>  
> -    if (ProcDiePending)
> +    if (ProcSignalConsume(PROCSIG_DIE))
>      {

I think making all of these checks into function calls isn't great. How about
making the set of pending signals a bitmask? That'd allow to efficiently check
a bunch of interrupts together and even where not, it'd just be a single test
of the mask, likely already in a register.

Greetings,

Andres Freund

Re: Interrupts vs signals

From

Thomas Munro

Date:

11 November 2021, 05:27:09

On Thu, Oct 21, 2021 at 8:27 AM Andres Freund <andres@anarazel.de> wrote:
> On 2021-10-21 07:55:54 +1300, Thomas Munro wrote:
> > I wonder if we really need signals to implement interrupts.  Given
> > that they are non-preemptive/cooperative (work happens at the next
> > CFI()), why not just use shared memory flags and latches?  That skips
> > a bunch of code, global variables and scary warnings about programming
> > in signal handlers.
>
> Depending on how you implement them, one difference could be whether / when
> "slow" system calls (recv, poll, etc) are interrupted.

Hopefully by now all such waits are implemented with latch.c facilities?

> Another is that that signal handling provides a memory barrier in the
> receiving process. For things that rarely change (like most interrupts), it
> can be more efficient to move the cost of that out-of-line, instead of
> incurring them on every check.

Agreed, but in this experiment I was trying out the idea that a memory
barrier is not really needed at all, unless you're about to go to
sleep.  We already insert one of those before a latch wait.  That is,
if we see !set->latch->is_set, we do pg_memory_barrier() and check
again, before sleeping, so your next CFI must see the flag.  For
computation loops (sort, hash, query execution, ...), I speculate that
a relaxed read of memory is fine... you'll see the flag pretty soon,
and you certainly won't be allowed to finish your computation and go
to sleep.

> One nice thing of putting the state variables into shared memory would be that
> that'd allow to see the pending interrupts of other backends for debugging
> purposes.

+1

> > One idea is to convert those into "procsignals" too, for
> > consistency.  In the attached, they can be set (ie by the same
> > backend) with ProcSignalRaise(), but it's possible that in future we
> > might have a reason for another backend to set them too, so it seems
> > like a good idea to have a single system, effectively merging the
> > concepts of "procsignals" and "interrupts".
>
> This seems a bit confusing to me. For one, we need to have interrupts working
> before we have a proc, IIRC. But leaving details like that aside, it just
> seems a bit backwards to me. I'm on board with other backends directly setting
> interrupt flags, but it seems to me that the procsignal stuff should be
> "client" of the process-local interrupt infrastructure, rather than the other
> way round.

Hmm.  Yeah, I see your point.  But I can also think of some arguments
for merging the concepts of local and shared interrupts; see below.

In this new sketch, I tried doing it the other way around.  That is,
completely removing the concept of "ProcSignal", leaving only
"Interrupts".  Initially, MyPendingInterrupts points to something
private, and once you're connected to shared memory it points to
MyProc->pending_interrupts.

> > +bool
> > +ProcSignalAnyPending(void)
> > +{
> > +     volatile ProcSignalSlot *slot = MyProcSignalSlot;
> >
> > -     if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
> > -             RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
> > +     /* XXX make this static inline? */
> > +     /* XXX point to a dummy entry instead of using NULL to avoid a branch */
> > +     return slot && slot->pss_signaled;
> > +}
>
> ISTM it might be easier to make this stuff efficiently race-free if we made
> this a count of pending operations.

Hmm, with a unified interrupt system and a bitmap it's not necessary
to have a separate flag/counter at all.

> > @@ -3131,12 +3124,13 @@ ProcessInterrupts(void)
> >       /* OK to accept any interrupts now? */
> >       if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
> >               return;
> > -     InterruptPending = false;
> > +     ProcSignalClearAnyPending();
> > +
> > +     pg_read_barrier();
> >
> > -     if (ProcDiePending)
> > +     if (ProcSignalConsume(PROCSIG_DIE))
> >       {
>
> I think making all of these checks into function calls isn't great. How about
> making the set of pending signals a bitmask? That'd allow to efficiently check
> a bunch of interrupts together and even where not, it'd just be a single test
> of the mask, likely already in a register.

+1.

Some assorted notes:

1.  Aside from doing interrupts in this new way, I also have the
postmaster setting latches (!) instead of sending ad hoc SIGUSR1 here
and there.  My main reason for doing that was to be able to chase out
all reasons to register a SIGUSR1 handler, so I could prove that
check-world passes.  I like it, though. But maybe it's really a
separate topic.

2.  I moved this stuff into interrupt.{h,c}.  There is nothing left in
procsignal.c except code relating to ProcSignalBarrier.  I guess that
thing could use another name, anyway.  It's a ...
SystemInterruptBarrier?

3.  Child-level SIGINT and SIGTERM handlers probably aren't really
necessary, either: maybe the sender could do
InterruptSend(INTERRUPT_{DIE,QUERY_CANCEL}, pgprocno) instead?  But
perhaps people are attached to being able to send those signals from
external programs directly to backends.

4.  Like the above, a SIGALRM handler might need to do eg
InterruptRaise(INTERRUPT_STATEMENT_TIMEOUT).  That's a problem for
systems using spinlocks (self-deadlock against user context in
InterruptRaise()), so I'd need to come up with some flag protocol for
dinosaurs to make that safe, OR revert to having these "local only"
interrupts done with separate flags, as you were getting at earlier.

5.  The reason I prefer to put currently "local only" interrupts into
the same atomic system is that I speculate that ultimately all of the
backend-level signal handlers won't be needed.  They all fall into
three categories: (1) could be replaced with these interrupts
directly, (2) could be replaced by the new timer infrastructure that
multithreaded postgres would need to have to deliver interrupts to the
right recipients, (3) are quickdie and can be handled at the
containing process level.  Then the only signal handlers left are top
level external ones.

But perhaps you're right and I should try reintroducing separate local
interrupts for now.  I dunno, I like the simplicity of the unified
system; if only it weren't for those spinlock-backed atomics.

Attachment

0001-Refactor-interrupt-system.patch

Re: Interrupts vs signals

From

Robert Haas

Date:

11 November 2021, 14:06:01

On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Depending on how you implement them, one difference could be whether / when
> > "slow" system calls (recv, poll, etc) are interrupted.
>
> Hopefully by now all such waits are implemented with latch.c facilities?

Do read(), write(), etc. count? Because we certainly have raw calls to
those functions in lots of places.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Andres Freund

Date:

11 November 2021, 19:50:00

Hi,

On 2021-11-11 09:06:01 -0500, Robert Haas wrote:
> On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Depending on how you implement them, one difference could be whether / when
> > > "slow" system calls (recv, poll, etc) are interrupted.
> >
> > Hopefully by now all such waits are implemented with latch.c facilities?
> 
> Do read(), write(), etc. count? Because we certainly have raw calls to
> those functions in lots of places.

They can count, but only when used for network sockets or pipes ("slow
devices" or whatever the posix language is). Disk IO doesn't count as that. So
I don't think it'd be a huge issue.

Greetings,

Andres Freund

Re: Interrupts vs signals

From

Robert Haas

Date:

11 November 2021, 20:24:41

On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
> They can count, but only when used for network sockets or pipes ("slow
> devices" or whatever the posix language is). Disk IO doesn't count as that. So
> I don't think it'd be a huge issue.

Somehow the idea that the network is a slow device and the disk a fast
one does not seem like it's necessarily accurate on modern hardware,
but I guess the spec is what it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Thomas Munro

Date:

11 November 2021, 20:57:38

On Fri, Nov 12, 2021 at 9:24 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
> > They can count, but only when used for network sockets or pipes ("slow
> > devices" or whatever the posix language is). Disk IO doesn't count as that. So
> > I don't think it'd be a huge issue.
>
> Somehow the idea that the network is a slow device and the disk a fast
> one does not seem like it's necessarily accurate on modern hardware,
> but I guess the spec is what it is.

[Somehow I managed to reply to Robert only; let me try that again,
this time to the list...]

Network filesystems have in the past been confusing because they're
both disk-like and network-like, and also slow as !@#$, which is why
there have been mount point options like "intr", "nointr" (now ignored
on Linux) to control what happens if you receive an async signal
during a sleepy read/write.  But even if you had some kind of
Deathstation 9000 that had a switch on the front panel that ignores
SA_RESTART and produces EINTR for disk I/O when a signal arrives,
PostgreSQL already doesn't work today.  Our pread() and pwrite() paths
for data and WAL don't not have a EINTR loops or
CHECK_FOR_INTERRUPTS() (we just can't take interrupts in the middle of
eg a synchronous write), so I think we'd produce an ERROR or PANIC.
So I think disk I/O is irrelevant, and network/pipe I/O is already
handled everywhere via latch.c facilities.

If there are any eg blocking reads on a socket in PostgreSQL, we
should fix that to use latch.c non-blocking techniques, because such a
place is already a place that ignores postmaster death and interrupts.
To be more precise: such a place could of course wake up for EINTR on
SIGUSR1 from procsignal.c, and that would no longer happen with my
patch, but if we're relying on that anywhere, it's dangerous and
unreliable.  If SIGUSR1 is delivered right before you enter a blocking
read(), you'll sleep waiting for the socket or whatever.  That's
precisely the problem that latch.c solves, and why it's already a bug
if there are such places.

Re: Interrupts vs signals

From

Thomas Munro

Date:

08 July 2024, 02:56:38

Here's an updated version of this patch.

The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno)
becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending
interrupt global variables and pss_procsignalFlags[] go away, along
with the SIGUSR1 handler.  The interrupts are compressed into a single
bitmap.  See commit message for more details.

The patch is failing on Windows CI for reasons I haven't debugged yet,
but I wanted to share what I have so far.  Work in progress!

Here is my attempt to survey the use of signals and write down what I
think we might do about them all so far, to give the context for this
patch:

https://wiki.postgresql.org/wiki/Signals

Comments, corrections, edits very welcome.

Attachment

v2-0001-Redesign-interrupts-remove-ProcSignals.patch

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

08 July 2024, 09:38:22

On 08/07/2024 05:56, Thomas Munro wrote:
> Here's an updated version of this patch.
> 
> The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno)
> becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending
> interrupt global variables and pss_procsignalFlags[] go away, along
> with the SIGUSR1 handler.  The interrupts are compressed into a single
> bitmap.  See commit message for more details.
> 
> The patch is failing on Windows CI for reasons I haven't debugged yet,
> but I wanted to share what I have so far.  Work in progress!
> 
> Here is my attempt to survey the use of signals and write down what I
> think we might do about them all so far, to give the context for this
> patch:
> 
> https://wiki.postgresql.org/wiki/Signals
> 
> Comments, corrections, edits very welcome.

Nice, thanks!

> Background worker state notifications are also changed from raw
> kill(SIGUSR1) to SetLatch().  That means that SetLatch() is now called
> from the postmaster.  The main purpose of including that change is to be
> able to remove procsignal_sigusr1_handler completely and set SIGUSR1 to
> SIG_IGN, and show the system working.
> 
> XXX Do we need to invent SetLatchRobust() that doesn't trust anything in
> shared memory, to be able to set latches from the postmaster?

The patch actually does both: it still does kill(SIGUSR1) and also sets 
the latch.

I think it would be nice if RegisterDynamicBackgroundWorker() had a 
"bool notify_me" argument, instead of requiring the caller to set 
"bgw_notify_pid = MyProcPid" before the call. That's a 
backwards-compatibility break, but maybe we should bite the bullet and 
do it. Or we could do this in RegisterDynamicBackgroundWorker():

if (worker->bgw_notify_pid == MyProcPid)
     worker->bgw_notify_pgprocno = MyProcNumber;

I think we can forbid setting pgw_notify_pid to anything else than 0 or 
MyProcPid.

A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it 
can do any damage if you called it on a pointer to garbage, except if 
the pointer itself is bogus, then just dereferencing it an cause a 
segfault. So it would be nice to have a version specifically designed 
with that in mind. For example, it could assume that the Latch's pid is 
never legally equal to MyProcPid, because postmaster cannot own any latches.

Another approach would be to move the responsibility of background 
worker state notifications out of postmaster completely. When a new 
background worker is launched, the worker process itself could send the 
notification that it has started. And similarly, when a worker exits, it 
could send the notification just before exiting. There's a little race 
condition with exiting: if a process is waiting for the bgworker to 
exit, and launches a new worker immediately when the old one exits, 
there will be a brief period when the old and new process are alive at 
the same time. The old worker wouldn't be doing anything interesting 
anymore since it's exiting, but it still counts towards 
max_worker_processes, so launching the new process might fail because of 
hitting the limit. Maybe we should just bump up max_worker_processes. Or 
postmaster could check PMChildFlags and not count processes that have 
already deregistered from PMChildFlags towards the limit.

> -volatile uint32 InterruptHoldoffCount = 0;
> -volatile uint32 QueryCancelHoldoffCount = 0;
> -volatile uint32 CritSectionCount = 0;
> +uint32 InterruptHoldoffCount = 0;
> +uint32 QueryCancelHoldoffCount = 0;
> +uint32 CritSectionCount = 0;

I wondered if these are used in PG_TRY-CATCH blocks in a way that would 
still require volatile. I couldn't find any such usage by some quick 
grepping, so I think we're good, but I thought I'd mention it.

> +/*
> + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and
> + * ProcessInterrupts() handle.  These perform work that is safe to run whenever
> + * interrupts are not "held".  Other kinds of interrupts are only handled at
> + * more restricted times.
> + */
> +#define INTERRUPT_STANDARD_MASK                               \

Some interrupts are missing from this mask:

- INTERRUPT_PARALLEL_APPLY_MESSAGE
- INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT
- INTERRUPT_SINVAL_CATCHUP
- INTERRUPT_NOTIFY

Is that on purpose?

> -/*
> - * Because backends sitting idle will not be reading sinval events, we
> - * need a way to give an idle backend a swift kick in the rear and make
> - * it catch up before the sinval queue overflows and forces it to go
> - * through a cache reset exercise.  This is done by sending
> - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
> - *
> - * The signal handler will set an interrupt pending flag and will set the
> - * processes latch. Whenever starting to read from the client, or when
> - * interrupted while doing so, ProcessClientReadInterrupt() will call
> - * ProcessCatchupEvent().
> - */
> -volatile sig_atomic_t catchupInterruptPending = false;

Would be nice to move that comment somewhere else rather than remove it 
completely.

> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -444,6 +444,10 @@ InitProcess(void)
>      OwnLatch(&MyProc->procLatch);
>      SwitchToSharedLatch();
>  
> +    /*We're now ready to accept interrupts from other processes. */
> +    pg_atomic_init_u32(&MyProc->pending_interrupts, 0);
> +    SwitchToSharedInterrupts();
> +
>      /* now that we have a proc, report wait events to shared memory */
>      pgstat_set_wait_event_storage(&MyProc->wait_event_info);
>  
> @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void)
>      OwnLatch(&MyProc->procLatch);
>      SwitchToSharedLatch();
>  
> +    /* We're now ready to accept interrupts from other processes. */
> +    SwitchToSharedInterrupts();
> +
>      /* now that we have a proc, report wait events to shared memory */
>      pgstat_set_wait_event_storage(&MyProc->wait_event_info);
>  

Is there a reason for the different initialization between a regular 
backend and aux process?

> +/*
> + * Switch to shared memory interrupts.  Other backends can send interrupts
> + * to this one if they know its ProcNumber.
> + */
> +void
> +SwitchToSharedInterrupts(void)
> +{
> +    pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts));
> +    MyPendingInterrupts = &MyProc->pending_interrupts;
> +}

Hmm, I think there's a race condition here (and similarly in 
SwitchToLocalInterrupts), if the signal handler runs between the 
pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe 
something like this instead:

MyPendingInterrupts = &MyProc->pending_interrupts;
pg_memory_barrier();
pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, 
pg_atomic_read_u32(LocalPendingInterrupts));

And perhaps this should also clear LocalPendingInterrupts, just to be tidy.

> @@ -138,7 +139,8 @@
>  typedef struct ProcState
>  {
>      /* procPid is zero in an inactive ProcState array entry. */
> -    pid_t        procPid;        /* PID of backend, for signaling */
> +    pid_t        procPid;        /* pid of backend */
> +    ProcNumber    pgprocno;        /* for sending interrupts */
>      /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */
>      int            nextMsgNum;        /* next message number to read */
>      bool        resetState;        /* backend needs to reset its state */

We can easily remove procPid altogether now that we have pgprocno here. 
Similarly with the pid/pgprocno in ReplicationSlot and WalSndState.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Robert Haas

Date:

08 July 2024, 20:18:01

On Mon, Jul 8, 2024 at 5:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Another approach would be to move the responsibility of background
> worker state notifications out of postmaster completely. When a new
> background worker is launched, the worker process itself could send the
> notification that it has started. And similarly, when a worker exits, it
> could send the notification just before exiting. There's a little race
> condition with exiting: if a process is waiting for the bgworker to
> exit, and launches a new worker immediately when the old one exits,
> there will be a brief period when the old and new process are alive at
> the same time. The old worker wouldn't be doing anything interesting
> anymore since it's exiting, but it still counts towards
> max_worker_processes, so launching the new process might fail because of
> hitting the limit. Maybe we should just bump up max_worker_processes. Or
> postmaster could check PMChildFlags and not count processes that have
> already deregistered from PMChildFlags towards the limit.

I can testify that the current system is the result of a lot of trial
and error. I'm not saying it can't be made better, but my initial
attempts at getting this to work (back in the 9.4 era) resembled what
you proposed here, were consequently a lot simpler than what we have
now, and also did not work. Race conditions like you mention here were
part of that. Another consideration is that fork() can fail, and in
that case, the process that tried to register the new background
worker needs to find out that the background worker won't ever be
starting. Yet another problem is that, even if fork() succeeds, the
new process might fail before it executes any of our code e.g. because
it seg faults very early, a case that actually happened to me -
inadvertently - while I was testing these facilities. I ended up
deciding that we can't rely on the new process to do anything until
it's given us some signal that it is alive and able to carry out its
duties. If it dies before telling us that, or never starts in the
first place, we have to have some other way of finding that out, and
it's difficult to see how that can happen without postmaster
involvement.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Thomas Munro

Date:

10 July 2024, 06:48:48

On Mon, Jul 8, 2024 at 9:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> The patch actually does both: it still does kill(SIGUSR1) and also sets
> the latch.

Yeah, I had some ideas about supporting old extension code that really
wanted a SIGUSR1, but on reflection, the only reason anyone ever wants
that is so that sigusr1_handler can SetLatch(), which pairs with
WaitLatch() in WaitForBackgroundWorker*().  Let's go all the way and
assume that.

> I think it would be nice if RegisterDynamicBackgroundWorker() had a
> "bool notify_me" argument, instead of requiring the caller to set
> "bgw_notify_pid = MyProcPid" before the call. That's a
> backwards-compatibility break, but maybe we should bite the bullet and
> do it. Or we could do this in RegisterDynamicBackgroundWorker():
>
> if (worker->bgw_notify_pid == MyProcPid)
>      worker->bgw_notify_pgprocno = MyProcNumber;
>
> I think we can forbid setting pgw_notify_pid to anything else than 0 or
> MyProcPid.

Another idea: we could keep the bgw_notify_pid field around for a
while, documented as unused and due to be removed in future.  We could
automatically capture the caller's proc number.  So you get latch
wakeups by default, which I expect many people want, and most people
could cope with even if they don't want them.  If you really really
don't want them, you could set a new flag BGW_NO_NOTIFY.

I have now done this part of the change in a new first patch.  This
particular use of kill(SIGUSR1) is separate from the ProcSignal
removal, it's just that it relies on ProcSignal's handler's default
action of calling SetLatch().  It's needed so the ProcSignal-ectomy
can fully delete sigusr1_handler(), but it's not directly the same
thing, so it seemed good to split the patch.

> A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it
> can do any damage if you called it on a pointer to garbage, except if
> the pointer itself is bogus, then just dereferencing it an cause a
> segfault. So it would be nice to have a version specifically designed
> with that in mind. For example, it could assume that the Latch's pid is
> never legally equal to MyProcPid, because postmaster cannot own any latches.

Yeah I'm starting to think that all we need to do here is range-check
the proc number.  Here's a version that adds:
ProcSetLatch(proc_number).  Another idea would be for SetLatch(latch)
to sanitise the address of a latch, ie its offset and range.

What the user really wants to be able to do with this bgworker API, I
think, is wait for a given a handle, which could find a condition
variable + generation in the slot, so that we don't have to register
any proc numbers anywhere until we're actually waiting.  But *clearly*
the postmaster can't use the condition variable API without risking
following corrupted pointers in shared memory.  Whereas AFAICT
ProcSetLatch() from the patched postmaster can't really be corrupted
in any new way that it couldn't already be corrupted in master (where
it runs in the target process), if we're just a bit paranoid about how
we find our way to the latch.

Receiving latch wakeups in the postmaster might be another question,
but I don't think we need to confront that question just yet.

> > -volatile uint32 InterruptHoldoffCount = 0;
> > -volatile uint32 QueryCancelHoldoffCount = 0;
> > -volatile uint32 CritSectionCount = 0;
> > +uint32 InterruptHoldoffCount = 0;
> > +uint32 QueryCancelHoldoffCount = 0;
> > +uint32 CritSectionCount = 0;
>
> I wondered if these are used in PG_TRY-CATCH blocks in a way that would
> still require volatile. I couldn't find any such usage by some quick
> grepping, so I think we're good, but I thought I'd mention it.

Hmm.  Still thinking about this.

> > +/*
> > + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and
> > + * ProcessInterrupts() handle.  These perform work that is safe to run whenever
> > + * interrupts are not "held".  Other kinds of interrupts are only handled at
> > + * more restricted times.
> > + */
> > +#define INTERRUPT_STANDARD_MASK                                                         \
>
> Some interrupts are missing from this mask:
>
> - INTERRUPT_PARALLEL_APPLY_MESSAGE

Oops, that one ^ is a rebasing mistake.  I wrote the ancestor of this
patch in 2021, and that new procsignal arrived in 2023, and I put the
code in to handle it, but I forgot to add it to the mask, which gives
me an idea (see below)...

> - INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT
> - INTERRUPT_SINVAL_CATCHUP
> - INTERRUPT_NOTIFY
>
> Is that on purpose?

INTERRUPT_SINVAL_CATCHUP and INTERRUPT_NOTIFY are indeed handled
differently on purpose.  In master, they don't set InterruptPending,
and they are not handled by regular CHECK_FOR_INTERRUPTS() sites, but
in the patch they still need a bit in pending_interrupts, and that is
what that mask hides from CHECK_FOR_INTERRUPTS().  They are checked
explicitly in ProcessClientReadInterrupt().  I think the idea is that
we can't handle sinval at random places because that might create
dangling pointers to cached objects where we don't expect them, and we
can't emit NOTIFY-related protocol messages at random times either.

There is something a little funky about _IDLE_STATS_UPDATE_TIMEOUT,
though.  It has a different scheme for running only when idle, where
if it opts not to do anything, it doesn't consume the interrupt (a
later CFI() will, but the latch is not set so we might sleep).  I was
confused by that.  I think I have changed to be more faithful to
master's behaviour now.

Hmm, a better terminology for the interupts that CFI handles would be
s/standard/regular/, so I have changed that.

New idea: it would be less error-prone if we instead had a mask of
these special cases, of which there are now only two.  Tried that way!

> > -/*
> > - * Because backends sitting idle will not be reading sinval events, we
> > - * need a way to give an idle backend a swift kick in the rear and make
> > - * it catch up before the sinval queue overflows and forces it to go
> > - * through a cache reset exercise.  This is done by sending
> > - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
> > - *
> > - * The signal handler will set an interrupt pending flag and will set the
> > - * processes latch. Whenever starting to read from the client, or when
> > - * interrupted while doing so, ProcessClientReadInterrupt() will call
> > - * ProcessCatchupEvent().
> > - */
> > -volatile sig_atomic_t catchupInterruptPending = false;
>
> Would be nice to move that comment somewhere else rather than remove it
> completely.

OK, I moved it to the top of ProcessCatchupInterrupt().

> > --- a/src/backend/storage/lmgr/proc.c
> > +++ b/src/backend/storage/lmgr/proc.c
> > @@ -444,6 +444,10 @@ InitProcess(void)
> >       OwnLatch(&MyProc->procLatch);
> >       SwitchToSharedLatch();
> >
> > +     /*We're now ready to accept interrupts from other processes. */
> > +     pg_atomic_init_u32(&MyProc->pending_interrupts, 0);
> > +     SwitchToSharedInterrupts();
> > +
> >       /* now that we have a proc, report wait events to shared memory */
> >       pgstat_set_wait_event_storage(&MyProc->wait_event_info);
> >
> > @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void)
> >       OwnLatch(&MyProc->procLatch);
> >       SwitchToSharedLatch();
> >
> > +     /* We're now ready to accept interrupts from other processes. */
> > +     SwitchToSharedInterrupts();
> > +
> >       /* now that we have a proc, report wait events to shared memory */
> >       pgstat_set_wait_event_storage(&MyProc->wait_event_info);
> >
>
> Is there a reason for the different initialization between a regular
> backend and aux process?

No.  But I thought about something else to fix here.  Really we don't
want to switch to shared interrupts until we are ready for CFI() to do
stuff.  I think that should probably be at the places where master
unblocks signals.  Otherwise, for example, if someone sends you an
interrupt while you're starting up, something as innocent as
elog(DEBUG, ...), which reaches CFI(), might try to do things for
which the infrastructure is not yet fully set up, for example
INTERRUPT_BARRIER.

Not done yet, but wanted to share this new version.

> > +/*
> > + * Switch to shared memory interrupts.  Other backends can send interrupts
> > + * to this one if they know its ProcNumber.
> > + */
> > +void
> > +SwitchToSharedInterrupts(void)
> > +{
> > +     pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts));
> > +     MyPendingInterrupts = &MyProc->pending_interrupts;
> > +}
>
> Hmm, I think there's a race condition here (and similarly in
> SwitchToLocalInterrupts), if the signal handler runs between the
> pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe
> something like this instead:
>
> MyPendingInterrupts = &MyProc->pending_interrupts;
> pg_memory_barrier();
> pg_atomic_fetch_or_u32(&MyProc->pending_interrupts,
> pg_atomic_read_u32(LocalPendingInterrupts));

Yeah, right, done.

> And perhaps this should also clear LocalPendingInterrupts, just to be tidy.

I used atomic_exchange() to read and clear the bits in one go.

> > @@ -138,7 +139,8 @@
> >  typedef struct ProcState
> >  {
> >       /* procPid is zero in an inactive ProcState array entry. */
> > -     pid_t           procPid;                /* PID of backend, for signaling */
> > +     pid_t           procPid;                /* pid of backend */
> > +     ProcNumber      pgprocno;               /* for sending interrupts */
> >       /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */
> >       int                     nextMsgNum;             /* next message number to read */
> >       bool            resetState;             /* backend needs to reset its state */
>
> We can easily remove procPid altogether now that we have pgprocno here.

Since other things access those values, I propose to remove them in
separate patches.

> Similarly with the pid/pgprocno in ReplicationSlot and WalSndState.

Same.  Those pids show up in user interfaces, so I think they should
be handled in separate patches.

Note to self: I need to change some pgprocno to proc_number...

The next problems to remove are, I think, the various SIGUSR2, SIGINT,
SIGTERM signals sent by the postmaster.  These should clearly become
SendInterrupt() or ProcSetLatch().  The problem here is that the
postmaster doesn't have the proc numbers yet.  One idea is to teach
the postmaster to assign them!  Not explored yet.

This version is passing on Windows.  I'll create a CF entry.  Still
work in progress!

Attachment

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

24 July 2024, 12:57:58

On 10/07/2024 09:48, Thomas Munro wrote:
> On Mon, Jul 8, 2024 at 9:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> The patch actually does both: it still does kill(SIGUSR1) and also sets
>> the latch.
> 
> Yeah, I had some ideas about supporting old extension code that really
> wanted a SIGUSR1, but on reflection, the only reason anyone ever wants
> that is so that sigusr1_handler can SetLatch(), which pairs with
> WaitLatch() in WaitForBackgroundWorker*().  Let's go all the way and
> assume that.

+1

>> I think it would be nice if RegisterDynamicBackgroundWorker() had a
>> "bool notify_me" argument, instead of requiring the caller to set
>> "bgw_notify_pid = MyProcPid" before the call. That's a
>> backwards-compatibility break, but maybe we should bite the bullet and
>> do it. Or we could do this in RegisterDynamicBackgroundWorker():
>>
>> if (worker->bgw_notify_pid == MyProcPid)
>>       worker->bgw_notify_pgprocno = MyProcNumber;
>>
>> I think we can forbid setting pgw_notify_pid to anything else than 0 or
>> MyProcPid.
> 
> Another idea: we could keep the bgw_notify_pid field around for a
> while, documented as unused and due to be removed in future.  We could
> automatically capture the caller's proc number.  So you get latch
> wakeups by default, which I expect many people want, and most people
> could cope with even if they don't want them.  If you really really
> don't want them, you could set a new flag BGW_NO_NOTIFY.

Ok. I was going to say that it feels excessive to change the default 
like that. However, searching for RegisterDynamicBackgroundWorker() in 
github, I can't actually find any callers that don't set pg_notify_pid. 
So yeah, make sense.

> I have now done this part of the change in a new first patch.  This
> particular use of kill(SIGUSR1) is separate from the ProcSignal
> removal, it's just that it relies on ProcSignal's handler's default
> action of calling SetLatch().  It's needed so the ProcSignal-ectomy
> can fully delete sigusr1_handler(), but it's not directly the same
> thing, so it seemed good to split the patch.

PostmasterMarkPIDForWorkerNotify() is now unused. Which means that 
bgworker_notify is never set, and BackgroundWorkerStopNotifications() is 
never called either.

>> A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it
>> can do any damage if you called it on a pointer to garbage, except if
>> the pointer itself is bogus, then just dereferencing it an cause a
>> segfault. So it would be nice to have a version specifically designed
>> with that in mind. For example, it could assume that the Latch's pid is
>> never legally equal to MyProcPid, because postmaster cannot own any latches.
> 
> Yeah I'm starting to think that all we need to do here is range-check
> the proc number.  Here's a version that adds:
> ProcSetLatch(proc_number).  Another idea would be for SetLatch(latch)
> to sanitise the address of a latch, ie its offset and range.

Hmm, I don't think postmaster should trust ProcGlobal->allProcCount either.

> The next problems to remove are, I think, the various SIGUSR2, SIGINT,
> SIGTERM signals sent by the postmaster.  These should clearly become
> SendInterrupt() or ProcSetLatch().

+1

> The problem here is that the
> postmaster doesn't have the proc numbers yet.  One idea is to teach
> the postmaster to assign them!  Not explored yet.

I've been thinking that we should:

a) assign every child process a PGPROC entry, and make postmaster 
responsible for assigning them like you suggest. We'll need more PGPROC 
entries, because currently a process doesn't reserve one until 
authentication has happened. Or we change that behavior.

or

b) Use the pmsignal.c slot numbers for this, instead of ProcNumbers. 
Postmaster already assigns those.

I'm kind of leaning towards b) for now, because it feels like a much 
smaller patch. In the long run, it would be nice if every child process 
had a ProcNumber, though. It was a nice simplification in v17 that we 
don't have separate BackendId and ProcNumber anymore; similarly it would 
be nice to not have separate PMChildSlot and ProcNumber concepts.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Robert Haas

Date:

29 July 2024, 16:56:52

On Wed, Jul 24, 2024 at 8:58 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> a) assign every child process a PGPROC entry, and make postmaster
> responsible for assigning them like you suggest. We'll need more PGPROC
> entries, because currently a process doesn't reserve one until
> authentication has happened. Or we change that behavior.

I wonder how this works right now. Is there something that limits the
number of authentication requests that can be in flight concurrently,
or is it completely uncapped (except by machine resources)?

My first thought when I read this was that it would be bad to have to
put a limit on something that's currently unlimited. But then I
started thinking that, even if it is currently unlimited, that might
be a bug rather than a feature. If you have hundreds of pending
authentication requests, that just means you're using a lot of machine
resources on something that doesn't really help anybody. A machine
with hundreds of authentication-pending connections is possibly
getting DDOS'd and probably getting buried. You'd be better off
focusing the machine's limited resources on the already-established
connections and a more limited number of new connection attempts. If
you accept so many connection attempts that you don't actually have
enough memory/CPU/kernel scheduling firepower to complete the
authentication process with any of them, it does nobody any good.

I'm not sure what's best to do here; just thinking out loud.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

29 July 2024, 17:18:22

On 29/07/2024 19:56, Robert Haas wrote:
> On Wed, Jul 24, 2024 at 8:58 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> a) assign every child process a PGPROC entry, and make postmaster
>> responsible for assigning them like you suggest. We'll need more PGPROC
>> entries, because currently a process doesn't reserve one until
>> authentication has happened. Or we change that behavior.
> 
> I wonder how this works right now. Is there something that limits the
> number of authentication requests that can be in flight concurrently,
> or is it completely uncapped (except by machine resources)?

> My first thought when I read this was that it would be bad to have to
> put a limit on something that's currently unlimited. But then I
> started thinking that, even if it is currently unlimited, that might
> be a bug rather than a feature. If you have hundreds of pending
> authentication requests, that just means you're using a lot of machine
> resources on something that doesn't really help anybody. A machine
> with hundreds of authentication-pending connections is possibly
> getting DDOS'd and probably getting buried. You'd be better off
> focusing the machine's limited resources on the already-established
> connections and a more limited number of new connection attempts. If
> you accept so many connection attempts that you don't actually have
> enough memory/CPU/kernel scheduling firepower to complete the
> authentication process with any of them, it does nobody any good.
> 
> I'm not sure what's best to do here; just thinking out loud.

Yes, there's a limit, roughly 2x max_connections. see 
MaxLivePostmasterChildren().

There's another issue with that that I was about to post in another 
thread, but here goes: the MaxLivePostmasterChildren() limit is shared 
by all regular backends, bgworkers and autovacuum workers. If you open a 
lot of TCP connections to postmaster and don't send anything to the 
server, you exhaust those slots, and the server won't be able to start 
any autovacuum workers or background workers either. That's not great. I 
started to work on approach b), with separate pools of slots for 
different kinds of child processes, which fixes that. Stay tuned for a 
patch.

In addition to that, you can have an unlimited number of "dead-end" 
backends, which are doomed to just respond with "sorry, too many 
clients" error. The only limit on those is the amount of resources 
needed for all the processes and a little memory to track them.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Tom Lane

Date:

29 July 2024, 17:24:21

Robert Haas <robertmhaas@gmail.com> writes:
> I wonder how this works right now. Is there something that limits the
> number of authentication requests that can be in flight concurrently,
> or is it completely uncapped (except by machine resources)?

The former.  IIRC, the postmaster won't spawn more than 2X max_connections
subprocesses (don't recall the exact limit, but it's around there).

            regards, tom lane

Re: Interrupts vs signals

From

Robert Haas

Date:

29 July 2024, 18:05:18

On Mon, Jul 29, 2024 at 1:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I wonder how this works right now. Is there something that limits the
> > number of authentication requests that can be in flight concurrently,
> > or is it completely uncapped (except by machine resources)?
>
> The former.  IIRC, the postmaster won't spawn more than 2X max_connections
> subprocesses (don't recall the exact limit, but it's around there).

Hmm. Not to sidetrack this thread too much, but multiplying by two
doesn't really sound like the right idea to me. The basic idea
articulated in the comment for canAcceptConnections() makes sense:
some backends might fail authentication, or might be about to exit, so
it makes sense to allow for some slop. But 2X is a lot of slop even on
a machine with the default max_connections=100, and people with
connection management problems are likely to be running with
max_connections=500 or max_connections=900 or even (insanely)
max_connections=2000. Continuing with a connection attempt because we
think that hundreds or thousands of connections that are ahead of us
in the queue might clear out of the way before we need a PGPROC is not
a good bet.

I wonder if we ought to restrict this to a small, flat value, like say
50, or by a new GUC that defaults to such a value if a constant seems
problematic. Maybe it doesn't really matter. I'm not sure how much
work we'd save by booting out the doomed connection attempt earlier.

The unlimited number of dead-end backends doesn't sound too great
either. I don't have another idea, but presumably resisting DDOS
attacks and/or preserving resources for things that still have a
chance of working ought to take priority over printing a nicer error
message from a connection that's doomed to fail anyway.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

07 August 2024, 14:59:23

On 10/07/2024 09:48, Thomas Munro wrote:
> The next problems to remove are, I think, the various SIGUSR2, SIGINT,
> SIGTERM signals sent by the postmaster.  These should clearly become
> SendInterrupt() or ProcSetLatch().  The problem here is that the
> postmaster doesn't have the proc numbers yet.  One idea is to teach
> the postmaster to assign them!  Not explored yet.

With my latest patches on the "Refactoring postmaster's code to cleanup 
after child exit" thread [1], every postmaster child process is assigned 
a slot in the pmsignal.c array, including all the aux processes. If we 
moved 'pending_interrupts' and the process Latch to the pmsignal.c 
array, then you could send an interrupt also to a process that doesn't 
have a PGPROC entry. That includes dead-end backends, backends that are 
still in authentication, and the syslogger.

That would also make it so that the postmaster would never need to poke 
into the procarray. pmsignal.c is already designated as the limited 
piece of shared memory that is accessed by the postmaster 
(BackgroundWorkerSlots is the other exception), so it would be kind of 
nice if all the information that the postmaster needs to send an 
interrupt was there. That would mean that where you currently use a 
ProcNumber to identify a process, you'd use an index into the 
PMSignalState array instead.

I don't insist on changing that right now, I think this patch is OK as 
it is, but that might be a good next step later.

[1] 
https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4%40iki.fi

I'm also wondering about the relationship between interrupts and 
latches. Currently, SendInterrupt sets a latch to wake up the target 
process. I wonder if it should be the other way 'round? Move all the 
wakeup code, with the signalfd, the self-pipe etc to interrupt.c, and in 
SetLatch, call SendInterrupt to wake up the target process? Somehow that 
feels more natural to me, I think.

> This version is passing on Windows.  I'll create a CF entry.  Still
> work in progress!

Some comments on the patch details:

>          ereport(WARNING,
>                  (errmsg("NOTIFY queue is %.0f%% full", fillDegree * 100),
> -                 (minPid != InvalidPid ?
> -                  errdetail("The server process with PID %d is among those with 
the oldest transactions.", minPid)
> +                 (minPgprocno != INVALID_PROC_NUMBER ?
> +                  errdetail("The server process with pgprocno %d is among those 
with the oldest transactions.", minPgprocno)
>                    : 0),
> -                 (minPid != InvalidPid ?
> +                 (minPgprocno != INVALID_PROC_NUMBER ?
>                    errhint("The NOTIFY queue cannot be emptied until that process 
ends its current transaction.")
>                    : 0)));

This makes the message less useful to the user, as the ProcNumber isn't 
exposed to users. With the PID, you can do "pg_terminate_backend(pid)"

> diff --git a/src/backend/optimizer/util/pathnode.c 
b/src/backend/optimizer/util/pathnode.c
> index c42742d2c7b..bfb89049020 100644
> --- a/src/backend/optimizer/util/pathnode.c
> +++ b/src/backend/optimizer/util/pathnode.c
> @@ -18,6 +18,7 @@
>
>  #include "foreign/fdwapi.h"
>  #include "miscadmin.h"
> +#include "postmaster/interrupt.h"
>  #include "nodes/extensible.h"
>  #include "optimizer/appendinfo.h"
>  #include "optimizer/clauses.h"

misordered

> +     * duplicated interrupts later if we switch backx.

typo: backx -> back

> -    if (IdleInTransactionSessionTimeoutPending)
> +    if (ConsumeInterrupt(INTERRUPT_IDLE_TRANSACTION_TIMEOUT))
>      {
>          /*
>           * If the GUC has been reset to zero, ignore the signal.  This is
> @@ -3412,7 +3361,6 @@ ProcessInterrupts(void)
>           * interrupt.  We need to unset the flag before the injection point,
>           * otherwise we could loop in interrupts checking.
>           */
> -        IdleInTransactionSessionTimeoutPending = false;
>          if (IdleInTransactionSessionTimeout > 0)
>          {
>              INJECTION_POINT("idle-in-transaction-session-timeout");

The "We need to unset the flag.." comment is a bit out of place now, 
since the flag was already cleared by ConsumeInterrupt(). Same in the 
INTERRUPT_TRANSACTION_TIMEOUT and INTERRUPT_IDLE_SESSION_TIMEOUT 
handling after this.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Thomas Munro

Date:

26 August 2024, 02:05:46

On Sun, Aug 25, 2024 at 5:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 07/08/2024 17:59, Heikki Linnakangas wrote:
> > I'm also wondering about the relationship between interrupts and
> > latches. Currently, SendInterrupt sets a latch to wake up the target
> > process. I wonder if it should be the other way 'round? Move all the
> > wakeup code, with the signalfd, the self-pipe etc to interrupt.c, and in
> > SetLatch, call SendInterrupt to wake up the target process? Somehow that
> > feels more natural to me, I think.
>
> I explored that a little, see attached patch set. It's going towards the
> same end state as your patches, I think, but it starts from different
> angle. In a nutshell:
>
> Remove Latch as an abstraction, and replace all use of Latches with
> Interrupts. When I originally created the Latch abstraction, I imagined
> that we would have different latches for different purposes, but in
> reality, almost all code just used the general-purpose "process latch".
> this patch accepts that reality and replaces the Latch struct directly
> with the interrupt mask in PGPROC.

Some very initial reactions:

* I like it!

* This direction seems to fit quite nicely with future ideas about
asynchronous network I/O.  That may sound unrelated, but imagine that
a future version of WaitEventSet is built on Linux io_uring (or
Windows iorings, or Windows IOCP, or kqueue), and waits for the kernel
to tell you that network data has been transferred directly into a
user space buffer.  You could wait for the interrupt word to change at
the same time by treating it as a futex[1].  Then all that other stuff
-- signalfd, is_set, maybe_sleeping -- just goes away, and all we have
left is one single word in memory.  (That it is possible to do that is
not really a coincidence, as our own Mr Freund asked Mr Axboe to add
it[2].  The existing latch implementation techniques could be used as
fallbacks, but when looked at from the right angle, once you squish
all the wakeup reasons into a single word, it's all just an
implementation of a multiplexable futex with extra steps.)

* Speaking of other problems in other threads that might be solved by
this redesign, I think I see the outline of some solutions to the
problem of different classes of wakeup which you can handle at
different times, using masks.  There is a tension in a few places
where we want to handle some kind of interrupts but not others in
localised wait points, which we sort of try to address by holding
interrupts or holding cancel interrupts, but it's not satisfying and
there are some places where it doesn't work well.  Needs a lot more
thought, but a basic step would be: after old_interrupt_vector =
pg_atomic_fetch_or_u32(interrupt_vector, new_bits), if
(old_interrupt_vector & new_bits) == new_bits, then you didn't
actually change any bits, so you probably don't really need to wake
the other backend.  If someone is currently unable to handle that type
of interrupt (has ignored, ie not cleared, those bits) or is already
in the process of handling it (is currently being rescheduled but
hasn't cleared those bits yet), then you don't bother to wake it up.
Concretely, it could mean that we avoid some of the useless wakeup
storm problems we see in vacuum delays or while executing a query and
not in a good place to handle sinval wakeups, etc.  These are just
some raw thoughts, I am not sure about the bigger picture of that
topic yet.

* Archeological note on terminology: the reason almost every relation
database and all the literature uses the term "latch" for something
like our LWLocks seems to be that latches were/are one of the kinds of
system-provided mutex on IBM System/370 and modern descendents ie
z/OS.  Oracle and other systems that started as knock-offs of the IBM
System R (the original SQL system, of which DB2 is the modern heir)
continued that terminology, even though they ran on VMS or Unix or
whatever.  I would not be sad if we removed our unusual use of the
term latch.

[1] https://man7.org/linux/man-pages/man3/io_uring_prep_futex_wait.3.html
[2] https://lore.kernel.org/lkml/20230720221858.135240-1-axboe@kernel.dk/

Re: Interrupts vs signals

From

Thomas Munro

Date:

31 August 2024, 03:12:23

On Sat, Aug 31, 2024 at 10:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > * This direction seems to fit quite nicely with future ideas about
> > asynchronous network I/O.  That may sound unrelated, but imagine that
> > a future version of WaitEventSet is built on Linux io_uring (or
> > Windows iorings, or Windows IOCP, or kqueue), and waits for the kernel
> > to tell you that network data has been transferred directly into a
> > user space buffer.  You could wait for the interrupt word to change at
> > the same time by treating it as a futex[1].  Then all that other stuff
> > -- signalfd, is_set, maybe_sleeping -- just goes away, and all we have
> > left is one single word in memory.  (That it is possible to do that is
> > not really a coincidence, as our own Mr Freund asked Mr Axboe to add
> > it[2].  The existing latch implementation techniques could be used as
> > fallbacks, but when looked at from the right angle, once you squish
> > all the wakeup reasons into a single word, it's all just an
> > implementation of a multiplexable futex with extra steps.)
>
> Cool

Just by the way, speaking of future tricks and the connections between
this code and other problems in other threads, I wanted to mention
that the above thought is also connected to CF #3998.  When I started
working on this, in parallel I had an experimental patch set using
futexes[1] (back then, to try out futexes, I had to patch my OS[2]
because Linux couldn't multiplex them yet, and macOS/*BSD had
something sort of vaguely similar but effectively only usable between
threads in one process).  I planned to switch to waiting directly on
the interrupt vector as a futex when bringing that idea together with
the one in this thread, but I guess I assumed we had to keep latches
too since they seemed like such a central concept in PostgreSQL.  Your
idea seems much better, the more I think about it, but maybe only the
inventor of latches could have the idea of blowing them up :-)
Anyway, in that same experiment I realised I could wake multiple
backends in one system call, which led to more discoveries about the
negative interactions between latches and locks, and begat CF #3998
(SetLatches()).   By way of excuse, unfortunately I got blocked in my
progress on interrupt vectors for a couple of release cycles by the
recovery conflict system, a set of procsignals that were not like the
others, and turned out to be broken more or less as a result.  That
was tricky to fix (CF #3615), leading to journeys into all kinds of
strange places like the regex code...

[1] https://github.com/macdice/postgres/commits/kqueue-usermem/
[2] https://reviews.freebsd.org/D37102

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

30 October 2024, 19:03:57

One remaining issue with these patches is backwards-compatibility for 
extensions:

1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be 
changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, 
ResetInterrupt instead.

2. If an extension is defining its own Latch with InitLatch, rather than 
using MyLatch, it's out of luck. You could probably rewrite it using the 
INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a 
bit more effort.

We can provide backwards compatibility macros and a new facility to 
allocate custom interrupt bits. But how big of a problem is this anyway? 
In an in-person chat last week, Andres said something like "this will 
break every extension". I don't think it's quite that bad, and more 
complicated extensions need small tweaks in every major release anyway. 
But let's try to quantify that.

Analysis
--------

Joel compiled a list of all extensions that are on github: 
https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47. I 
did "git checkout" on all of them, 1159 repositories in total (some of 
the links on the list were broken, or required authentication).

86 out of those 1159 extensions contain the word "Latch":

$ ls -d  */*  | xargs -IFOO bash -c "grep -q -r -I  Latch FOO && echo 
FOO" | wc -l
86

For comparison, in PostgreSQL v10, we added a new 'wait_event_info' 
argument to WaitLatch and WaitLatchForSocket. That breakage was on 
similar scale:

$ ls -d  */*  | xargs -IFOO bash -c "grep -q -r -I  WaitLatch FOO && 
echo FOO" | wc -l
71

Admittedly that was a long time ago, we might have different priorities now.

Deeper analysis
---------------

Most of those calls are simple calls to SetLatch(MyLatch), 
WaitLatch(MyLatch, ...) etc. that could be handled by backwards 
compatibility macros. Many of the calls don't use MyLatch, but use 
MyProc->procLatch directly:

     rc = WaitLatch(&MyProc->procLatch, ...)

That makes sense, since  MyLatch was added in 2015, while the initial 
latch code was added in 2010. Any extensions that lived during that era 
would use MyProc->procLatch for backwards-compatibility, and there's 
been no need to change that since. That could also be supported by 
backwards-compatibility macros, or we could tell the extension authors 
to search & replace MyProc->procLatch to MyLatch.

Excluding those, there are a handful of extensions that would need to be 
updated. Below is a quick analysis of the remaining results of "grep 
Latch" in all the repos:

These calls use the process latch to wake up a different process (I 
excluded some duplicated forks of the same extension):

2ndQuadrant/pglogical/pglogical_sync.c: 
SetLatch(&apply->proc->procLatch);
2ndQuadrant/pglogical/pglogical_apply.c: 
                SetLatch(&worker->proc->procLatch);
2ndQuadrant/pglogical/pglogical_worker.c: 
SetLatch(&w->proc->procLatch);
2ndQuadrant/pglogical/pglogical_worker.c: 
SetLatch(&PGLogicalCtx->supervisor->procLatch);
heterodb/pg-strom/src/gpu_cache.c:      Latch      *backend;
heterodb/pg-strom/src/gpu_cache.c:      cmd->backend = (is_async ? NULL 
: MyLatch);
heterodb/pg-strom/src/gpu_cache.c: 
SetLatch(cmd->backend);
heterodb/pg-strom/src/gpu_cache.c: 
SetLatch(cmd->backend);
citusdata/citus/src/backend/distributed/utils/maintenanced.c:   Latch 
*latch; /* pointer to the background worker's latch */
citusdata/citus/src/backend/distributed/utils/maintenanced.c: 
                SetLatch(dbData->latch);
liaorc/devel-master/src/cuda_program.c: 
SetLatch(&proc->procLatch);
okbob/generic-scheduler/dbworker.c: 
SetLatch(&dbworker->parent->procLatch);
postgrespro/lsm3/lsm3.c: 
SetLatch(&entry->merger->procLatch);
postgrespro/lsm3/lsm3.c: 
SetLatch(&entry->merger->procLatch);
postgrespro/pg_pageprep/pg_pageprep.c: 
SetLatch(&starter->procLatch);
postgrespro/pg_pageprep/pg_pageprep.c: 
SetLatch(&starter->procLatch);
postgrespro/pg_query_state/pg_query_state.c:    Latch   *caller;
postgrespro/pg_query_state/pg_query_state.c: 
SetLatch(counterpart_userid->caller);
postgrespro/pg_query_state/pg_query_state.c: 
counterpart_userid->caller = MyLatch;
timescale/timescaledb/src/loader/bgw_message_queue.c: 
SetLatch(&BackendPidGetProc(queue_get_reader(queue))->procLatch);
troels/hbase_fdw/process_communication.c:       control->latch = MyLatch;
troels/hbase_fdw/process_communication.c: 
SetLatch(control->latch);

The above could easily be fairly replaced with the new SendInterrupt() 
function. A backwards compatibility macro might be possible for some of 
these, but would be tricky for others. So I think these extensions will 
need to adapt by switching to SendInterrupt(). At quick glance, it would 
easy for all of these to do that with a small "#if PG_VERSION_NUM" block.


postgrespro/pg_wait_sampling/pg_wait_sampling.c:         * null wait 
events. So instead we make use of DisownLatch() resetting
postgrespro/pg_wait_sampling/pg_wait_sampling.c:        if (proc->pid == 
0 || proc->procLatch.owner_pid == 0 || proc->pid == MyProcPid)
postgrespro/pg_wait_sampling/pg_wait_sampling.c: 
SetLatch(pgws_collector_hdr->latch);
postgrespro/pg_wait_sampling/pg_wait_sampling.c: 
SetLatch(pgws_collector_hdr->latch);
postgrespro/pg_wait_sampling/pg_wait_sampling.h:        Latch 
   *latch;

This case is similar to the previous group: a latch is used to wake up 
another process. It peeks directly into procLatch.owner_pid however. I 
suspect this will be simpler and better after rewriting to use 
interrupts, but not sure.


darold/datalink/datalink_bgw.c:                         (errmsg("Latch 
status before waitlatch call: %d", MyProc->procLatch.is_set)));

This is a DEBUG1 message. Could probably be just removed, it doesn't 
seem very useful.

postgrespro/jsonbd/jsonbd.c:    SetLatch(&wd->latch);
postgrespro/jsonbd/jsonbd.h:    Latch                           latch;
postgrespro/jsonbd/jsonbd.h:    Latch 
launcher_latch;
postgrespro/jsonbd/jsonbd_worker.c:     SetLatch(&hdr->launcher_latch);
postgrespro/jsonbd/jsonbd_worker.c:     InitLatch(&worker_state->latch);
postgrespro/jsonbd/jsonbd_worker.c:     InitLatch(&hdr->launcher_latch);
postgrespro/jsonbd/jsonbd_worker.c:             rc = WaitLatch(MyLatch, 
WL_LATCH_SET | WL_POSTMASTER_DEATH,
postgrespro/jsonbd/jsonbd_worker.c:             ResetLatch(MyLatch);
postgrespro/jsonbd/jsonbd_worker.c:             rc = 
WaitLatch(&worker_state->latch, WL_LATCH_SET | WL_POSTMASTER_DEATH,
postgrespro/jsonbd/jsonbd_worker.c: 
ResetLatch(&worker_state->latch);
postgrespro/jsonbd/jsonbd_worker.c:     WaitLatch(&hdr->launcher_latch, 
WL_LATCH_SET, 0, PG_WAIT_EXTENSION);
postgrespro/jsonbd/jsonbd_worker.c:     WaitLatch(&hdr->launcher_latch, 
WL_LATCH_SET, 0);
postgrespro/jsonbd/jsonbd_worker.c:     ResetLatch(&hdr->launcher_latch);
timescale/timescaledb/test/src/bgw/params.h:    Latch timer_latch;
timescale/timescaledb/test/src/bgw/params.c: 
SetLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c: 
InitLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c: 
ResetLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c: 
WaitLatch(&wrapper->params.timer_latch,

These two extensions, 'jsonbd' and 'timescaledb', actually use InitLatch 
to define their own latches.

I think the 'jsonbd' usage was copied from logical apply workers. At 
quick glance, it probably should be using the process latch instead of 
creating its own.

In 'timescaledb', the InitLatch call is in a mock test module. It 
probably could be written differently..



Summary
-------


We have a few options for how to deal with backwards-compatibility for 
extensions:

A) If we rip off the bandaid in one go and don't provide any 
backwards-compatibility macros, we will break 96 extensions. Most of 
them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with 
corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With 
#ifdef for compatibility with older postgres versions)

B) If we provide backwards-compatibility macros so that simple Latch
calls on MyLatch continue working, we will break about 14 extensions. 
They will need some tweaking like in option A). A bit more than the 
simple cases in option A), but nothing too difficult.

C) We could provide "forward-compatibility" macros in a separate header 
file, to make the new "SetInterrupt" etc calls work in old PostgreSQL 
versions. Many of the extensions already have a header file like this, 
see e.g. citusdata/citus/src/include/pg_version_compat.h, 
pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea 
to provide a semi-official header file like this on the Postgres wiki, 
to help extension authors. It would encourage extensions to use the 
latest idioms, while still being able to compile for older versions.



I'm leaning towards option C). Let's rip off the band-aid, but provide 
documentation for how to adapt your extension code. And provide a 
forwards-compatibility header on the wiki, that extension authors can 
use to make the new Interrupt calls work against old server versions.


The second question is whether we need to provide a replacement for 
InitLatch for extensions. ISTM that none of the extensions out there 
truly need it. But maybe if we had a nicer interface for allocating 
interrupt bits for custom usage, they would actually find that handy. 
But I'm leaning towards "no" at this point. Could be added later.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Robert Haas

Date:

30 October 2024, 20:23:54

On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be
> changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt,
> ResetInterrupt instead.
>
> 2. If an extension is defining its own Latch with InitLatch, rather than
> using MyLatch, it's out of luck. You could probably rewrite it using the
> INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a
> bit more effort.
>
> We can provide backwards compatibility macros and a new facility to
> allocate custom interrupt bits. But how big of a problem is this anyway?
> In an in-person chat last week, Andres said something like "this will
> break every extension".

Seems hyperbolic.

> For comparison, in PostgreSQL v10, we added a new 'wait_event_info'
> argument to WaitLatch and WaitLatchForSocket. That breakage was on
> similar scale:
>
> $ ls -d  */*  | xargs -IFOO bash -c "grep -q -r -I  WaitLatch FOO &&
> echo FOO" | wc -l
> 71

However, that was also pretty easy to fix. This seems like it might be
a little more complicated.

> We have a few options for how to deal with backwards-compatibility for
> extensions:
>
> A) If we rip off the bandaid in one go and don't provide any
> backwards-compatibility macros, we will break 96 extensions. Most of
> them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with
> corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With
> #ifdef for compatibility with older postgres versions)
>
> B) If we provide backwards-compatibility macros so that simple Latch
> calls on MyLatch continue working, we will break about 14 extensions.
> They will need some tweaking like in option A). A bit more than the
> simple cases in option A), but nothing too difficult.
>
> C) We could provide "forward-compatibility" macros in a separate header
> file, to make the new "SetInterrupt" etc calls work in old PostgreSQL
> versions. Many of the extensions already have a header file like this,
> see e.g. citusdata/citus/src/include/pg_version_compat.h,
> pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea
> to provide a semi-official header file like this on the Postgres wiki,
> to help extension authors. It would encourage extensions to use the
> latest idioms, while still being able to compile for older versions.
>
> I'm leaning towards option C). Let's rip off the band-aid, but provide
> documentation for how to adapt your extension code. And provide a
> forwards-compatibility header on the wiki, that extension authors can
> use to make the new Interrupt calls work against old server versions.

I don't know which of these options is best, but I don't find any of
them categorically unacceptable.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Jelte Fennema-Nio

Date:

05 November 2024, 23:21:35

On Mon, 4 Nov 2024 at 20:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Having spent some time playing with this, I quite like option C: break
> compatibility, but provide an out-of-tree header file with
> *forward*-compatibility macros. That encourages extension authors to
> adapt to new idioms, but avoids having to sprinkle extension code with
> #if version checks to support old versions.

+1 maintaining a subset of these things for every extension is kind of a pain

> My plan is to put this on the Wiki

Why the wiki and not as a file in the repo? Seems like it would be
nice to update this file together with patches that introduce such
breakages. To be clear, I think it shouldn't be possible to #include
the file, such a forward compatibility file should always be
copy-pasted. But having it in the same place as the code seems useful,
just like we update docs together with the code.

> We could add helpers for previous
> incompatibilities in v17 and v16 too, although at quick glance I'm not
> sure what those might be.

One thing that I personally add to any extension I maintain is the new
foreach macros introduced in PG17, because they are just so much nicer
to use.

> I tested this approach by adapting pg_cron to build with these patches
> and the compatibility header, and compiling it with all supported server
> versoins. Works great, see adapt-pg_cron.patch.

Looks like a simple change indeed.

On Mon, 4 Nov 2024 at 20:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 31/10/2024 02:32, Michael Paquier wrote:
> > On Wed, Oct 30, 2024 at 01:23:54PM -0400, Robert Haas wrote:
> >> On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>> C) We could provide "forward-compatibility" macros in a separate header
> >>> file, to make the new "SetInterrupt" etc calls work in old PostgreSQL
> >>> versions. Many of the extensions already have a header file like this,
> >>> see e.g. citusdata/citus/src/include/pg_version_compat.h,
> >>> pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea
> >>> to provide a semi-official header file like this on the Postgres wiki,
> >>> to help extension authors. It would encourage extensions to use the
> >>> latest idioms, while still being able to compile for older versions.
> >>>
> >>> I'm leaning towards option C). Let's rip off the band-aid, but provide
> >>> documentation for how to adapt your extension code. And provide a
> >>> forwards-compatibility header on the wiki, that extension authors can
> >>> use to make the new Interrupt calls work against old server versions.
> >>
> >> I don't know which of these options is best, but I don't find any of
> >> them categorically unacceptable.
> >
> > Looking at the compatibility macros of 0008 for the latches with
> > INTERRUPT_GENERAL_WAKEUP under latch.h, the changes are not that bad
> > to adapt to, IMO.  It reminds of f25968c49697: hard breakage, no
> > complaints I've heard of because I guess that most folks have been
> > using an in-house compatibility headers.
> >
> > A big disadvantage of B is that someone may decide to add new code in
> > core that depends on the past routines, and we'd better avoid that for
> > this new layer of APIs for interrupt handling.  A is a subset of C: do
> > a hard switch in the core code, with C mentioning a compatibility
> > layer in the wiki that does not exist in the core code.  Any of A or C
> > is OK, I would not choose B for the core backend.
> Having spent some time playing with this, I quite like option C: break
> compatibility, but provide an out-of-tree header file with
> *forward*-compatibility macros. That encourages extension authors to
> adapt to new idioms, but avoids having to sprinkle extension code with
> #if version checks to support old versions.
>
> See attached pg_version_compatibility.h header. It allows compiling code
> that uses basic SendInterrupt, RaiseInterrupt, WaitInterrupt calls with
> older server versions. My plan is to put this on the Wiki, and update it
> with similar compatibility helpers for other changes we might make in
> v18 or future versions. We could add helpers for previous
> incompatibilities in v17 and v16 too, although at quick glance I'm not
> sure what those might be.
>
> I tested this approach by adapting pg_cron to build with these patches
> and the compatibility header, and compiling it with all supported server
> versoins. Works great, see adapt-pg_cron.patch.
>
> I pushed the preliminary cleanup patches from this patch set earlier,
> only the main patches remain. Attached is a new version of those, with
> mostly comment cleanups.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

11 November 2024, 00:30:13

On 05/11/2024 22:21, Jelte Fennema-Nio wrote:
> On Mon, 4 Nov 2024 at 20:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Having spent some time playing with this, I quite like option C: break
>> compatibility, but provide an out-of-tree header file with
>> *forward*-compatibility macros. That encourages extension authors to
>> adapt to new idioms, but avoids having to sprinkle extension code with
>> #if version checks to support old versions.
> 
> +1 maintaining a subset of these things for every extension is kind of a pain
> 
>> My plan is to put this on the Wiki
> 
> Why the wiki and not as a file in the repo? Seems like it would be
> nice to update this file together with patches that introduce such
> breakages. To be clear, I think it shouldn't be possible to #include
> the file, such a forward compatibility file should always be
> copy-pasted. But having it in the same place as the code seems useful,
> just like we update docs together with the code.

I thought of the Wiki so that it could updated more casually by 
extension authors. But sure, it could be a file in the main repo too.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Jelte Fennema-Nio

Date:

14 November 2024, 15:22:13

On Sun, 10 Nov 2024 at 22:30, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I thought of the Wiki so that it could updated more casually by
> extension authors.

Makes sense, I agree it would be nice not to have to rely on
committers to merge these changes. Especially since many extension
authors are not committers. So +1 for storing it on the wiki.

Re: Interrupts vs signals

From

Thomas Munro

Date:

19 November 2024, 07:09:10

Some feedback on v4-0001-Replace-Latches-with-Interrupts.patch:

+ latch.c -> waiteventset.c

+1

+       /*
+        * INTERRUPT_RECOVERY_WAKEUP is used to wake up startup process, to tell
+        * it that it should continue WAL replay. It's sent by WAL receiver when
+        * more WAL arrives, or when promotion is requested.
+        */
+       INTERRUPT_RECOVERY_CONTINUE,

Names don't match here.  I prefer _CONTINUE.  As for the general one,
I'm on the fence about INTERRUPT_GENERAL_WAKEUP, since wakeups aren't
necessarily involved, but I don't have a specific better idea so I'm
not objecting...  Perhaps it's more like INTERRUPT_GENERAL_NOTIFY,
except that _NOTIFY is already a well known thing, and the procsignal
patch introduces INTERRUPT_NOTIFY...

It looks like maybeSleepingOnInterrupts replaces maybe_sleeping, and
SendInterrupt() would need to read it to suppress needless kill()
calls, but doesn't yet, or am I missing something?  Hmm, I think there
are two kinds of kill() suppression that are missing compared to
master:

1.  No wakeup is needed if the bit is already set:

 SendInterrupt(InterruptType reason, ProcNumber pgprocno)
 {
        PGPROC     *proc;
+       uint32          old_interrupts;

        Assert(pgprocno != INVALID_PROC_NUMBER);
        Assert(pgprocno >= 0);
        Assert(pgprocno < ProcGlobal->allProcCount);

        proc = &ProcGlobal->allProcs[pgprocno];
-       pg_atomic_fetch_or_u32(&proc->pendingInterrupts, 1 << reason);
-       WakeupOtherProc(proc);
+       old_interrupts =
pg_atomic_fetch_or_u32(&proc->pendingInterrupts, 1 << reason);
+       if ((old_interrupts & (1 << reason)) == 0)
+               WakeupOtherProc(proc);
 }

That's equivalent to this removed latch code:

-       /* Quick exit if already set */
-       if (latch->is_set)
-               return;

... except it's atomic, which I find a lot easier to think about.

2.  Would it make sense to use a bit in pendingInterrupts as a
replacement for the old maybe_sleeping flag?  (Similar to typical
implementation of mutexes and other such things, where you adjust the
lock and atomically know whether you have to wake anyone.)  Then we we
could easily extend the check above to test that at the same time,
with the same simple race-free atomic goodness:

+       if ((old_interrupts & (WAKEME | (1 << reason))) == WAKEME)
+               WakeupOtherProc(proc);

I think the waiting side would also be tidier and simpler than what
you have: you could use atomic_fetch_or to announce you're about to
sleep, while atomically reading the interrupt bits already set up to
that moment.

More soon...

Re: Interrupts vs signals

From

Robert Haas

Date:

20 November 2024, 00:02:39

On Mon, Nov 18, 2024 at 11:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Names don't match here.  I prefer _CONTINUE.  As for the general one,
> I'm on the fence about INTERRUPT_GENERAL_WAKEUP, since wakeups aren't
> necessarily involved, but I don't have a specific better idea so I'm
> not objecting...  Perhaps it's more like INTERRUPT_GENERAL_NOTIFY,
> except that _NOTIFY is already a well known thing, and the procsignal
> patch introduces INTERRUPT_NOTIFY...

INTERRUPT_GENERAL with no third word isn't out of the question, either.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

23 November 2024, 00:58:16

On 19/11/2024 06:09, Thomas Munro wrote:
> It looks like maybeSleepingOnInterrupts replaces maybe_sleeping, and
> SendInterrupt() would need to read it to suppress needless kill()
> calls, but doesn't yet, or am I missing something?

Ah yes, you're right.

> Hmm, I think there are two kinds of kill() suppression that are
> missing compared to master:
> 
> 1.  No wakeup is needed if the bit is already set:
> 
>   SendInterrupt(InterruptType reason, ProcNumber pgprocno)
>   {
>          PGPROC     *proc;
> +       uint32          old_interrupts;
> 
>          Assert(pgprocno != INVALID_PROC_NUMBER);
>          Assert(pgprocno >= 0);
>          Assert(pgprocno < ProcGlobal->allProcCount);
> 
>          proc = &ProcGlobal->allProcs[pgprocno];
> -       pg_atomic_fetch_or_u32(&proc->pendingInterrupts, 1 << reason);
> -       WakeupOtherProc(proc);
> +       old_interrupts =
> pg_atomic_fetch_or_u32(&proc->pendingInterrupts, 1 << reason);
> +       if ((old_interrupts & (1 << reason)) == 0)
> +               WakeupOtherProc(proc);
>   }
> 
> That's equivalent to this removed latch code:
> 
> -       /* Quick exit if already set */
> -       if (latch->is_set)
> -               return;
> 
> ... except it's atomic, which I find a lot easier to think about.

+1

> 2.  Would it make sense to use a bit in pendingInterrupts as a
> replacement for the old maybe_sleeping flag?  (Similar to typical
> implementation of mutexes and other such things, where you adjust the
> lock and atomically know whether you have to wake anyone.)  Then we we
> could easily extend the check above to test that at the same time,
> with the same simple race-free atomic goodness:
> 
> +       if ((old_interrupts & (WAKEME | (1 << reason))) == WAKEME)
> +               WakeupOtherProc(proc);
> 
> I think the waiting side would also be tidier and simpler than what
> you have: you could use atomic_fetch_or to announce you're about to
> sleep, while atomically reading the interrupt bits already set up to
> that moment.

Hmm, so this would replace the maybeSleepingOnInterrupts bitmask I 
envisioned. Makes a lot of sense. If it's a single bit though, that 
means that you'll still get woken up by interrupts that you're not 
waiting for. Maybe that's fine. Or we could merge the 
maybeSleepingOnInterrupts and pendingInterrupts bitmasks to a single 
atomic word, so that you would have a separate "maybe sleeping" bit for 
each interrupt bit, but could still use atomic_fetch_or atomically read 
the interrupt bits and announce the sleeping.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

23 November 2024, 00:58:36

On 19/11/2024 23:02, Robert Haas wrote:
> On Mon, Nov 18, 2024 at 11:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Names don't match here.  I prefer _CONTINUE.  As for the general one,
>> I'm on the fence about INTERRUPT_GENERAL_WAKEUP, since wakeups aren't
>> necessarily involved, but I don't have a specific better idea so I'm
>> not objecting...  Perhaps it's more like INTERRUPT_GENERAL_NOTIFY,
>> except that _NOTIFY is already a well known thing, and the procsignal
>> patch introduces INTERRUPT_NOTIFY...
> 
> INTERRUPT_GENERAL with no third word isn't out of the question, either.

I like that

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Thomas Munro

Date:

02 December 2024, 10:32:33

On Sat, Nov 23, 2024 at 10:58 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Hmm, so this would replace the maybeSleepingOnInterrupts bitmask I
> envisioned. Makes a lot of sense. If it's a single bit though, that
> means that you'll still get woken up by interrupts that you're not
> waiting for. Maybe that's fine. Or we could merge the
> maybeSleepingOnInterrupts and pendingInterrupts bitmasks to a single
> atomic word, so that you would have a separate "maybe sleeping" bit for
> each interrupt bit, but could still use atomic_fetch_or atomically read
> the interrupt bits and announce the sleeping.

I think one bit is fine for now.  At least, until we have a serious
problem with interrupts arriving when you're sleeping but not ready to
service that particular interrupt.  The 'interrupt bit already set,
don't try to wake me' stuff discussed earlier would limit the number
of useless wakeups to one, until you eventually are ready and consume
the interrupt.  The main case I can think of, if we fast forward to
the all-procsignals-become-interrupts patch (which I'll be rebasing on
top of this when the next version appears), is that you might receive
a sinval catchup request, but you might be busy running a long query.
Sinval catchup messages are only processed between queries, so you
just keep ignoring them until end of query.  I think that's fine, and
unlikely.  Do you have other cases in mind?

If there is legitimate use case for a more fine-grained maybe-sleeping
and I've been too optimistic above, I don't think we should give one
whole maybe-sleeping bit to each interrupt reason.  We only have 32
bit atomics (splinlock-based emulation of 64 bit atomics is not good
enough for this, it's not safe in SIGALRM handlers, at least not
without a lot more pain; admittedly the SIGALRM handlers should
eventually be replaced but not for a while) so if we used up two bits
for every interrupt reason we could handle only 16 interrupt reasons,
and that's already not enough.  Perhaps we could add maybe-sleeping
bits for classes of interrupt if we ever determine that one bit for
all of them isn't enough?

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

02 December 2024, 13:42:20

On 02/12/2024 09:32, Thomas Munro wrote:
> On Sat, Nov 23, 2024 at 10:58 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Hmm, so this would replace the maybeSleepingOnInterrupts bitmask I
>> envisioned. Makes a lot of sense. If it's a single bit though, that
>> means that you'll still get woken up by interrupts that you're not
>> waiting for. Maybe that's fine. Or we could merge the
>> maybeSleepingOnInterrupts and pendingInterrupts bitmasks to a single
>> atomic word, so that you would have a separate "maybe sleeping" bit for
>> each interrupt bit, but could still use atomic_fetch_or atomically read
>> the interrupt bits and announce the sleeping.
> 
> I think one bit is fine for now.  At least, until we have a serious
> problem with interrupts arriving when you're sleeping but not ready to
> service that particular interrupt.  The 'interrupt bit already set,
> don't try to wake me' stuff discussed earlier would limit the number
> of useless wakeups to one, until you eventually are ready and consume
> the interrupt.  The main case I can think of, if we fast forward to
> the all-procsignals-become-interrupts patch (which I'll be rebasing on
> top of this when the next version appears), is that you might receive
> a sinval catchup request, but you might be busy running a long query.
> Sinval catchup messages are only processed between queries, so you
> just keep ignoring them until end of query.  I think that's fine, and
> unlikely.  Do you have other cases in mind?

Yeah, no, I think one bit is is good enough. Let's go with that.

> If there is legitimate use case for a more fine-grained maybe-sleeping
> and I've been too optimistic above, I don't think we should give one
> whole maybe-sleeping bit to each interrupt reason.  We only have 32
> bit atomics (splinlock-based emulation of 64 bit atomics is not good
> enough for this, it's not safe in SIGALRM handlers, at least not
> without a lot more pain; admittedly the SIGALRM handlers should
> eventually be replaced but not for a while) so if we used up two bits
> for every interrupt reason we could handle only 16 interrupt reasons,
> and that's already not enough.  Perhaps we could add maybe-sleeping
> bits for classes of interrupt if we ever determine that one bit for
> all of them isn't enough?

If we run out of bits in a single pendingInterrupt words, we can have 
multiple words. SendInterrupt and ClearInterrupt would still only need 
to manipulate one word, the one holding the bit it's setting/clearing. 
WaitEventSetWait() would need to touch all of them, or at least all the 
ones that hold bits you want to wait for. That seems OK from a 
performance point of view.

I don't think we need to go there any time soon though, 32 bits should 
be enough for the use cases we've been discussing.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Andres Freund

Date:

28 January, 20:01:41

Hi,

On 2024-12-02 16:39:28 +0200, Heikki Linnakangas wrote:
> From eff8de11fbfea4e2aadce9c1d71452b0f5a1b80b Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 2 Dec 2024 15:51:54 +0200
> Subject: [PATCH v5 1/4] Replace Latches with Interrupts

Your email has only three attachements - I assume the fourth is something
local that didn't need to be shared?


> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The Latch was a flag, with an inter-process signalling mechanism that
> allowed a process to wait for the Latch to be set. My original vision
> for Latches was that you could have different latches for different
> things that you need to wait for, but in practice, almost all code
> used one "process latch" that was shared for all wakeups. The only
> exception was the "recoveryWakeupLatch". I think the reason that we
> ended up just sharing the same latch for all wakeups is that it was
> cumbersome in practice to deal with multiple latches. For starters,
> there was no machinery for waiting for multiple latches at the same
> time. Secondly, changing the "ownership" of a latch needed extra
> locking or other coordination. Thirdly, each inter-process latch
> needed to be initialized at postmaster startup time.
> 
> This patch embraces the reality of how Latches were used, and replaces
> the Latches with a per-process interrupt mask. You can no longer
> allocate Latches in arbitrary shared memory structs and an interrupt
> is always directed at a particular process, addressed by its
> ProcNumber.  Each process has a bitmask of pending interrupts in
> PGPROC.

That doesn't really remove the need for inter-process coordination to know
what ProcNumber corresponds to what...


> This commit introduces two interrupt bits. INTERRUPT_GENERAL replaces
> the general-purpose per-process latch. All code that previously set a
> process's process latch now sets its INTERRUPT_GENERAL interrupt bit
> instead.

Half-formed-at-best thought: I wonder if we should split the "interrupt
yourself in response to a signal" cases from "another process wakes you up",
even in the initial commit. They seem rather different to me.


> This also moves the WaitEventSet functions to a different source file,
> waiteventset.c. This separates the platform-dependent code waiting and
> signalling code from the platform-independent parts.

I think this should be split into a separate commit. It's painful to verify
that code-movement didn't change anything else if there's lots of other
changes in the same commit.


> diff --git a/src/include/storage/interrupt.h b/src/include/storage/interrupt.h
> new file mode 100644
> index 0000000000..1da05369c3
> --- /dev/null
> +++ b/src/include/storage/interrupt.h

So we now have postmaster/interrupt.[ch] and storage/interrupt.h /
storage/ipc/interrupt.c.


> @@ -0,0 +1,171 @@
> +/*-------------------------------------------------------------------------
> + *
> + * interrupt.h
> + *      Interrupt handling routines.

And all of them use the same "short description".

That somehow doesn't seem optimal, although I don't really have a good
solution.



> + * "Interrupts" are a set of flags that represent conditions that should be
> + * handled at a later time.  They are roughly analogous to Unix signals,
> + * except that they are handled cooperatively by checking for them at many
> + * points in the code.
> + *
> + * Interrupt flags can be "raised" synchronously by code that wants to defer
> + * an action, or asynchronously by timer signal handlers, other signal
> + * handlers or "sent" by other backends setting them directly.

I think it might be worth explicitly stating here that multiple "arriving"
interrupts can be coalesced.


> + * Most code currently deals with the INTERRUPT_GENERAL interrupt. It is
> + * raised by any of the events checked by CHECK_FOR_INTERRUPTS(), like query
> + * cancellation or idle session timeout. Well behaved backend code performs
> + * CHECK_FOR_INTERRUPTS() periodically in long computations, and should never
> + * sleep using mechanisms other than the WaitEventSet mechanism or the more
> + * convenient WaitInterrupt/WaitSockerOrInterrupt functions (except for
> + * bounded short periods, eg LWLock waits), so they should react in good time.




> + * The "standard" set of interrupts is handled by CHECK_FOR_INTERRUPTS(), and
> + * consists of tasks that are safe to perform at most times.  They can be
> + * suppressed by HOLD_INTERRUPTS()/RESUME_INTERRUPTS().
> + *
> + *
> + * The correct pattern to wait for event(s) using INTERRUPT_GENERAL is:
> + *
> + * for (;;)
> + * {
> + *       ClearInterrupt(INTERRUPT_GENERAL);
> + *       if (work to do)
> + *           Do Stuff();
> + *       WaitInterrupt(1 << INTERRUPT_GENERAL, ...);
> + * }

I don't particularly like that there's now dozens of places that need to do
bit-shifting.

One reason I don't like it is that, for example, this should actually be 1U <<
INTERRUPT_GENERAL, and at some later point we might need it to be a 64bit
integer due to a larger number of interrupts, which will require going to all
extensions and updating them.

Perhaps WaitInterrupt should accept just a single interrupt type and
WaitEventSets should allow registering/reporting different interrupts to wait
for as different events?

Or perhaps we could introduce an interrupt mask type?


> + * To wake up the waiter, you must first set a global flag or something else
> + * that the wait loop tests in the "if (work to do)" part, and call
> + * SendInterrupt(INTERRUPT_GENERAL) *after* that. SendInterrupt is designed to
> + * return quickly if the interrupt is already set. In more complex scenarios
> + * with nested loops that can consume different events, you can define your
> + * own INTERRUPT_* flag instead of relying on INTERRUPT_GENERAL.
> + *
> + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * IDENTIFICATION
> + *      src/include/storage/interrupt.h
> + *
> + *-------------------------------------------------------------------------
> + */

This is really rather similar to procsignal.h. I think at the very least there
should be some mention in either file to make the delineation clearer.



> +#ifndef STORAGE_INTERRUPT_H
> +#define STORAGE_INTERRUPT_H
> +
> +#include "port/atomics.h"
> +#include "storage/procnumber.h"
> +#include "storage/waiteventset.h"
> +
> +#include <signal.h>

Do we actually need signal.h here? I think that's from latch.h, which needed
it for sig_atomic_t, but we don't use that anymore here.


> +extern PGDLLIMPORT pg_atomic_uint32 *MyPendingInterrupts;
> +
> +/*
> + * Flags in the pending interrupts bitmask. Each value represents one bit in
> + * the bitmask.
> + */
> +typedef enum

Personally I prefer if we name the objects underlying typedefs, as otherwise
some tools will label them "anonymous". And, while it doesn't matter for
enums, which we can't forward declare in our version of C, it also is required
to make forward declarations work.



> +/*
> + * Test an interrupt flag.
> + */
> +static inline bool
> +InterruptIsPending(InterruptType reason)
> +{
> +    return (pg_atomic_read_u32(MyPendingInterrupts) & (1 << reason)) != 0;
> +}

This has no memory ordering guarantees implied - is that correct?  I think
that could lead to missing interrupts in some cases.

Given functions like ClearInterrupt() are <Verb><Object>, why differ here?


> +/*
> + * Test an interrupt flag.
> + */
> +static inline bool
> +InterruptsPending(uint32 mask)
> +{
> +    return (pg_atomic_read_u32(MyPendingInterrupts) & (mask)) != 0;
> +}

It seems somewhat confusing to have two naming patterns for this and the
closely related InterruptIsPending().

Maybe IsInterruptPending() and AreInterruptsPending()? Or
IsInterruptPendingMask()?

I think I like IsInterruptPendingMask() better, because we're not actually
waiting for multiple interrupts, we're waiting for one out of multiple
possible interrupts to arrive



> +/*
> + * Clear an interrupt flag.
> + */
> +static inline void
> +ClearInterrupt(InterruptType reason)
> +{
> +    pg_atomic_fetch_and_u32(MyPendingInterrupts, ~(1 << reason));
> +}
> +
> +/*
> + * Test and clear an interrupt flag.
> + */
> +static inline bool
> +ConsumeInterrupt(InterruptType reason)
> +{
> +    if (likely(!InterruptIsPending(reason)))
> +        return false;
> +
> +    ClearInterrupt(reason);
> +
> +    return true;
> +}

I think some guidance when which of these two should be used would be good.


> +extern void RaiseInterrupt(InterruptType reason);
> +extern void SendInterrupt(InterruptType reason, ProcNumber pgprocno);
> +extern int    WaitInterrupt(uint32 interruptMask, int wakeEvents, long timeout,
> +                          uint32 wait_event_info);
> +extern int    WaitInterruptOrSocket(uint32 interruptMask, int wakeEvents, pgsocket sock,
> +                                  long timeout, uint32 wait_event_info);
> +extern void SwitchToLocalInterrupts(void);
> +extern void SwitchToSharedInterrupts(void);
> +extern void InitializeInterruptWaitSet(void);
> +
> +#endif



> @@ -4476,7 +4458,10 @@ CheckPromoteSignal(void)
>  void
>  WakeupRecovery(void)
>  {
> -    SetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
> +    ProcNumber    procno = ((volatile PROC_HDR *) ProcGlobal)->startupProc;
> +
> +    if (procno != INVALID_PROC_NUMBER)
> +        SendInterrupt(INTERRUPT_RECOVERY_CONTINUE, procno);
>  }

Not a new problem, and not really a problem for the startup process, that we
don't expect to die, but: Code like this could obviously lead to interrupts
being sent to the "former" owner of a procno.

I think either the file header of SendInterrupt() should mention that risk and
that code has to be aware of it.

Should we explicitly define the interrupt state a process has at startup?


> +/*
> + * Switch to local interrupts.  Other backends can't send interrupts to this
> + * one.  Only RaiseInterrupt() can set them, from inside this process.
> + */
> +void
> +SwitchToLocalInterrupts(void)
> +{
> +    if (MyPendingInterrupts == &LocalPendingInterrupts)
> +        return;
> +
> +    MyPendingInterrupts = &LocalPendingInterrupts;
> +
> +    /*
> +     * Make sure that SIGALRM handlers that call RaiseInterrupt() are now
> +     * seeing the new MyPendingInterrupts destination.
> +     */
> +    pg_memory_barrier();

I don't think you need a memory barrier between signal handler and normal
code.  A compiler barrier, sure, sometimes also something to protect against
partial writes. but memory barriers shouldn't be needed.


> +/*
> + * Set an interrupt flag in this backend.
> + */
> +void
> +RaiseInterrupt(InterruptType reason)

> +{
> +    uint32        old_pending;
> +
> +    old_pending = pg_atomic_fetch_or_u32(MyPendingInterrupts, 1 << reason);
> +
> +    /*
> +     * If the process is currently blocked waiting for an interrupt to arrive,
> +     * and the interrupt wasn't already pending, wake it up.
> +     */
> +    if ((old_pending & (1 << reason | 1 << SLEEPING_ON_INTERRUPTS)) == 1 << SLEEPING_ON_INTERRUPTS)

This is a somewhat hard to read line. Maybe it's worth breaking it out into
two conditions? Or use a local variable?

Greetings,

Andres Freund

Re: Interrupts vs signals

From

Andres Freund

Date:

29 January, 00:15:23

Hi,

I was chatting with Heikki about this patch and he mentioned that he recalls a
patch that did some work to unify the signal replacement, procsignal.h and
CHECK_FOR_INTERRUPTS(). Thomas, that was probably from you? Do you have a
pointer, if so?

It does seem like we're going to have to do some unification here. We have too
many different partially overlapping, partially collaborating systems here.


- procsignals - kinda like the interrupts here, but not quite. Probably can't
  just merge them 1:1 into the proposed mechanism, or we'll run out of bits
  rather soon.  I don't know if we want the relevant multiplexing to be built
  into interrupt.h or not.

  Or we ought to redesign this mechanism to deal with more than 32 types of
  interrupt.


- pmsignal.h - basically the same thing as here, except for signals sent *too*
  postmaster.

  It might or might not be required to keep this separate. There are different
  "reliability" requirements...


- CHECK_FOR_INTERRUPTS(), which uses the mechanism introduced here to react to
  signals while blocked, using RaiseInterrupt() (whereas it did SetLatch()
  before).

  A fairly simple improvement would be to use different InterruptType for
  e.g. termination and query cancellation.

  But even with this infrastructure in place, we can't *quite* get away from
  signal handlers, because we need some way to set at the very least
  InterruptPending (and perhaps ProcDiePending, QueryCancelPending etc,
  although those could be replaced with InterruptIsPending()). The
  infrastructure introduced with these patches provides neither a way to set
  those variables in response to delivery of an interrupt, nor can we
  currently set them from another backend, as they are global variables.

  We could of course do InterruptsPending(~0) in in
  INTERRUPTS_PENDING_CONDITION(), but it would require analyzing the increased
  indirection overhead as well as the "timeliness" guarantees.

  Today we don't need a memory barrier around checking InterruptPending,
  because delivery of a signal delivery (via SetLatch()) ensures that. But I
  think we would need one if we were to not send signals for
  cancellation/terminations etc. I.e. right now the overhead of delivery is
  bigger, but checking if there pending signals is cheaper.


  Another aspect of this is that we're cramming more and more code into
  ProcessInterrupts(), in a long weave of ifs.  I wonder if somewhere along
  ipc/interrupt.h there should be a facility to register callbacks to react to
  delivered interrupts.  It'd need to be a bit more than just a mapping of
  InterruptType to callback, because of things like InterruptHoldoffCount,
  CritSectionCount, QueryCancelHoldoffCount.


- timeout.c - will need a fairly big renovation for threading.  IIUC Heikki is
  thinking that we'll have a dedicated timer thread.

  It's a lot more expensive to wake up another thread than your own. A process
  local SIGALRM delivery does not require an inter processor interrupt, but
  doing a pthread_sigqueue() or whatnot does. Which would be a bit silly,
  because what most of the timeout handlers do is to set some variables to
  true and then call SetLatch().  It might not matter *that* much, because we
  don't except timeout "delivery" that frequently, but I'm also not convinced
  it won't be noticeable.

  Nearly all the timeouts we currently have could actually just specify the
  interrupt that ought to be raised, to be sent by that timeout thread. The
  one exception to that is StartupPacketTimeoutHandler() (which does
  _exit(1)).

  Without that one exception, a renovated timeout.c would not need to support
  handlers, which would be rather nice.  It's not at all obvious to me that we
  actually need StartupPacketTimeoutHandler() and process_startup_packet_die()
  actually need to do _exit(), we should be able to do non-blocking network IO
  around these parts - at least for SSL we largely do, we just don't check for
  interupts in in the right places.


Other questions:

- Can ipc/interrupts.c style interrupts be sent by postmaster? I think they
  ought to before long.


Greetings,

Andres Freund

Re: Interrupts vs signals

From

Thomas Munro

Date:

29 January, 01:04:49

On Wed, Jan 29, 2025 at 10:15 AM Andres Freund <andres@anarazel.de> wrote:
> I was chatting with Heikki about this patch and he mentioned that he recalls a
> patch that did some work to unify the signal replacement, procsignal.h and
> CHECK_FOR_INTERRUPTS(). Thomas, that was probably from you? Do you have a
> pointer, if so?

Yeah, attempts at that were included in earlier versions in this very
thread, but then Heikki came up with the
let's-just-replace-latches-completely concept and rejiggered the lower
level patches around that (which I liked and support).  I will try to
rebase/reorganise the accompanying procsignal removal part on top of
this version today, more soon...  (I had meant to do that earlier but
got a bit distracted by summer holiday season down here and some
personal stuff, sorry for the delay on that).

> It does seem like we're going to have to do some unification here. We have too
> many different partially overlapping, partially collaborating systems here.

Right, my goal from the start of this thread was always a full
unification leaving just one single system for this type of IPC, and
Heikki's latest version is a transitional point; hopefully with the
other stuff rebased on top we'll be getting pretty close to that, at
least for the procsignal part.  More soon.

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

28 February, 23:24:56

On 28/01/2025 19:01, Andres Freund wrote:
> On 2024-12-02 16:39:28 +0200, Heikki Linnakangas wrote:
>> This also moves the WaitEventSet functions to a different source file,
>> waiteventset.c. This separates the platform-dependent code waiting and
>> signalling code from the platform-independent parts.
> 
> I think this should be split into a separate commit. It's painful to verify
> that code-movement didn't change anything else if there's lots of other
> changes in the same commit.

Here's a patch set to do just that, split WaitEventSet stuff to a 
separate source file. That's actually a pretty nice separation even 
without any of the rest of the patches.

I noticed that the ShutdownLatchSupport() function is unused. The first 
patch removes it.

The second patch makes it possible to use ModifyWaitEvent() to switch 
between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH. WaitLatch() used to 
modify WaitEventSet->exit_on_postmaster_death directly, now it uses 
ModifyWaitEvent() for that. That's needed because with the final patch, 
WaitLatch() is in a different source file than WaitEventSet, so it 
cannot directly modify its field anymore.

The third patch is mechanical and moves existing code. The file header 
comments in the modified files are perhaps worth reviewing. They are 
also just existing text moved around, but there was some small decisions 
on what exactly should go where.

I'll continue working on the other parts, but these patches seems ready 
for commit already.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Interrupts vs signals

From

Andres Freund

Date:

05 March, 22:25:19

Hi,

On 2025-02-28 22:24:56 +0200, Heikki Linnakangas wrote:
> I noticed that the ShutdownLatchSupport() function is unused. The first
> patch removes it.

Looks like that's the case since

commit 80a8f95b3bca6a80672d1766c928cda34e979112
Author: Andres Freund <andres@anarazel.de>
Date:   2021-08-13 05:49:26 -0700
 
    Remove support for background workers without BGWORKER_SHMEM_ACCESS.

Oops.

LGTM.


> The second patch makes it possible to use ModifyWaitEvent() to switch
> between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH. WaitLatch() used to
> modify WaitEventSet->exit_on_postmaster_death directly, now it uses
> ModifyWaitEvent() for that. That's needed because with the final patch,
> WaitLatch() is in a different source file than WaitEventSet, so it cannot
> directly modify its field anymore.


> @@ -505,8 +513,14 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
>      if (!(wakeEvents & WL_LATCH_SET))
>          latch = NULL;
>      ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET, latch);
> -    LatchWaitSet->exit_on_postmaster_death =
> -        ((wakeEvents & WL_EXIT_ON_PM_DEATH) != 0);
> +
> +    /*
> +     * Update the event set for whether WL_EXIT_ON_PM_DEATH or
> +     * WL_POSTMASTER_DEATH was requested.  This is also cheap.
> +     */

"for whether" sounds odd to me.

I'd remove the "also" in the second sentence.


> @@ -1037,15 +1051,19 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
>          (!(event->events & WL_LATCH_SET) || set->latch == latch))
>          return;
>  
> -    if (event->events & WL_LATCH_SET &&
> -        events != event->events)
> +    /* Allow switching between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH */
> +    if (event->events & WL_POSTMASTER_DEATH)

Hm, this only supports switching from WL_POSTMASTER_DEATH to
WL_EXIT_ON_PM_DEATH, not the other way round, right?


>      {
> -        elog(ERROR, "cannot modify latch event");
> +        if (events != WL_POSTMASTER_DEATH && events != WL_EXIT_ON_PM_DEATH)
> +            elog(ERROR, "cannot modify postmaster death event");
> +        set->exit_on_postmaster_death = ((events & WL_EXIT_ON_PM_DEATH) != 0);
> +        return;
>      }


> -    if (event->events & WL_POSTMASTER_DEATH)
> +    if (event->events & WL_LATCH_SET &&
> +        events != event->events)
>      {
> -        elog(ERROR, "cannot modify postmaster death event");
> +        elog(ERROR, "cannot modify latch event");
>      }

Why did you reorder this? I don't have a problem with it, I just can't quite
follow.


> The third patch is mechanical and moves existing code. The file header
> comments in the modified files are perhaps worth reviewing. They are also
> just existing text moved around, but there was some small decisions on what
> exactly should go where.
> 
> I'll continue working on the other parts, but these patches seems ready for
> commit already.

Interestingly git diff's was pretty annoying to read, even with --color-moved,
unless I used -M100 (setting the rename detection to require 100%
renamed). With "-M100 --color-moved=dimmed-zebra" it looks a lot saner.


> From 7bd0d2f98a1b9d7ad40f3f72cd1c93430c1d7cee Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Fri, 28 Feb 2025 16:42:15 +0200
> Subject: [PATCH 3/3] Split WaitEventSet functions to separate source file
> 
> latch.c now only contains the Latch related functions, which build on
> the WaitEventSet abstraction. Most of the platform-dependent stuff is
> now in waiteventset.c.


>  include $(top_srcdir)/src/backend/common.mk
> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index ab601c748f8..aae0cf7577d 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c

>  #include "miscadmin.h"

It's a bit weird that MyLatch is declared in miscadmin.h, but that's not this
patch's fault.



> diff --git a/src/backend/storage/ipc/waiteventset.c b/src/backend/storage/ipc/waiteventset.c
> new file mode 100644
> index 00000000000..35e836d3398

> +#include "storage/proc.h"

proc.h wasn't previously included, and it's not clear to me why it'd be needed
now? If I remove it, things still compile, and I verified it's not indirectly
included.

Perhaps you had WakeupMyProc() in proc.c earlier?


> +/*
> + * Change the event mask and, in the WL_LATCH_SET case, the latch associated
> + * with the WaitEvent.  The latch may be changed to NULL to disable the latch
> + * temporarily, and then set back to a latch later.
> + *
> + * 'pos' is the id returned by AddWaitEventToSet.
> + */
> +void
> +ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
> +{
> ...
> +
> +    /* Allow switching between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH */
> +    if (event->events & WL_POSTMASTER_DEATH)
> +    {
> +        if (events != WL_POSTMASTER_DEATH && events != WL_EXIT_ON_PM_DEATH)
> +            elog(ERROR, "cannot modify postmaster death event");
> +        set->exit_on_postmaster_death =    ((events & WL_EXIT_ON_PM_DEATH) != 0);

FWIW, --color-moved flagged this line as having changed. That confused me for
a bit, but it seems that somehow you ended up with a tab after the =. pgindent
wants to remove it too.


> +int
> +WaitEventSetWait(WaitEventSet *set, long timeout,
> +                 WaitEvent *occurred_events, int nevents,
> +                 uint32 wait_event_info)
> +{
> ...
> +        if (set->latch && !set->latch->is_set)
> +        {
> +            /* about to sleep on a latch */
> +            set->latch->maybe_sleeping = true;
> +            pg_memory_barrier();
> +            /* and recheck */
> +        }

It's a bit ugly that we modify the latch state here - but I don't think the
alternatives are really better...


> +/*
> + * Wake up my process if it's currently waiting on a WaitEventSet.

Hm - that's overstating it a bit, I think. If the WES doesn't wait on the
latch set, this won't wake the process, right?


> +/*
> + * Wake up another process if it's currently waiting.
> + */
> +void
> +WakeupOtherProc(int pid)
> +{
> +    kill(pid, SIGURG);
> +}

Dito, I think?


> diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
> index 66e7a5b7c08..5af32621c0d 100644
> --- a/src/include/storage/latch.h
> +++ b/src/include/storage/latch.h
> @@ -84,10 +84,11 @@
>   * use of any generic handler.
>   *
>   *
> - * WaitEventSets allow to wait for latches being set and additional events -
> - * postmaster dying and socket readiness of several sockets currently - at the
> - * same time.  On many platforms using a long lived event set is more
> - * efficient than using WaitLatch or WaitLatchOrSocket.
> + * See also WaitEventSets in waiteventset.h. They allow to wait for latches
> + * being set and additional events - postmaster dying and socket readiness of
> + * several sockets currently - at the same time.  On many platforms using a
> + * long lived event set is more efficient than using WaitLatch or
> + * WaitLatchOrSocket.
>   *
>   *
>   * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
> @@ -102,6 +103,7 @@
>  
>  #include <signal.h>
>  
> +#include "storage/waiteventset.h"

Perhaps worth commenting that this is included here for backward compat?


>  #include "utils/resowner.h"

I don't think the resowner.h include is still needed in latch.h


> @@ -0,0 +1,95 @@
> +/*-------------------------------------------------------------------------
> + *
> + * waiteventset.h
> + *        ppoll() / pselect() like interface for waiting for events
> + *
> + * WaitEventSets allow to wait for latches being set and additional events -
> + * postmaster dying and socket readiness of several sockets currently - at the
> + * same time.  On many platforms using a long lived event set is more
> + * efficient than using WaitInterrupt or WaitInterruptOrSocket.

I don't think WaitInterrupt/WaitInterruptOrSocket exist at this stage?


These minor things aside, this looks good to me.

Greetings,

Andres Freund

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

06 March, 02:28:07

On 05/03/2025 21:25, Andres Freund wrote:
> On 2025-02-28 22:24:56 +0200, Heikki Linnakangas wrote:
>> The second patch makes it possible to use ModifyWaitEvent() to switch
>> between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH. WaitLatch() used to
>> modify WaitEventSet->exit_on_postmaster_death directly, now it uses
>> ModifyWaitEvent() for that. That's needed because with the final patch,
>> WaitLatch() is in a different source file than WaitEventSet, so it cannot
>> directly modify its field anymore.
> 
> 
>> @@ -505,8 +513,14 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
>>       if (!(wakeEvents & WL_LATCH_SET))
>>           latch = NULL;
>>       ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET, latch);
>> -    LatchWaitSet->exit_on_postmaster_death =
>> -        ((wakeEvents & WL_EXIT_ON_PM_DEATH) != 0);
>> +
>> +    /*
>> +     * Update the event set for whether WL_EXIT_ON_PM_DEATH or
>> +     * WL_POSTMASTER_DEATH was requested.  This is also cheap.
>> +     */
> 
> "for whether" sounds odd to me.
> 
> I'd remove the "also" in the second sentence.

I removed the whole comment. The "also" was referring to the previous 
comment in the function just above what's shown above, but now that I 
look more closely, the other comment already covered the postmaster 
death event.

>> @@ -1037,15 +1051,19 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
>>           (!(event->events & WL_LATCH_SET) || set->latch == latch))
>>           return;
>>   
>> -    if (event->events & WL_LATCH_SET &&
>> -        events != event->events)
>> +    /* Allow switching between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH */
>> +    if (event->events & WL_POSTMASTER_DEATH)
> 
> Hm, this only supports switching from WL_POSTMASTER_DEATH to
> WL_EXIT_ON_PM_DEATH, not the other way round, right?

Fixed.

Checking "if (event->events & WL_POSTMASTER_DEATH)" was in fact correct, 
because in a WaitEventSet, WL_EXIT_ON_PM_DEATH is converted to 
WL_POSTMASTER_DEATH with exit_on_postmaster_death = true. But it was 
still broken because when switching from WL_EXIT_ON_PM_DEATH to 
WL_POSTMASTER_DEATH, we took the "events == event->events" quick-exit 
path before reaching this, for the same reason. I fixed that by moving 
this check before the "events == event->events" check.

>>       {
>> -        elog(ERROR, "cannot modify latch event");
>> +        if (events != WL_POSTMASTER_DEATH && events != WL_EXIT_ON_PM_DEATH)
>> +            elog(ERROR, "cannot modify postmaster death event");
>> +        set->exit_on_postmaster_death = ((events & WL_EXIT_ON_PM_DEATH) != 0);
>> +        return;
>>       }
> 
> 
>> -    if (event->events & WL_POSTMASTER_DEATH)
>> +    if (event->events & WL_LATCH_SET &&
>> +        events != event->events)
>>       {
>> -        elog(ERROR, "cannot modify postmaster death event");
>> +        elog(ERROR, "cannot modify latch event");
>>       }
> 
> Why did you reorder this? I don't have a problem with it, I just can't quite
> follow.

I wanted to keep the non-error returns together for aesthetic reasons, 
that's all. But that point is moot now, because as I mentioned above, I 
had to move the check for switching between WL_POSTMASTER_DEATH and 
WL_EXIT_ON_PM_DEATH anyway.

>> The third patch is mechanical and moves existing code. The file header
>> comments in the modified files are perhaps worth reviewing. They are also
>> just existing text moved around, but there was some small decisions on what
>> exactly should go where.
>>
>> I'll continue working on the other parts, but these patches seems ready for
>> commit already.
> 
> Interestingly git diff's was pretty annoying to read, even with --color-moved,
> unless I used -M100 (setting the rename detection to require 100%
> renamed). With "-M100 --color-moved=dimmed-zebra" it looks a lot saner.

Thanks for the tip! I didn't know about --color-moved.

>> diff --git a/src/backend/storage/ipc/waiteventset.c b/src/backend/storage/ipc/waiteventset.c
>> new file mode 100644
>> index 00000000000..35e836d3398
> 
>> +#include "storage/proc.h"
> 
> proc.h wasn't previously included, and it's not clear to me why it'd be needed
> now? If I remove it, things still compile, and I verified it's not indirectly
> included.
> 
> Perhaps you had WakeupMyProc() in proc.c earlier?

Thanks, removed.

>> +/*
>> + * Wake up my process if it's currently waiting on a WaitEventSet.
> 
> Hm - that's overstating it a bit, I think. If the WES doesn't wait on the
> latch set, this won't wake the process, right?

It does wake up the process from the sleeping syscall in 
WaitEventSetWaitBlock(). WaitEventSetWait() will re-check the wait 
conditions and go back to sleep, if it's not waiting for the latch set.

I think the problem here is that "waiting on a WaitEventSet" is 
ambiguous. I'll change it to "... if it's currently sleeping in 
WaitEventSetWaitBlock()".

>> diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
>> index 66e7a5b7c08..5af32621c0d 100644
>> --- a/src/include/storage/latch.h
>> +++ b/src/include/storage/latch.h
>> @@ -84,10 +84,11 @@
>>    * use of any generic handler.
>>    *
>>    *
>> - * WaitEventSets allow to wait for latches being set and additional events -
>> - * postmaster dying and socket readiness of several sockets currently - at the
>> - * same time.  On many platforms using a long lived event set is more
>> - * efficient than using WaitLatch or WaitLatchOrSocket.
>> + * See also WaitEventSets in waiteventset.h. They allow to wait for latches
>> + * being set and additional events - postmaster dying and socket readiness of
>> + * several sockets currently - at the same time.  On many platforms using a
>> + * long lived event set is more efficient than using WaitLatch or
>> + * WaitLatchOrSocket.
>>    *
>>    *
>>    * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
>> @@ -102,6 +103,7 @@
>>   
>>   #include <signal.h>
>>   
>> +#include "storage/waiteventset.h"
> 
> Perhaps worth commenting that this is included here for backward compat?

Hmm. You're right that it's not strictly required here. In practice 
though, all callers of WaitLatch() will need to include it, because the 
WL_* argument flags are defined there. I'm not sure if we have a policy 
on that kind of convenience #includes. I'll add a comment.

>> @@ -0,0 +1,95 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * waiteventset.h
>> + *        ppoll() / pselect() like interface for waiting for events
>> + *
>> + * WaitEventSets allow to wait for latches being set and additional events -
>> + * postmaster dying and socket readiness of several sockets currently - at the
>> + * same time.  On many platforms using a long lived event set is more
>> + * efficient than using WaitInterrupt or WaitInterruptOrSocket.
> 
> I don't think WaitInterrupt/WaitInterruptOrSocket exist at this stage?

Fixed.

> These minor things aside, this looks good to me.

Thanks! Committed with those fixes and some minor WIN32 build fixes.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

06 March, 03:47:38

Here's a new patch set. It includes the previous work, and also goes the 
whole hog and replaces procsignals and many other signalling with 
interrupts. It's based on Thomas's 
v3-0002-Redesign-interrupts-remove-ProcSignals.patch much earlier in 
this thread.

That's included as a separate patch in the patch set, on top of the 
previous ones. I like the end state, but I'm not sure if that's the most 
sensible way to commit these. It might make more sense to commit the 
procsignal->interrupt changes first, and the commit to remove latches 
second. Or just commit all in one giant commit. Not sure.

In any case, I hope this split is easier to review at least than just 
squashing them all tgoether...

I haven't addressed all your comments yet, and there are some TODO/FIXME 
comments. More details below.

On 28/01/2025 19:01, Andres Freund wrote:
> On 2024-12-02 16:39:28 +0200, Heikki Linnakangas wrote:
>>  From eff8de11fbfea4e2aadce9c1d71452b0f5a1b80b Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Mon, 2 Dec 2024 15:51:54 +0200
>> Subject: [PATCH v5 1/4] Replace Latches with Interrupts
> 
> Your email has only three attachements - I assume the fourth is something
> local that didn't need to be shared?

Yeah, it was a work-in-progress patch to convert config-reload requests 
to the interrupts, to see how it works.

>> The Latch was a flag, with an inter-process signalling mechanism that
>> allowed a process to wait for the Latch to be set. My original vision
>> for Latches was that you could have different latches for different
>> things that you need to wait for, but in practice, almost all code
>> used one "process latch" that was shared for all wakeups. The only
>> exception was the "recoveryWakeupLatch". I think the reason that we
>> ended up just sharing the same latch for all wakeups is that it was
>> cumbersome in practice to deal with multiple latches. For starters,
>> there was no machinery for waiting for multiple latches at the same
>> time. Secondly, changing the "ownership" of a latch needed extra
>> locking or other coordination. Thirdly, each inter-process latch
>> needed to be initialized at postmaster startup time.
>>
>> This patch embraces the reality of how Latches were used, and replaces
>> the Latches with a per-process interrupt mask. You can no longer
>> allocate Latches in arbitrary shared memory structs and an interrupt
>> is always directed at a particular process, addressed by its
>> ProcNumber.  Each process has a bitmask of pending interrupts in
>> PGPROC.
> 
> That doesn't really remove the need for inter-process coordination to know
> what ProcNumber corresponds to what...

Right. Was there a comment that claims it does or something, or why do 
you point that out?

>> This commit introduces two interrupt bits. INTERRUPT_GENERAL replaces
>> the general-purpose per-process latch. All code that previously set a
>> process's process latch now sets its INTERRUPT_GENERAL interrupt bit
>> instead.
> 
> Half-formed-at-best thought: I wonder if we should split the "interrupt
> yourself in response to a signal" cases from "another process wakes you up",
> even in the initial commit. They seem rather different to me.

Hmm. Some interrupts are indeed clearly only sent within the process, 
while others are sent across processes. Not sure where that distinction 
would help though. Just group them together and have comments? Or what 
did you have in mind?

There are other ways to group the interrupts. Some are harmless if 
they're sent to a wrong process. Others are not. Some are handled by 
CHECK_FOR_INTERRUTPS, others are not.

>> diff --git a/src/include/storage/interrupt.h b/src/include/storage/interrupt.h
>> new file mode 100644
>> index 0000000000..1da05369c3
>> --- /dev/null
>> +++ b/src/include/storage/interrupt.h
> 
> So we now have postmaster/interrupt.[ch] and storage/interrupt.h /
> storage/ipc/interrupt.c.

In the attached version, I renamed the existing 
postmaster/interrupt.[ch] to postmaster/interrupt_handlers.[ch]. I think 
that helps.

Now that I look at the big picture again, though, I'm thinking that they 
should perhaps be merged, so that there would be only 
postmaster/interrupt.[ch]. That's how Thomas had them in the original 
procsignal->interrupt patches. There's not much left of the old 
postmaster/interrupt.[ch], and it's all related to interrupts anyway.

>> + * The "standard" set of interrupts is handled by CHECK_FOR_INTERRUPTS(), and
>> + * consists of tasks that are safe to perform at most times.  They can be
>> + * suppressed by HOLD_INTERRUPTS()/RESUME_INTERRUPTS().
>> + *
>> + *
>> + * The correct pattern to wait for event(s) using INTERRUPT_GENERAL is:
>> + *
>> + * for (;;)
>> + * {
>> + *       ClearInterrupt(INTERRUPT_GENERAL);
>> + *       if (work to do)
>> + *           Do Stuff();
>> + *       WaitInterrupt(1 << INTERRUPT_GENERAL, ...);
>> + * }
> 
> I don't particularly like that there's now dozens of places that need to do
> bit-shifting.
> 
> One reason I don't like it is that, for example, this should actually be 1U <<
> INTERRUPT_GENERAL, and at some later point we might need it to be a 64bit
> integer due to a larger number of interrupts, which will require going to all
> extensions and updating them.
> 
> Perhaps WaitInterrupt should accept just a single interrupt type and
> WaitEventSets should allow registering/reporting different interrupts to wait
> for as different events?
> 
> Or perhaps we could introduce an interrupt mask type?

I changed the values of INTERRUPT_* to be bitmasks, so that you can do 
e.g. "INTERRUPT_GENERAL | INTERRUPT_CONFIG_RELOAD" without any 
bit-shifting. All the functions like ClearInterrupt, RaiseInterrupt, 
SendInterrupt now work with a bitmask, it doesn't have to be just a 
single flag anymore.

>> +extern PGDLLIMPORT pg_atomic_uint32 *MyPendingInterrupts;
>> +
>> +/*
>> + * Flags in the pending interrupts bitmask. Each value represents one bit in
>> + * the bitmask.
>> + */
>> +typedef enum
> 
> Personally I prefer if we name the objects underlying typedefs, as otherwise
> some tools will label them "anonymous". And, while it doesn't matter for
> enums, which we can't forward declare in our version of C, it also is required
> to make forward declarations work.

Ok

>> +/*
>> + * Test an interrupt flag.
>> + */
>> +static inline bool
>> +InterruptIsPending(InterruptType reason)
>> +{
>> +    return (pg_atomic_read_u32(MyPendingInterrupts) & (1 << reason)) != 0;
>> +}
> 
> This has no memory ordering guarantees implied - is that correct?  I think
> that could lead to missing interrupts in some cases.

I added pg_read_barrier() and pg_write_barrier() calls to these. I'm not 
100% if that's really needed, or if I got them right, though.

> Given functions like ClearInterrupt() are <Verb><Object>, why differ here?
> 
> 
>> +/*
>> + * Test an interrupt flag.
>> + */
>> +static inline bool
>> +InterruptsPending(uint32 mask)
>> +{
>> +    return (pg_atomic_read_u32(MyPendingInterrupts) & (mask)) != 0;
>> +}
> 
> It seems somewhat confusing to have two naming patterns for this and the
> closely related InterruptIsPending().
> 
> Maybe IsInterruptPending() and AreInterruptsPending()? Or
> IsInterruptPendingMask()?
> 
> I think I like IsInterruptPendingMask() better, because we're not actually
> waiting for multiple interrupts, we're waiting for one out of multiple
> possible interrupts to arrive

There's now just a single function, InterruptPending(mask), which works 
with a single bit or mask of several bits.

I agree it's not a great name, I just didn't get around to think about 
that yet.

>> @@ -4476,7 +4458,10 @@ CheckPromoteSignal(void)
>>   void
>>   WakeupRecovery(void)
>>   {
>> -    SetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
>> +    ProcNumber    procno = ((volatile PROC_HDR *) ProcGlobal)->startupProc;
>> +
>> +    if (procno != INVALID_PROC_NUMBER)
>> +        SendInterrupt(INTERRUPT_RECOVERY_CONTINUE, procno);
>>   }
> 
> Not a new problem, and not really a problem for the startup process, that we
> don't expect to die, but: Code like this could obviously lead to interrupts
> being sent to the "former" owner of a procno.
> 
> I think either the file header of SendInterrupt() should mention that risk and
> that code has to be aware of it.

Yeah, that's fair. As the patch stands, the rules are different for 
different interrupts. Some interrupts are OK to send to random backends; 
no harm done. But others, like INTERRUPT_DIE or INTERRUPT_QUERY_CANCEL, 
are not. I think we need a mechanism to reliably send an interrupt to 
the right backend, without race conditions if the backend exits 
concurrently. Holding ProcArrayLock works, but that's a pretty big hammer.

> Should we explicitly define the interrupt state a process has at startup?

Makes sense. (Not done yet, it's a TODO)

>> +/*
>> + * Set an interrupt flag in this backend.
>> + */
>> +void
>> +RaiseInterrupt(InterruptType reason)
> 
>> +{
>> +    uint32        old_pending;
>> +
>> +    old_pending = pg_atomic_fetch_or_u32(MyPendingInterrupts, 1 << reason);
>> +
>> +    /*
>> +     * If the process is currently blocked waiting for an interrupt to arrive,
>> +     * and the interrupt wasn't already pending, wake it up.
>> +     */
>> +    if ((old_pending & (1 << reason | 1 << SLEEPING_ON_INTERRUPTS)) == 1 << SLEEPING_ON_INTERRUPTS)
> 
> This is a somewhat hard to read line. Maybe it's worth breaking it out into
> two conditions? Or use a local variable?

It's a little better now that the bitshifts are gone, but still a fair 
point..

> I was chatting with Heikki about this patch and he mentioned that he recalls a
> patch that did some work to unify the signal replacement, procsignal.h and
> CHECK_FOR_INTERRUPTS(). Thomas, that was probably from you? Do you have a
> pointer, if so?
> 
> It does seem like we're going to have to do some unification here. We have too
> many different partially overlapping, partially collaborating systems here.
> 
> 
> - procsignals - kinda like the interrupts here, but not quite. Probably can't
>   just merge them 1:1 into the proposed mechanism, or we'll run out of bits
>   rather soon.  I don't know if we want the relevant multiplexing to be built
>   into interrupt.h or not.
> 
>   Or we ought to redesign this mechanism to deal with more than 32 types of
>   interrupt.

In this patch, 29 out of 32 bits are used. Yeah, that doesn't leave much 
headroom.

Some ideas for addressing that:

1. Use 64-bit atomics. Are there any architectures left where that would 
not be acceptable?

2. Use 64-bit integers, but for the actual signaling part, split it into 
two 32-bit atomic words. Waiting on the interrupts would need to check 
both, but that seems fine from a performance point of view.

3. If we need > 64 bits, things get a bit awkward as you can no longer 
have a single integer to represent a bitmask that includes all the bits. 
That makes the function signatures more ugly, you need to have two 
arguments like interruptMask1 and interruptMask2 or somehting. But 
should basically work otherwise.

4. Multiplex the interrupts so that we need fewer of them. I think all 
the recovery conflict interrupts could be merged into one, for example, 
if we had a separate bitmask somewhere else in PGPROC to indicate which 
ones are set. The limitation would be that if interrupts are multiplexed 
together into one bit, you could only wait for all or none of them.

I'd like to not have to treat these bits as a scarce resource. It'd be 
nice to use even more distinct interrupts than what's in the patch now 
for different things, just for clarity. And I'd like to make it possible 
for extensions to allocate interrupt bits too. (Not included in these 
patches yet)

I'm leaning towards 1. or 2. at the moment. 64 bits should be enough for 
a long time.

> - pmsignal.h - basically the same thing as here, except for signals sent *too*
>   postmaster.
> 
>   It might or might not be required to keep this separate. There are different
>   "reliability" requirements...

I think pmsignal.c could be rewritten to use interrupts fairly easily. I 
didn't do that yet though. For that, postmaster needs to have a PGPROC 
slot, or at least a special reserved ProcNumber, but it doesn't seem hard.

> - CHECK_FOR_INTERRUPTS(), which uses the mechanism introduced here to react to
>   signals while blocked, using RaiseInterrupt() (whereas it did SetLatch()
>   before).
> 
>   A fairly simple improvement would be to use different InterruptType for
>   e.g. termination and query cancellation.

That's done now.

>   But even with this infrastructure in place, we can't *quite* get away from
>   signal handlers, because we need some way to set at the very least
>   InterruptPending (and perhaps ProcDiePending, QueryCancelPending etc,
>   although those could be replaced with InterruptIsPending()). The
>   infrastructure introduced with these patches provides neither a way to set
>   those variables in response to delivery of an interrupt, nor can we
>   currently set them from another backend, as they are global variables.
> 
>   We could of course do InterruptsPending(~0) in in
>   INTERRUPTS_PENDING_CONDITION(), but it would require analyzing the increased
>   indirection overhead as well as the "timeliness" guarantees.
> 
>   Today we don't need a memory barrier around checking InterruptPending,
>   because delivery of a signal delivery (via SetLatch()) ensures that. But I
>   think we would need one if we were to not send signals for
>   cancellation/terminations etc. I.e. right now the overhead of delivery is
>   bigger, but checking if there pending signals is cheaper.

That's changed heavily in this latest patch. All the *Pending global 
variables are gone, they are replaced with InterruptPending() calls.

>   Another aspect of this is that we're cramming more and more code into
>   ProcessInterrupts(), in a long weave of ifs.  I wonder if somewhere along
>   ipc/interrupt.h there should be a facility to register callbacks to react to
>   delivered interrupts.  It'd need to be a bit more than just a mapping of
>   InterruptType to callback, because of things like InterruptHoldoffCount,
>   CritSectionCount, QueryCancelHoldoffCount.

+1 for the general idea, but I didn't pursue it yet.

There are some changes to CHECK_FOR_INTERRUPTS() and the holdoff counts 
here already though. The pattern for waiting for something now looks 
like this:

for (;;)
{
     CHECK_FOR_INTERRUPTS();

     /*
      * The baker will wake us up with INTERRUPT_GENERAL when the cake
      * is ready.
      */
     ClearInterrupt(INTERRUPT_GENERAL);
     if (IsTheCakeBakedYet())
         break;

     WaitInterrupt(CheckForInterruptsMask | INTERRUPT_GENERAL, ...);
}

CheckForInterruptsMask includes INTERRUPT_BARRIER, INTERRUPT_DIE, 
INTERRUPT_QUERY_CANCEL, and all the other interrupts that are handled by 
CHECK_FOR_INTERRUPTS(). However, when you call HOLD_INTERRUPTS(), 
INTERRUPT_DIE is removed from CheckForInterruptsMask.

(I'm not wedded to the name CheckForInterruptsMask. Thomas called them 
"regular interrupts" in his patch. I think I used the term "standard 
interrupts" somewhere.)

> Other questions:
> 
> - Can ipc/interrupts.c style interrupts be sent by postmaster? I think they
>   ought to before long.

Yes. The postmaster does that now to notify backends when background 
workers are launched or terminated.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Interrupts vs signals

From

Heikki Linnakangas

Date:

06 March, 19:43:47

On 06/03/2025 02:47, Heikki Linnakangas wrote:
> Here's a new patch set. It includes the previous work, and also goes the 
> whole hog and replaces procsignals and many other signalling with 
> interrupts. It's based on Thomas's v3-0002-Redesign-interrupts-remove- 
> ProcSignals.patch much earlier in this thread.

And here's yet another version. It's the same at high level, but with a 
ton of little fixes.

One notable change is that I merged storage/interrupt.[ch] with 
postmaster/interrupt.[ch], so the awkwardness of having two files with 
same name is gone.

-- 
Heikki Linnakangas
Neon (https://neon.tech)