Thread: Interrupts vs signals
Hi, I wonder if we really need signals to implement interrupts. Given that they are non-preemptive/cooperative (work happens at the next CFI()), why not just use shared memory flags and latches? That skips a bunch of code, global variables and scary warnings about programming in signal handlers. I sketched out some code to try that a few months back, while speculating about bite-sized subproblems that would come up if each backend is, one day, a thread. There are several other conditions that are also handled by CHECK_FOR_INTERRUPTS(), but are not triggered by other backends sending signals, or are set by other signal handlers (SIGALRM, SIGQUIT). One idea is to convert those into "procsignals" too, for consistency. In the attached, they can be set (ie by the same backend) with ProcSignalRaise(), but it's possible that in future we might have a reason for another backend to set them too, so it seems like a good idea to have a single system, effectively merging the concepts of "procsignals" and "interrupts". There are still a few more ad hoc (non-ProcSignal) uses of SIGUSR1 in the tree. For one thing, we don't allow the postmaster to set latches; if we gave up on that rule, we wouldn't need the bgworker please-signal-me thing. Also the logical replication launcher does the same sort of thing for no apparent reason. Changed in the attached -- mainly so I could demonstrate that check-world passes with SIGUSR1 ignored. The attached is only experiment grade code: in particular, I didn't quite untangle the recovery conflict flags properly. It's also doing function calls where some kind of fast inlined magic is probably required, and I probably have a few other details wrong, but I figured it was good enough to demonstrate the concept.
Attachment
Hi, On 2021-10-21 07:55:54 +1300, Thomas Munro wrote: > I wonder if we really need signals to implement interrupts. Given > that they are non-preemptive/cooperative (work happens at the next > CFI()), why not just use shared memory flags and latches? That skips > a bunch of code, global variables and scary warnings about programming > in signal handlers. Depending on how you implement them, one difference could be whether / when "slow" system calls (recv, poll, etc) are interrupted. Another is that that signal handling provides a memory barrier in the receiving process. For things that rarely change (like most interrupts), it can be more efficient to move the cost of that out-of-line, instead of incurring them on every check. One nice thing of putting the state variables into shared memory would be that that'd allow to see the pending interrupts of other backends for debugging purposes. > One idea is to convert those into "procsignals" too, for > consistency. In the attached, they can be set (ie by the same > backend) with ProcSignalRaise(), but it's possible that in future we > might have a reason for another backend to set them too, so it seems > like a good idea to have a single system, effectively merging the > concepts of "procsignals" and "interrupts". This seems a bit confusing to me. For one, we need to have interrupts working before we have a proc, IIRC. But leaving details like that aside, it just seems a bit backwards to me. I'm on board with other backends directly setting interrupt flags, but it seems to me that the procsignal stuff should be "client" of the process-local interrupt infrastructure, rather than the other way round. > +bool > +ProcSignalAnyPending(void) > +{ > + volatile ProcSignalSlot *slot = MyProcSignalSlot; > > - if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)) > - RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN); > + /* XXX make this static inline? */ > + /* XXX point to a dummy entry instead of using NULL to avoid a branch */ > + return slot && slot->pss_signaled; > +} ISTM it might be easier to make this stuff efficiently race-free if we made this a count of pending operations. > @@ -3131,12 +3124,13 @@ ProcessInterrupts(void) > /* OK to accept any interrupts now? */ > if (InterruptHoldoffCount != 0 || CritSectionCount != 0) > return; > - InterruptPending = false; > + ProcSignalClearAnyPending(); > + > + pg_read_barrier(); > > - if (ProcDiePending) > + if (ProcSignalConsume(PROCSIG_DIE)) > { I think making all of these checks into function calls isn't great. How about making the set of pending signals a bitmask? That'd allow to efficiently check a bunch of interrupts together and even where not, it'd just be a single test of the mask, likely already in a register. Greetings, Andres Freund
On Thu, Oct 21, 2021 at 8:27 AM Andres Freund <andres@anarazel.de> wrote: > On 2021-10-21 07:55:54 +1300, Thomas Munro wrote: > > I wonder if we really need signals to implement interrupts. Given > > that they are non-preemptive/cooperative (work happens at the next > > CFI()), why not just use shared memory flags and latches? That skips > > a bunch of code, global variables and scary warnings about programming > > in signal handlers. > > Depending on how you implement them, one difference could be whether / when > "slow" system calls (recv, poll, etc) are interrupted. Hopefully by now all such waits are implemented with latch.c facilities? > Another is that that signal handling provides a memory barrier in the > receiving process. For things that rarely change (like most interrupts), it > can be more efficient to move the cost of that out-of-line, instead of > incurring them on every check. Agreed, but in this experiment I was trying out the idea that a memory barrier is not really needed at all, unless you're about to go to sleep. We already insert one of those before a latch wait. That is, if we see !set->latch->is_set, we do pg_memory_barrier() and check again, before sleeping, so your next CFI must see the flag. For computation loops (sort, hash, query execution, ...), I speculate that a relaxed read of memory is fine... you'll see the flag pretty soon, and you certainly won't be allowed to finish your computation and go to sleep. > One nice thing of putting the state variables into shared memory would be that > that'd allow to see the pending interrupts of other backends for debugging > purposes. +1 > > One idea is to convert those into "procsignals" too, for > > consistency. In the attached, they can be set (ie by the same > > backend) with ProcSignalRaise(), but it's possible that in future we > > might have a reason for another backend to set them too, so it seems > > like a good idea to have a single system, effectively merging the > > concepts of "procsignals" and "interrupts". > > This seems a bit confusing to me. For one, we need to have interrupts working > before we have a proc, IIRC. But leaving details like that aside, it just > seems a bit backwards to me. I'm on board with other backends directly setting > interrupt flags, but it seems to me that the procsignal stuff should be > "client" of the process-local interrupt infrastructure, rather than the other > way round. Hmm. Yeah, I see your point. But I can also think of some arguments for merging the concepts of local and shared interrupts; see below. In this new sketch, I tried doing it the other way around. That is, completely removing the concept of "ProcSignal", leaving only "Interrupts". Initially, MyPendingInterrupts points to something private, and once you're connected to shared memory it points to MyProc->pending_interrupts. > > +bool > > +ProcSignalAnyPending(void) > > +{ > > + volatile ProcSignalSlot *slot = MyProcSignalSlot; > > > > - if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)) > > - RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN); > > + /* XXX make this static inline? */ > > + /* XXX point to a dummy entry instead of using NULL to avoid a branch */ > > + return slot && slot->pss_signaled; > > +} > > ISTM it might be easier to make this stuff efficiently race-free if we made > this a count of pending operations. Hmm, with a unified interrupt system and a bitmap it's not necessary to have a separate flag/counter at all. > > @@ -3131,12 +3124,13 @@ ProcessInterrupts(void) > > /* OK to accept any interrupts now? */ > > if (InterruptHoldoffCount != 0 || CritSectionCount != 0) > > return; > > - InterruptPending = false; > > + ProcSignalClearAnyPending(); > > + > > + pg_read_barrier(); > > > > - if (ProcDiePending) > > + if (ProcSignalConsume(PROCSIG_DIE)) > > { > > I think making all of these checks into function calls isn't great. How about > making the set of pending signals a bitmask? That'd allow to efficiently check > a bunch of interrupts together and even where not, it'd just be a single test > of the mask, likely already in a register. +1. Some assorted notes: 1. Aside from doing interrupts in this new way, I also have the postmaster setting latches (!) instead of sending ad hoc SIGUSR1 here and there. My main reason for doing that was to be able to chase out all reasons to register a SIGUSR1 handler, so I could prove that check-world passes. I like it, though. But maybe it's really a separate topic. 2. I moved this stuff into interrupt.{h,c}. There is nothing left in procsignal.c except code relating to ProcSignalBarrier. I guess that thing could use another name, anyway. It's a ... SystemInterruptBarrier? 3. Child-level SIGINT and SIGTERM handlers probably aren't really necessary, either: maybe the sender could do InterruptSend(INTERRUPT_{DIE,QUERY_CANCEL}, pgprocno) instead? But perhaps people are attached to being able to send those signals from external programs directly to backends. 4. Like the above, a SIGALRM handler might need to do eg InterruptRaise(INTERRUPT_STATEMENT_TIMEOUT). That's a problem for systems using spinlocks (self-deadlock against user context in InterruptRaise()), so I'd need to come up with some flag protocol for dinosaurs to make that safe, OR revert to having these "local only" interrupts done with separate flags, as you were getting at earlier. 5. The reason I prefer to put currently "local only" interrupts into the same atomic system is that I speculate that ultimately all of the backend-level signal handlers won't be needed. They all fall into three categories: (1) could be replaced with these interrupts directly, (2) could be replaced by the new timer infrastructure that multithreaded postgres would need to have to deliver interrupts to the right recipients, (3) are quickdie and can be handled at the containing process level. Then the only signal handlers left are top level external ones. But perhaps you're right and I should try reintroducing separate local interrupts for now. I dunno, I like the simplicity of the unified system; if only it weren't for those spinlock-backed atomics.
Attachment
On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > Depending on how you implement them, one difference could be whether / when > > "slow" system calls (recv, poll, etc) are interrupted. > > Hopefully by now all such waits are implemented with latch.c facilities? Do read(), write(), etc. count? Because we certainly have raw calls to those functions in lots of places. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2021-11-11 09:06:01 -0500, Robert Haas wrote: > On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > Depending on how you implement them, one difference could be whether / when > > > "slow" system calls (recv, poll, etc) are interrupted. > > > > Hopefully by now all such waits are implemented with latch.c facilities? > > Do read(), write(), etc. count? Because we certainly have raw calls to > those functions in lots of places. They can count, but only when used for network sockets or pipes ("slow devices" or whatever the posix language is). Disk IO doesn't count as that. So I don't think it'd be a huge issue. Greetings, Andres Freund
On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote: > They can count, but only when used for network sockets or pipes ("slow > devices" or whatever the posix language is). Disk IO doesn't count as that. So > I don't think it'd be a huge issue. Somehow the idea that the network is a slow device and the disk a fast one does not seem like it's necessarily accurate on modern hardware, but I guess the spec is what it is. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Nov 12, 2021 at 9:24 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote: > > They can count, but only when used for network sockets or pipes ("slow > > devices" or whatever the posix language is). Disk IO doesn't count as that. So > > I don't think it'd be a huge issue. > > Somehow the idea that the network is a slow device and the disk a fast > one does not seem like it's necessarily accurate on modern hardware, > but I guess the spec is what it is. [Somehow I managed to reply to Robert only; let me try that again, this time to the list...] Network filesystems have in the past been confusing because they're both disk-like and network-like, and also slow as !@#$, which is why there have been mount point options like "intr", "nointr" (now ignored on Linux) to control what happens if you receive an async signal during a sleepy read/write. But even if you had some kind of Deathstation 9000 that had a switch on the front panel that ignores SA_RESTART and produces EINTR for disk I/O when a signal arrives, PostgreSQL already doesn't work today. Our pread() and pwrite() paths for data and WAL don't not have a EINTR loops or CHECK_FOR_INTERRUPTS() (we just can't take interrupts in the middle of eg a synchronous write), so I think we'd produce an ERROR or PANIC. So I think disk I/O is irrelevant, and network/pipe I/O is already handled everywhere via latch.c facilities. If there are any eg blocking reads on a socket in PostgreSQL, we should fix that to use latch.c non-blocking techniques, because such a place is already a place that ignores postmaster death and interrupts. To be more precise: such a place could of course wake up for EINTR on SIGUSR1 from procsignal.c, and that would no longer happen with my patch, but if we're relying on that anywhere, it's dangerous and unreliable. If SIGUSR1 is delivered right before you enter a blocking read(), you'll sleep waiting for the socket or whatever. That's precisely the problem that latch.c solves, and why it's already a bug if there are such places.
Here's an updated version of this patch. The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno) becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending interrupt global variables and pss_procsignalFlags[] go away, along with the SIGUSR1 handler. The interrupts are compressed into a single bitmap. See commit message for more details. The patch is failing on Windows CI for reasons I haven't debugged yet, but I wanted to share what I have so far. Work in progress! Here is my attempt to survey the use of signals and write down what I think we might do about them all so far, to give the context for this patch: https://wiki.postgresql.org/wiki/Signals Comments, corrections, edits very welcome.
Attachment
On 08/07/2024 05:56, Thomas Munro wrote: > Here's an updated version of this patch. > > The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno) > becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending > interrupt global variables and pss_procsignalFlags[] go away, along > with the SIGUSR1 handler. The interrupts are compressed into a single > bitmap. See commit message for more details. > > The patch is failing on Windows CI for reasons I haven't debugged yet, > but I wanted to share what I have so far. Work in progress! > > Here is my attempt to survey the use of signals and write down what I > think we might do about them all so far, to give the context for this > patch: > > https://wiki.postgresql.org/wiki/Signals > > Comments, corrections, edits very welcome. Nice, thanks! > Background worker state notifications are also changed from raw > kill(SIGUSR1) to SetLatch(). That means that SetLatch() is now called > from the postmaster. The main purpose of including that change is to be > able to remove procsignal_sigusr1_handler completely and set SIGUSR1 to > SIG_IGN, and show the system working. > > XXX Do we need to invent SetLatchRobust() that doesn't trust anything in > shared memory, to be able to set latches from the postmaster? The patch actually does both: it still does kill(SIGUSR1) and also sets the latch. I think it would be nice if RegisterDynamicBackgroundWorker() had a "bool notify_me" argument, instead of requiring the caller to set "bgw_notify_pid = MyProcPid" before the call. That's a backwards-compatibility break, but maybe we should bite the bullet and do it. Or we could do this in RegisterDynamicBackgroundWorker(): if (worker->bgw_notify_pid == MyProcPid) worker->bgw_notify_pgprocno = MyProcNumber; I think we can forbid setting pgw_notify_pid to anything else than 0 or MyProcPid. A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it can do any damage if you called it on a pointer to garbage, except if the pointer itself is bogus, then just dereferencing it an cause a segfault. So it would be nice to have a version specifically designed with that in mind. For example, it could assume that the Latch's pid is never legally equal to MyProcPid, because postmaster cannot own any latches. Another approach would be to move the responsibility of background worker state notifications out of postmaster completely. When a new background worker is launched, the worker process itself could send the notification that it has started. And similarly, when a worker exits, it could send the notification just before exiting. There's a little race condition with exiting: if a process is waiting for the bgworker to exit, and launches a new worker immediately when the old one exits, there will be a brief period when the old and new process are alive at the same time. The old worker wouldn't be doing anything interesting anymore since it's exiting, but it still counts towards max_worker_processes, so launching the new process might fail because of hitting the limit. Maybe we should just bump up max_worker_processes. Or postmaster could check PMChildFlags and not count processes that have already deregistered from PMChildFlags towards the limit. > -volatile uint32 InterruptHoldoffCount = 0; > -volatile uint32 QueryCancelHoldoffCount = 0; > -volatile uint32 CritSectionCount = 0; > +uint32 InterruptHoldoffCount = 0; > +uint32 QueryCancelHoldoffCount = 0; > +uint32 CritSectionCount = 0; I wondered if these are used in PG_TRY-CATCH blocks in a way that would still require volatile. I couldn't find any such usage by some quick grepping, so I think we're good, but I thought I'd mention it. > +/* > + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and > + * ProcessInterrupts() handle. These perform work that is safe to run whenever > + * interrupts are not "held". Other kinds of interrupts are only handled at > + * more restricted times. > + */ > +#define INTERRUPT_STANDARD_MASK \ Some interrupts are missing from this mask: - INTERRUPT_PARALLEL_APPLY_MESSAGE - INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT - INTERRUPT_SINVAL_CATCHUP - INTERRUPT_NOTIFY Is that on purpose? > -/* > - * Because backends sitting idle will not be reading sinval events, we > - * need a way to give an idle backend a swift kick in the rear and make > - * it catch up before the sinval queue overflows and forces it to go > - * through a cache reset exercise. This is done by sending > - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind. > - * > - * The signal handler will set an interrupt pending flag and will set the > - * processes latch. Whenever starting to read from the client, or when > - * interrupted while doing so, ProcessClientReadInterrupt() will call > - * ProcessCatchupEvent(). > - */ > -volatile sig_atomic_t catchupInterruptPending = false; Would be nice to move that comment somewhere else rather than remove it completely. > --- a/src/backend/storage/lmgr/proc.c > +++ b/src/backend/storage/lmgr/proc.c > @@ -444,6 +444,10 @@ InitProcess(void) > OwnLatch(&MyProc->procLatch); > SwitchToSharedLatch(); > > + /*We're now ready to accept interrupts from other processes. */ > + pg_atomic_init_u32(&MyProc->pending_interrupts, 0); > + SwitchToSharedInterrupts(); > + > /* now that we have a proc, report wait events to shared memory */ > pgstat_set_wait_event_storage(&MyProc->wait_event_info); > > @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void) > OwnLatch(&MyProc->procLatch); > SwitchToSharedLatch(); > > + /* We're now ready to accept interrupts from other processes. */ > + SwitchToSharedInterrupts(); > + > /* now that we have a proc, report wait events to shared memory */ > pgstat_set_wait_event_storage(&MyProc->wait_event_info); > Is there a reason for the different initialization between a regular backend and aux process? > +/* > + * Switch to shared memory interrupts. Other backends can send interrupts > + * to this one if they know its ProcNumber. > + */ > +void > +SwitchToSharedInterrupts(void) > +{ > + pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts)); > + MyPendingInterrupts = &MyProc->pending_interrupts; > +} Hmm, I think there's a race condition here (and similarly in SwitchToLocalInterrupts), if the signal handler runs between the pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe something like this instead: MyPendingInterrupts = &MyProc->pending_interrupts; pg_memory_barrier(); pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(LocalPendingInterrupts)); And perhaps this should also clear LocalPendingInterrupts, just to be tidy. > @@ -138,7 +139,8 @@ > typedef struct ProcState > { > /* procPid is zero in an inactive ProcState array entry. */ > - pid_t procPid; /* PID of backend, for signaling */ > + pid_t procPid; /* pid of backend */ > + ProcNumber pgprocno; /* for sending interrupts */ > /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */ > int nextMsgNum; /* next message number to read */ > bool resetState; /* backend needs to reset its state */ We can easily remove procPid altogether now that we have pgprocno here. Similarly with the pid/pgprocno in ReplicationSlot and WalSndState. -- Heikki Linnakangas Neon (https://neon.tech)
On Mon, Jul 8, 2024 at 5:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Another approach would be to move the responsibility of background > worker state notifications out of postmaster completely. When a new > background worker is launched, the worker process itself could send the > notification that it has started. And similarly, when a worker exits, it > could send the notification just before exiting. There's a little race > condition with exiting: if a process is waiting for the bgworker to > exit, and launches a new worker immediately when the old one exits, > there will be a brief period when the old and new process are alive at > the same time. The old worker wouldn't be doing anything interesting > anymore since it's exiting, but it still counts towards > max_worker_processes, so launching the new process might fail because of > hitting the limit. Maybe we should just bump up max_worker_processes. Or > postmaster could check PMChildFlags and not count processes that have > already deregistered from PMChildFlags towards the limit. I can testify that the current system is the result of a lot of trial and error. I'm not saying it can't be made better, but my initial attempts at getting this to work (back in the 9.4 era) resembled what you proposed here, were consequently a lot simpler than what we have now, and also did not work. Race conditions like you mention here were part of that. Another consideration is that fork() can fail, and in that case, the process that tried to register the new background worker needs to find out that the background worker won't ever be starting. Yet another problem is that, even if fork() succeeds, the new process might fail before it executes any of our code e.g. because it seg faults very early, a case that actually happened to me - inadvertently - while I was testing these facilities. I ended up deciding that we can't rely on the new process to do anything until it's given us some signal that it is alive and able to carry out its duties. If it dies before telling us that, or never starts in the first place, we have to have some other way of finding that out, and it's difficult to see how that can happen without postmaster involvement. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Jul 8, 2024 at 9:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > The patch actually does both: it still does kill(SIGUSR1) and also sets > the latch. Yeah, I had some ideas about supporting old extension code that really wanted a SIGUSR1, but on reflection, the only reason anyone ever wants that is so that sigusr1_handler can SetLatch(), which pairs with WaitLatch() in WaitForBackgroundWorker*(). Let's go all the way and assume that. > I think it would be nice if RegisterDynamicBackgroundWorker() had a > "bool notify_me" argument, instead of requiring the caller to set > "bgw_notify_pid = MyProcPid" before the call. That's a > backwards-compatibility break, but maybe we should bite the bullet and > do it. Or we could do this in RegisterDynamicBackgroundWorker(): > > if (worker->bgw_notify_pid == MyProcPid) > worker->bgw_notify_pgprocno = MyProcNumber; > > I think we can forbid setting pgw_notify_pid to anything else than 0 or > MyProcPid. Another idea: we could keep the bgw_notify_pid field around for a while, documented as unused and due to be removed in future. We could automatically capture the caller's proc number. So you get latch wakeups by default, which I expect many people want, and most people could cope with even if they don't want them. If you really really don't want them, you could set a new flag BGW_NO_NOTIFY. I have now done this part of the change in a new first patch. This particular use of kill(SIGUSR1) is separate from the ProcSignal removal, it's just that it relies on ProcSignal's handler's default action of calling SetLatch(). It's needed so the ProcSignal-ectomy can fully delete sigusr1_handler(), but it's not directly the same thing, so it seemed good to split the patch. > A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it > can do any damage if you called it on a pointer to garbage, except if > the pointer itself is bogus, then just dereferencing it an cause a > segfault. So it would be nice to have a version specifically designed > with that in mind. For example, it could assume that the Latch's pid is > never legally equal to MyProcPid, because postmaster cannot own any latches. Yeah I'm starting to think that all we need to do here is range-check the proc number. Here's a version that adds: ProcSetLatch(proc_number). Another idea would be for SetLatch(latch) to sanitise the address of a latch, ie its offset and range. What the user really wants to be able to do with this bgworker API, I think, is wait for a given a handle, which could find a condition variable + generation in the slot, so that we don't have to register any proc numbers anywhere until we're actually waiting. But *clearly* the postmaster can't use the condition variable API without risking following corrupted pointers in shared memory. Whereas AFAICT ProcSetLatch() from the patched postmaster can't really be corrupted in any new way that it couldn't already be corrupted in master (where it runs in the target process), if we're just a bit paranoid about how we find our way to the latch. Receiving latch wakeups in the postmaster might be another question, but I don't think we need to confront that question just yet. > > -volatile uint32 InterruptHoldoffCount = 0; > > -volatile uint32 QueryCancelHoldoffCount = 0; > > -volatile uint32 CritSectionCount = 0; > > +uint32 InterruptHoldoffCount = 0; > > +uint32 QueryCancelHoldoffCount = 0; > > +uint32 CritSectionCount = 0; > > I wondered if these are used in PG_TRY-CATCH blocks in a way that would > still require volatile. I couldn't find any such usage by some quick > grepping, so I think we're good, but I thought I'd mention it. Hmm. Still thinking about this. > > +/* > > + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and > > + * ProcessInterrupts() handle. These perform work that is safe to run whenever > > + * interrupts are not "held". Other kinds of interrupts are only handled at > > + * more restricted times. > > + */ > > +#define INTERRUPT_STANDARD_MASK \ > > Some interrupts are missing from this mask: > > - INTERRUPT_PARALLEL_APPLY_MESSAGE Oops, that one ^ is a rebasing mistake. I wrote the ancestor of this patch in 2021, and that new procsignal arrived in 2023, and I put the code in to handle it, but I forgot to add it to the mask, which gives me an idea (see below)... > - INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT > - INTERRUPT_SINVAL_CATCHUP > - INTERRUPT_NOTIFY > > Is that on purpose? INTERRUPT_SINVAL_CATCHUP and INTERRUPT_NOTIFY are indeed handled differently on purpose. In master, they don't set InterruptPending, and they are not handled by regular CHECK_FOR_INTERRUPTS() sites, but in the patch they still need a bit in pending_interrupts, and that is what that mask hides from CHECK_FOR_INTERRUPTS(). They are checked explicitly in ProcessClientReadInterrupt(). I think the idea is that we can't handle sinval at random places because that might create dangling pointers to cached objects where we don't expect them, and we can't emit NOTIFY-related protocol messages at random times either. There is something a little funky about _IDLE_STATS_UPDATE_TIMEOUT, though. It has a different scheme for running only when idle, where if it opts not to do anything, it doesn't consume the interrupt (a later CFI() will, but the latch is not set so we might sleep). I was confused by that. I think I have changed to be more faithful to master's behaviour now. Hmm, a better terminology for the interupts that CFI handles would be s/standard/regular/, so I have changed that. New idea: it would be less error-prone if we instead had a mask of these special cases, of which there are now only two. Tried that way! > > -/* > > - * Because backends sitting idle will not be reading sinval events, we > > - * need a way to give an idle backend a swift kick in the rear and make > > - * it catch up before the sinval queue overflows and forces it to go > > - * through a cache reset exercise. This is done by sending > > - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind. > > - * > > - * The signal handler will set an interrupt pending flag and will set the > > - * processes latch. Whenever starting to read from the client, or when > > - * interrupted while doing so, ProcessClientReadInterrupt() will call > > - * ProcessCatchupEvent(). > > - */ > > -volatile sig_atomic_t catchupInterruptPending = false; > > Would be nice to move that comment somewhere else rather than remove it > completely. OK, I moved it to the top of ProcessCatchupInterrupt(). > > --- a/src/backend/storage/lmgr/proc.c > > +++ b/src/backend/storage/lmgr/proc.c > > @@ -444,6 +444,10 @@ InitProcess(void) > > OwnLatch(&MyProc->procLatch); > > SwitchToSharedLatch(); > > > > + /*We're now ready to accept interrupts from other processes. */ > > + pg_atomic_init_u32(&MyProc->pending_interrupts, 0); > > + SwitchToSharedInterrupts(); > > + > > /* now that we have a proc, report wait events to shared memory */ > > pgstat_set_wait_event_storage(&MyProc->wait_event_info); > > > > @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void) > > OwnLatch(&MyProc->procLatch); > > SwitchToSharedLatch(); > > > > + /* We're now ready to accept interrupts from other processes. */ > > + SwitchToSharedInterrupts(); > > + > > /* now that we have a proc, report wait events to shared memory */ > > pgstat_set_wait_event_storage(&MyProc->wait_event_info); > > > > Is there a reason for the different initialization between a regular > backend and aux process? No. But I thought about something else to fix here. Really we don't want to switch to shared interrupts until we are ready for CFI() to do stuff. I think that should probably be at the places where master unblocks signals. Otherwise, for example, if someone sends you an interrupt while you're starting up, something as innocent as elog(DEBUG, ...), which reaches CFI(), might try to do things for which the infrastructure is not yet fully set up, for example INTERRUPT_BARRIER. Not done yet, but wanted to share this new version. > > +/* > > + * Switch to shared memory interrupts. Other backends can send interrupts > > + * to this one if they know its ProcNumber. > > + */ > > +void > > +SwitchToSharedInterrupts(void) > > +{ > > + pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts)); > > + MyPendingInterrupts = &MyProc->pending_interrupts; > > +} > > Hmm, I think there's a race condition here (and similarly in > SwitchToLocalInterrupts), if the signal handler runs between the > pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe > something like this instead: > > MyPendingInterrupts = &MyProc->pending_interrupts; > pg_memory_barrier(); > pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, > pg_atomic_read_u32(LocalPendingInterrupts)); Yeah, right, done. > And perhaps this should also clear LocalPendingInterrupts, just to be tidy. I used atomic_exchange() to read and clear the bits in one go. > > @@ -138,7 +139,8 @@ > > typedef struct ProcState > > { > > /* procPid is zero in an inactive ProcState array entry. */ > > - pid_t procPid; /* PID of backend, for signaling */ > > + pid_t procPid; /* pid of backend */ > > + ProcNumber pgprocno; /* for sending interrupts */ > > /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */ > > int nextMsgNum; /* next message number to read */ > > bool resetState; /* backend needs to reset its state */ > > We can easily remove procPid altogether now that we have pgprocno here. Since other things access those values, I propose to remove them in separate patches. > Similarly with the pid/pgprocno in ReplicationSlot and WalSndState. Same. Those pids show up in user interfaces, so I think they should be handled in separate patches. Note to self: I need to change some pgprocno to proc_number... The next problems to remove are, I think, the various SIGUSR2, SIGINT, SIGTERM signals sent by the postmaster. These should clearly become SendInterrupt() or ProcSetLatch(). The problem here is that the postmaster doesn't have the proc numbers yet. One idea is to teach the postmaster to assign them! Not explored yet. This version is passing on Windows. I'll create a CF entry. Still work in progress!
Attachment
On 10/07/2024 09:48, Thomas Munro wrote: > On Mon, Jul 8, 2024 at 9:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> The patch actually does both: it still does kill(SIGUSR1) and also sets >> the latch. > > Yeah, I had some ideas about supporting old extension code that really > wanted a SIGUSR1, but on reflection, the only reason anyone ever wants > that is so that sigusr1_handler can SetLatch(), which pairs with > WaitLatch() in WaitForBackgroundWorker*(). Let's go all the way and > assume that. +1 >> I think it would be nice if RegisterDynamicBackgroundWorker() had a >> "bool notify_me" argument, instead of requiring the caller to set >> "bgw_notify_pid = MyProcPid" before the call. That's a >> backwards-compatibility break, but maybe we should bite the bullet and >> do it. Or we could do this in RegisterDynamicBackgroundWorker(): >> >> if (worker->bgw_notify_pid == MyProcPid) >> worker->bgw_notify_pgprocno = MyProcNumber; >> >> I think we can forbid setting pgw_notify_pid to anything else than 0 or >> MyProcPid. > > Another idea: we could keep the bgw_notify_pid field around for a > while, documented as unused and due to be removed in future. We could > automatically capture the caller's proc number. So you get latch > wakeups by default, which I expect many people want, and most people > could cope with even if they don't want them. If you really really > don't want them, you could set a new flag BGW_NO_NOTIFY. Ok. I was going to say that it feels excessive to change the default like that. However, searching for RegisterDynamicBackgroundWorker() in github, I can't actually find any callers that don't set pg_notify_pid. So yeah, make sense. > I have now done this part of the change in a new first patch. This > particular use of kill(SIGUSR1) is separate from the ProcSignal > removal, it's just that it relies on ProcSignal's handler's default > action of calling SetLatch(). It's needed so the ProcSignal-ectomy > can fully delete sigusr1_handler(), but it's not directly the same > thing, so it seemed good to split the patch. PostmasterMarkPIDForWorkerNotify() is now unused. Which means that bgworker_notify is never set, and BackgroundWorkerStopNotifications() is never called either. >> A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it >> can do any damage if you called it on a pointer to garbage, except if >> the pointer itself is bogus, then just dereferencing it an cause a >> segfault. So it would be nice to have a version specifically designed >> with that in mind. For example, it could assume that the Latch's pid is >> never legally equal to MyProcPid, because postmaster cannot own any latches. > > Yeah I'm starting to think that all we need to do here is range-check > the proc number. Here's a version that adds: > ProcSetLatch(proc_number). Another idea would be for SetLatch(latch) > to sanitise the address of a latch, ie its offset and range. Hmm, I don't think postmaster should trust ProcGlobal->allProcCount either. > The next problems to remove are, I think, the various SIGUSR2, SIGINT, > SIGTERM signals sent by the postmaster. These should clearly become > SendInterrupt() or ProcSetLatch(). +1 > The problem here is that the > postmaster doesn't have the proc numbers yet. One idea is to teach > the postmaster to assign them! Not explored yet. I've been thinking that we should: a) assign every child process a PGPROC entry, and make postmaster responsible for assigning them like you suggest. We'll need more PGPROC entries, because currently a process doesn't reserve one until authentication has happened. Or we change that behavior. or b) Use the pmsignal.c slot numbers for this, instead of ProcNumbers. Postmaster already assigns those. I'm kind of leaning towards b) for now, because it feels like a much smaller patch. In the long run, it would be nice if every child process had a ProcNumber, though. It was a nice simplification in v17 that we don't have separate BackendId and ProcNumber anymore; similarly it would be nice to not have separate PMChildSlot and ProcNumber concepts. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Jul 24, 2024 at 8:58 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > a) assign every child process a PGPROC entry, and make postmaster > responsible for assigning them like you suggest. We'll need more PGPROC > entries, because currently a process doesn't reserve one until > authentication has happened. Or we change that behavior. I wonder how this works right now. Is there something that limits the number of authentication requests that can be in flight concurrently, or is it completely uncapped (except by machine resources)? My first thought when I read this was that it would be bad to have to put a limit on something that's currently unlimited. But then I started thinking that, even if it is currently unlimited, that might be a bug rather than a feature. If you have hundreds of pending authentication requests, that just means you're using a lot of machine resources on something that doesn't really help anybody. A machine with hundreds of authentication-pending connections is possibly getting DDOS'd and probably getting buried. You'd be better off focusing the machine's limited resources on the already-established connections and a more limited number of new connection attempts. If you accept so many connection attempts that you don't actually have enough memory/CPU/kernel scheduling firepower to complete the authentication process with any of them, it does nobody any good. I'm not sure what's best to do here; just thinking out loud. -- Robert Haas EDB: http://www.enterprisedb.com
On 29/07/2024 19:56, Robert Haas wrote: > On Wed, Jul 24, 2024 at 8:58 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> a) assign every child process a PGPROC entry, and make postmaster >> responsible for assigning them like you suggest. We'll need more PGPROC >> entries, because currently a process doesn't reserve one until >> authentication has happened. Or we change that behavior. > > I wonder how this works right now. Is there something that limits the > number of authentication requests that can be in flight concurrently, > or is it completely uncapped (except by machine resources)? > My first thought when I read this was that it would be bad to have to > put a limit on something that's currently unlimited. But then I > started thinking that, even if it is currently unlimited, that might > be a bug rather than a feature. If you have hundreds of pending > authentication requests, that just means you're using a lot of machine > resources on something that doesn't really help anybody. A machine > with hundreds of authentication-pending connections is possibly > getting DDOS'd and probably getting buried. You'd be better off > focusing the machine's limited resources on the already-established > connections and a more limited number of new connection attempts. If > you accept so many connection attempts that you don't actually have > enough memory/CPU/kernel scheduling firepower to complete the > authentication process with any of them, it does nobody any good. > > I'm not sure what's best to do here; just thinking out loud. Yes, there's a limit, roughly 2x max_connections. see MaxLivePostmasterChildren(). There's another issue with that that I was about to post in another thread, but here goes: the MaxLivePostmasterChildren() limit is shared by all regular backends, bgworkers and autovacuum workers. If you open a lot of TCP connections to postmaster and don't send anything to the server, you exhaust those slots, and the server won't be able to start any autovacuum workers or background workers either. That's not great. I started to work on approach b), with separate pools of slots for different kinds of child processes, which fixes that. Stay tuned for a patch. In addition to that, you can have an unlimited number of "dead-end" backends, which are doomed to just respond with "sorry, too many clients" error. The only limit on those is the amount of resources needed for all the processes and a little memory to track them. -- Heikki Linnakangas Neon (https://neon.tech)
Robert Haas <robertmhaas@gmail.com> writes: > I wonder how this works right now. Is there something that limits the > number of authentication requests that can be in flight concurrently, > or is it completely uncapped (except by machine resources)? The former. IIRC, the postmaster won't spawn more than 2X max_connections subprocesses (don't recall the exact limit, but it's around there). regards, tom lane
On Mon, Jul 29, 2024 at 1:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I wonder how this works right now. Is there something that limits the > > number of authentication requests that can be in flight concurrently, > > or is it completely uncapped (except by machine resources)? > > The former. IIRC, the postmaster won't spawn more than 2X max_connections > subprocesses (don't recall the exact limit, but it's around there). Hmm. Not to sidetrack this thread too much, but multiplying by two doesn't really sound like the right idea to me. The basic idea articulated in the comment for canAcceptConnections() makes sense: some backends might fail authentication, or might be about to exit, so it makes sense to allow for some slop. But 2X is a lot of slop even on a machine with the default max_connections=100, and people with connection management problems are likely to be running with max_connections=500 or max_connections=900 or even (insanely) max_connections=2000. Continuing with a connection attempt because we think that hundreds or thousands of connections that are ahead of us in the queue might clear out of the way before we need a PGPROC is not a good bet. I wonder if we ought to restrict this to a small, flat value, like say 50, or by a new GUC that defaults to such a value if a constant seems problematic. Maybe it doesn't really matter. I'm not sure how much work we'd save by booting out the doomed connection attempt earlier. The unlimited number of dead-end backends doesn't sound too great either. I don't have another idea, but presumably resisting DDOS attacks and/or preserving resources for things that still have a chance of working ought to take priority over printing a nicer error message from a connection that's doomed to fail anyway. -- Robert Haas EDB: http://www.enterprisedb.com
On 10/07/2024 09:48, Thomas Munro wrote: > The next problems to remove are, I think, the various SIGUSR2, SIGINT, > SIGTERM signals sent by the postmaster. These should clearly become > SendInterrupt() or ProcSetLatch(). The problem here is that the > postmaster doesn't have the proc numbers yet. One idea is to teach > the postmaster to assign them! Not explored yet. With my latest patches on the "Refactoring postmaster's code to cleanup after child exit" thread [1], every postmaster child process is assigned a slot in the pmsignal.c array, including all the aux processes. If we moved 'pending_interrupts' and the process Latch to the pmsignal.c array, then you could send an interrupt also to a process that doesn't have a PGPROC entry. That includes dead-end backends, backends that are still in authentication, and the syslogger. That would also make it so that the postmaster would never need to poke into the procarray. pmsignal.c is already designated as the limited piece of shared memory that is accessed by the postmaster (BackgroundWorkerSlots is the other exception), so it would be kind of nice if all the information that the postmaster needs to send an interrupt was there. That would mean that where you currently use a ProcNumber to identify a process, you'd use an index into the PMSignalState array instead. I don't insist on changing that right now, I think this patch is OK as it is, but that might be a good next step later. [1] https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4%40iki.fi I'm also wondering about the relationship between interrupts and latches. Currently, SendInterrupt sets a latch to wake up the target process. I wonder if it should be the other way 'round? Move all the wakeup code, with the signalfd, the self-pipe etc to interrupt.c, and in SetLatch, call SendInterrupt to wake up the target process? Somehow that feels more natural to me, I think. > This version is passing on Windows. I'll create a CF entry. Still > work in progress! Some comments on the patch details: > ereport(WARNING, > (errmsg("NOTIFY queue is %.0f%% full", fillDegree * 100), > - (minPid != InvalidPid ? > - errdetail("The server process with PID %d is among those with the oldest transactions.", minPid) > + (minPgprocno != INVALID_PROC_NUMBER ? > + errdetail("The server process with pgprocno %d is among those with the oldest transactions.", minPgprocno) > : 0), > - (minPid != InvalidPid ? > + (minPgprocno != INVALID_PROC_NUMBER ? > errhint("The NOTIFY queue cannot be emptied until that process ends its current transaction.") > : 0))); This makes the message less useful to the user, as the ProcNumber isn't exposed to users. With the PID, you can do "pg_terminate_backend(pid)" > diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c > index c42742d2c7b..bfb89049020 100644 > --- a/src/backend/optimizer/util/pathnode.c > +++ b/src/backend/optimizer/util/pathnode.c > @@ -18,6 +18,7 @@ > > #include "foreign/fdwapi.h" > #include "miscadmin.h" > +#include "postmaster/interrupt.h" > #include "nodes/extensible.h" > #include "optimizer/appendinfo.h" > #include "optimizer/clauses.h" misordered > + * duplicated interrupts later if we switch backx. typo: backx -> back > - if (IdleInTransactionSessionTimeoutPending) > + if (ConsumeInterrupt(INTERRUPT_IDLE_TRANSACTION_TIMEOUT)) > { > /* > * If the GUC has been reset to zero, ignore the signal. This is > @@ -3412,7 +3361,6 @@ ProcessInterrupts(void) > * interrupt. We need to unset the flag before the injection point, > * otherwise we could loop in interrupts checking. > */ > - IdleInTransactionSessionTimeoutPending = false; > if (IdleInTransactionSessionTimeout > 0) > { > INJECTION_POINT("idle-in-transaction-session-timeout"); The "We need to unset the flag.." comment is a bit out of place now, since the flag was already cleared by ConsumeInterrupt(). Same in the INTERRUPT_TRANSACTION_TIMEOUT and INTERRUPT_IDLE_SESSION_TIMEOUT handling after this. -- Heikki Linnakangas Neon (https://neon.tech)
On Sun, Aug 25, 2024 at 5:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 07/08/2024 17:59, Heikki Linnakangas wrote: > > I'm also wondering about the relationship between interrupts and > > latches. Currently, SendInterrupt sets a latch to wake up the target > > process. I wonder if it should be the other way 'round? Move all the > > wakeup code, with the signalfd, the self-pipe etc to interrupt.c, and in > > SetLatch, call SendInterrupt to wake up the target process? Somehow that > > feels more natural to me, I think. > > I explored that a little, see attached patch set. It's going towards the > same end state as your patches, I think, but it starts from different > angle. In a nutshell: > > Remove Latch as an abstraction, and replace all use of Latches with > Interrupts. When I originally created the Latch abstraction, I imagined > that we would have different latches for different purposes, but in > reality, almost all code just used the general-purpose "process latch". > this patch accepts that reality and replaces the Latch struct directly > with the interrupt mask in PGPROC. Some very initial reactions: * I like it! * This direction seems to fit quite nicely with future ideas about asynchronous network I/O. That may sound unrelated, but imagine that a future version of WaitEventSet is built on Linux io_uring (or Windows iorings, or Windows IOCP, or kqueue), and waits for the kernel to tell you that network data has been transferred directly into a user space buffer. You could wait for the interrupt word to change at the same time by treating it as a futex[1]. Then all that other stuff -- signalfd, is_set, maybe_sleeping -- just goes away, and all we have left is one single word in memory. (That it is possible to do that is not really a coincidence, as our own Mr Freund asked Mr Axboe to add it[2]. The existing latch implementation techniques could be used as fallbacks, but when looked at from the right angle, once you squish all the wakeup reasons into a single word, it's all just an implementation of a multiplexable futex with extra steps.) * Speaking of other problems in other threads that might be solved by this redesign, I think I see the outline of some solutions to the problem of different classes of wakeup which you can handle at different times, using masks. There is a tension in a few places where we want to handle some kind of interrupts but not others in localised wait points, which we sort of try to address by holding interrupts or holding cancel interrupts, but it's not satisfying and there are some places where it doesn't work well. Needs a lot more thought, but a basic step would be: after old_interrupt_vector = pg_atomic_fetch_or_u32(interrupt_vector, new_bits), if (old_interrupt_vector & new_bits) == new_bits, then you didn't actually change any bits, so you probably don't really need to wake the other backend. If someone is currently unable to handle that type of interrupt (has ignored, ie not cleared, those bits) or is already in the process of handling it (is currently being rescheduled but hasn't cleared those bits yet), then you don't bother to wake it up. Concretely, it could mean that we avoid some of the useless wakeup storm problems we see in vacuum delays or while executing a query and not in a good place to handle sinval wakeups, etc. These are just some raw thoughts, I am not sure about the bigger picture of that topic yet. * Archeological note on terminology: the reason almost every relation database and all the literature uses the term "latch" for something like our LWLocks seems to be that latches were/are one of the kinds of system-provided mutex on IBM System/370 and modern descendents ie z/OS. Oracle and other systems that started as knock-offs of the IBM System R (the original SQL system, of which DB2 is the modern heir) continued that terminology, even though they ran on VMS or Unix or whatever. I would not be sad if we removed our unusual use of the term latch. [1] https://man7.org/linux/man-pages/man3/io_uring_prep_futex_wait.3.html [2] https://lore.kernel.org/lkml/20230720221858.135240-1-axboe@kernel.dk/
On Sat, Aug 31, 2024 at 10:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > * This direction seems to fit quite nicely with future ideas about > > asynchronous network I/O. That may sound unrelated, but imagine that > > a future version of WaitEventSet is built on Linux io_uring (or > > Windows iorings, or Windows IOCP, or kqueue), and waits for the kernel > > to tell you that network data has been transferred directly into a > > user space buffer. You could wait for the interrupt word to change at > > the same time by treating it as a futex[1]. Then all that other stuff > > -- signalfd, is_set, maybe_sleeping -- just goes away, and all we have > > left is one single word in memory. (That it is possible to do that is > > not really a coincidence, as our own Mr Freund asked Mr Axboe to add > > it[2]. The existing latch implementation techniques could be used as > > fallbacks, but when looked at from the right angle, once you squish > > all the wakeup reasons into a single word, it's all just an > > implementation of a multiplexable futex with extra steps.) > > Cool Just by the way, speaking of future tricks and the connections between this code and other problems in other threads, I wanted to mention that the above thought is also connected to CF #3998. When I started working on this, in parallel I had an experimental patch set using futexes[1] (back then, to try out futexes, I had to patch my OS[2] because Linux couldn't multiplex them yet, and macOS/*BSD had something sort of vaguely similar but effectively only usable between threads in one process). I planned to switch to waiting directly on the interrupt vector as a futex when bringing that idea together with the one in this thread, but I guess I assumed we had to keep latches too since they seemed like such a central concept in PostgreSQL. Your idea seems much better, the more I think about it, but maybe only the inventor of latches could have the idea of blowing them up :-) Anyway, in that same experiment I realised I could wake multiple backends in one system call, which led to more discoveries about the negative interactions between latches and locks, and begat CF #3998 (SetLatches()). By way of excuse, unfortunately I got blocked in my progress on interrupt vectors for a couple of release cycles by the recovery conflict system, a set of procsignals that were not like the others, and turned out to be broken more or less as a result. That was tricky to fix (CF #3615), leading to journeys into all kinds of strange places like the regex code... [1] https://github.com/macdice/postgres/commits/kqueue-usermem/ [2] https://reviews.freebsd.org/D37102
One remaining issue with these patches is backwards-compatibility for extensions: 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, ResetInterrupt instead. 2. If an extension is defining its own Latch with InitLatch, rather than using MyLatch, it's out of luck. You could probably rewrite it using the INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a bit more effort. We can provide backwards compatibility macros and a new facility to allocate custom interrupt bits. But how big of a problem is this anyway? In an in-person chat last week, Andres said something like "this will break every extension". I don't think it's quite that bad, and more complicated extensions need small tweaks in every major release anyway. But let's try to quantify that. Analysis -------- Joel compiled a list of all extensions that are on github: https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47. I did "git checkout" on all of them, 1159 repositories in total (some of the links on the list were broken, or required authentication). 86 out of those 1159 extensions contain the word "Latch": $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I Latch FOO && echo FOO" | wc -l 86 For comparison, in PostgreSQL v10, we added a new 'wait_event_info' argument to WaitLatch and WaitLatchForSocket. That breakage was on similar scale: $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO && echo FOO" | wc -l 71 Admittedly that was a long time ago, we might have different priorities now. Deeper analysis --------------- Most of those calls are simple calls to SetLatch(MyLatch), WaitLatch(MyLatch, ...) etc. that could be handled by backwards compatibility macros. Many of the calls don't use MyLatch, but use MyProc->procLatch directly: rc = WaitLatch(&MyProc->procLatch, ...) That makes sense, since MyLatch was added in 2015, while the initial latch code was added in 2010. Any extensions that lived during that era would use MyProc->procLatch for backwards-compatibility, and there's been no need to change that since. That could also be supported by backwards-compatibility macros, or we could tell the extension authors to search & replace MyProc->procLatch to MyLatch. Excluding those, there are a handful of extensions that would need to be updated. Below is a quick analysis of the remaining results of "grep Latch" in all the repos: These calls use the process latch to wake up a different process (I excluded some duplicated forks of the same extension): 2ndQuadrant/pglogical/pglogical_sync.c: SetLatch(&apply->proc->procLatch); 2ndQuadrant/pglogical/pglogical_apply.c: SetLatch(&worker->proc->procLatch); 2ndQuadrant/pglogical/pglogical_worker.c: SetLatch(&w->proc->procLatch); 2ndQuadrant/pglogical/pglogical_worker.c: SetLatch(&PGLogicalCtx->supervisor->procLatch); heterodb/pg-strom/src/gpu_cache.c: Latch *backend; heterodb/pg-strom/src/gpu_cache.c: cmd->backend = (is_async ? NULL : MyLatch); heterodb/pg-strom/src/gpu_cache.c: SetLatch(cmd->backend); heterodb/pg-strom/src/gpu_cache.c: SetLatch(cmd->backend); citusdata/citus/src/backend/distributed/utils/maintenanced.c: Latch *latch; /* pointer to the background worker's latch */ citusdata/citus/src/backend/distributed/utils/maintenanced.c: SetLatch(dbData->latch); liaorc/devel-master/src/cuda_program.c: SetLatch(&proc->procLatch); okbob/generic-scheduler/dbworker.c: SetLatch(&dbworker->parent->procLatch); postgrespro/lsm3/lsm3.c: SetLatch(&entry->merger->procLatch); postgrespro/lsm3/lsm3.c: SetLatch(&entry->merger->procLatch); postgrespro/pg_pageprep/pg_pageprep.c: SetLatch(&starter->procLatch); postgrespro/pg_pageprep/pg_pageprep.c: SetLatch(&starter->procLatch); postgrespro/pg_query_state/pg_query_state.c: Latch *caller; postgrespro/pg_query_state/pg_query_state.c: SetLatch(counterpart_userid->caller); postgrespro/pg_query_state/pg_query_state.c: counterpart_userid->caller = MyLatch; timescale/timescaledb/src/loader/bgw_message_queue.c: SetLatch(&BackendPidGetProc(queue_get_reader(queue))->procLatch); troels/hbase_fdw/process_communication.c: control->latch = MyLatch; troels/hbase_fdw/process_communication.c: SetLatch(control->latch); The above could easily be fairly replaced with the new SendInterrupt() function. A backwards compatibility macro might be possible for some of these, but would be tricky for others. So I think these extensions will need to adapt by switching to SendInterrupt(). At quick glance, it would easy for all of these to do that with a small "#if PG_VERSION_NUM" block. postgrespro/pg_wait_sampling/pg_wait_sampling.c: * null wait events. So instead we make use of DisownLatch() resetting postgrespro/pg_wait_sampling/pg_wait_sampling.c: if (proc->pid == 0 || proc->procLatch.owner_pid == 0 || proc->pid == MyProcPid) postgrespro/pg_wait_sampling/pg_wait_sampling.c: SetLatch(pgws_collector_hdr->latch); postgrespro/pg_wait_sampling/pg_wait_sampling.c: SetLatch(pgws_collector_hdr->latch); postgrespro/pg_wait_sampling/pg_wait_sampling.h: Latch *latch; This case is similar to the previous group: a latch is used to wake up another process. It peeks directly into procLatch.owner_pid however. I suspect this will be simpler and better after rewriting to use interrupts, but not sure. darold/datalink/datalink_bgw.c: (errmsg("Latch status before waitlatch call: %d", MyProc->procLatch.is_set))); This is a DEBUG1 message. Could probably be just removed, it doesn't seem very useful. postgrespro/jsonbd/jsonbd.c: SetLatch(&wd->latch); postgrespro/jsonbd/jsonbd.h: Latch latch; postgrespro/jsonbd/jsonbd.h: Latch launcher_latch; postgrespro/jsonbd/jsonbd_worker.c: SetLatch(&hdr->launcher_latch); postgrespro/jsonbd/jsonbd_worker.c: InitLatch(&worker_state->latch); postgrespro/jsonbd/jsonbd_worker.c: InitLatch(&hdr->launcher_latch); postgrespro/jsonbd/jsonbd_worker.c: rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, postgrespro/jsonbd/jsonbd_worker.c: ResetLatch(MyLatch); postgrespro/jsonbd/jsonbd_worker.c: rc = WaitLatch(&worker_state->latch, WL_LATCH_SET | WL_POSTMASTER_DEATH, postgrespro/jsonbd/jsonbd_worker.c: ResetLatch(&worker_state->latch); postgrespro/jsonbd/jsonbd_worker.c: WaitLatch(&hdr->launcher_latch, WL_LATCH_SET, 0, PG_WAIT_EXTENSION); postgrespro/jsonbd/jsonbd_worker.c: WaitLatch(&hdr->launcher_latch, WL_LATCH_SET, 0); postgrespro/jsonbd/jsonbd_worker.c: ResetLatch(&hdr->launcher_latch); timescale/timescaledb/test/src/bgw/params.h: Latch timer_latch; timescale/timescaledb/test/src/bgw/params.c: SetLatch(&wrapper->params.timer_latch); timescale/timescaledb/test/src/bgw/params.c: InitLatch(&wrapper->params.timer_latch); timescale/timescaledb/test/src/bgw/params.c: ResetLatch(&wrapper->params.timer_latch); timescale/timescaledb/test/src/bgw/params.c: WaitLatch(&wrapper->params.timer_latch, These two extensions, 'jsonbd' and 'timescaledb', actually use InitLatch to define their own latches. I think the 'jsonbd' usage was copied from logical apply workers. At quick glance, it probably should be using the process latch instead of creating its own. In 'timescaledb', the InitLatch call is in a mock test module. It probably could be written differently.. Summary ------- We have a few options for how to deal with backwards-compatibility for extensions: A) If we rip off the bandaid in one go and don't provide any backwards-compatibility macros, we will break 96 extensions. Most of them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With #ifdef for compatibility with older postgres versions) B) If we provide backwards-compatibility macros so that simple Latch calls on MyLatch continue working, we will break about 14 extensions. They will need some tweaking like in option A). A bit more than the simple cases in option A), but nothing too difficult. C) We could provide "forward-compatibility" macros in a separate header file, to make the new "SetInterrupt" etc calls work in old PostgreSQL versions. Many of the extensions already have a header file like this, see e.g. citusdata/citus/src/include/pg_version_compat.h, pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea to provide a semi-official header file like this on the Postgres wiki, to help extension authors. It would encourage extensions to use the latest idioms, while still being able to compile for older versions. I'm leaning towards option C). Let's rip off the band-aid, but provide documentation for how to adapt your extension code. And provide a forwards-compatibility header on the wiki, that extension authors can use to make the new Interrupt calls work against old server versions. The second question is whether we need to provide a replacement for InitLatch for extensions. ISTM that none of the extensions out there truly need it. But maybe if we had a nicer interface for allocating interrupt bits for custom usage, they would actually find that handy. But I'm leaning towards "no" at this point. Could be added later. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be > changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, > ResetInterrupt instead. > > 2. If an extension is defining its own Latch with InitLatch, rather than > using MyLatch, it's out of luck. You could probably rewrite it using the > INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a > bit more effort. > > We can provide backwards compatibility macros and a new facility to > allocate custom interrupt bits. But how big of a problem is this anyway? > In an in-person chat last week, Andres said something like "this will > break every extension". Seems hyperbolic. > For comparison, in PostgreSQL v10, we added a new 'wait_event_info' > argument to WaitLatch and WaitLatchForSocket. That breakage was on > similar scale: > > $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO && > echo FOO" | wc -l > 71 However, that was also pretty easy to fix. This seems like it might be a little more complicated. > We have a few options for how to deal with backwards-compatibility for > extensions: > > A) If we rip off the bandaid in one go and don't provide any > backwards-compatibility macros, we will break 96 extensions. Most of > them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with > corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With > #ifdef for compatibility with older postgres versions) > > B) If we provide backwards-compatibility macros so that simple Latch > calls on MyLatch continue working, we will break about 14 extensions. > They will need some tweaking like in option A). A bit more than the > simple cases in option A), but nothing too difficult. > > C) We could provide "forward-compatibility" macros in a separate header > file, to make the new "SetInterrupt" etc calls work in old PostgreSQL > versions. Many of the extensions already have a header file like this, > see e.g. citusdata/citus/src/include/pg_version_compat.h, > pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea > to provide a semi-official header file like this on the Postgres wiki, > to help extension authors. It would encourage extensions to use the > latest idioms, while still being able to compile for older versions. > > I'm leaning towards option C). Let's rip off the band-aid, but provide > documentation for how to adapt your extension code. And provide a > forwards-compatibility header on the wiki, that extension authors can > use to make the new Interrupt calls work against old server versions. I don't know which of these options is best, but I don't find any of them categorically unacceptable. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, 4 Nov 2024 at 20:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Having spent some time playing with this, I quite like option C: break > compatibility, but provide an out-of-tree header file with > *forward*-compatibility macros. That encourages extension authors to > adapt to new idioms, but avoids having to sprinkle extension code with > #if version checks to support old versions. +1 maintaining a subset of these things for every extension is kind of a pain > My plan is to put this on the Wiki Why the wiki and not as a file in the repo? Seems like it would be nice to update this file together with patches that introduce such breakages. To be clear, I think it shouldn't be possible to #include the file, such a forward compatibility file should always be copy-pasted. But having it in the same place as the code seems useful, just like we update docs together with the code. > We could add helpers for previous > incompatibilities in v17 and v16 too, although at quick glance I'm not > sure what those might be. One thing that I personally add to any extension I maintain is the new foreach macros introduced in PG17, because they are just so much nicer to use. > I tested this approach by adapting pg_cron to build with these patches > and the compatibility header, and compiling it with all supported server > versoins. Works great, see adapt-pg_cron.patch. Looks like a simple change indeed. On Mon, 4 Nov 2024 at 20:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 31/10/2024 02:32, Michael Paquier wrote: > > On Wed, Oct 30, 2024 at 01:23:54PM -0400, Robert Haas wrote: > >> On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >>> C) We could provide "forward-compatibility" macros in a separate header > >>> file, to make the new "SetInterrupt" etc calls work in old PostgreSQL > >>> versions. Many of the extensions already have a header file like this, > >>> see e.g. citusdata/citus/src/include/pg_version_compat.h, > >>> pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea > >>> to provide a semi-official header file like this on the Postgres wiki, > >>> to help extension authors. It would encourage extensions to use the > >>> latest idioms, while still being able to compile for older versions. > >>> > >>> I'm leaning towards option C). Let's rip off the band-aid, but provide > >>> documentation for how to adapt your extension code. And provide a > >>> forwards-compatibility header on the wiki, that extension authors can > >>> use to make the new Interrupt calls work against old server versions. > >> > >> I don't know which of these options is best, but I don't find any of > >> them categorically unacceptable. > > > > Looking at the compatibility macros of 0008 for the latches with > > INTERRUPT_GENERAL_WAKEUP under latch.h, the changes are not that bad > > to adapt to, IMO. It reminds of f25968c49697: hard breakage, no > > complaints I've heard of because I guess that most folks have been > > using an in-house compatibility headers. > > > > A big disadvantage of B is that someone may decide to add new code in > > core that depends on the past routines, and we'd better avoid that for > > this new layer of APIs for interrupt handling. A is a subset of C: do > > a hard switch in the core code, with C mentioning a compatibility > > layer in the wiki that does not exist in the core code. Any of A or C > > is OK, I would not choose B for the core backend. > Having spent some time playing with this, I quite like option C: break > compatibility, but provide an out-of-tree header file with > *forward*-compatibility macros. That encourages extension authors to > adapt to new idioms, but avoids having to sprinkle extension code with > #if version checks to support old versions. > > See attached pg_version_compatibility.h header. It allows compiling code > that uses basic SendInterrupt, RaiseInterrupt, WaitInterrupt calls with > older server versions. My plan is to put this on the Wiki, and update it > with similar compatibility helpers for other changes we might make in > v18 or future versions. We could add helpers for previous > incompatibilities in v17 and v16 too, although at quick glance I'm not > sure what those might be. > > I tested this approach by adapting pg_cron to build with these patches > and the compatibility header, and compiling it with all supported server > versoins. Works great, see adapt-pg_cron.patch. > > I pushed the preliminary cleanup patches from this patch set earlier, > only the main patches remain. Attached is a new version of those, with > mostly comment cleanups. > > -- > Heikki Linnakangas > Neon (https://neon.tech)
On 05/11/2024 22:21, Jelte Fennema-Nio wrote: > On Mon, 4 Nov 2024 at 20:42, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> Having spent some time playing with this, I quite like option C: break >> compatibility, but provide an out-of-tree header file with >> *forward*-compatibility macros. That encourages extension authors to >> adapt to new idioms, but avoids having to sprinkle extension code with >> #if version checks to support old versions. > > +1 maintaining a subset of these things for every extension is kind of a pain > >> My plan is to put this on the Wiki > > Why the wiki and not as a file in the repo? Seems like it would be > nice to update this file together with patches that introduce such > breakages. To be clear, I think it shouldn't be possible to #include > the file, such a forward compatibility file should always be > copy-pasted. But having it in the same place as the code seems useful, > just like we update docs together with the code. I thought of the Wiki so that it could updated more casually by extension authors. But sure, it could be a file in the main repo too. -- Heikki Linnakangas Neon (https://neon.tech)
On Sun, 10 Nov 2024 at 22:30, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I thought of the Wiki so that it could updated more casually by > extension authors. Makes sense, I agree it would be nice not to have to rely on committers to merge these changes. Especially since many extension authors are not committers. So +1 for storing it on the wiki.
Some feedback on v4-0001-Replace-Latches-with-Interrupts.patch: + latch.c -> waiteventset.c +1 + /* + * INTERRUPT_RECOVERY_WAKEUP is used to wake up startup process, to tell + * it that it should continue WAL replay. It's sent by WAL receiver when + * more WAL arrives, or when promotion is requested. + */ + INTERRUPT_RECOVERY_CONTINUE, Names don't match here. I prefer _CONTINUE. As for the general one, I'm on the fence about INTERRUPT_GENERAL_WAKEUP, since wakeups aren't necessarily involved, but I don't have a specific better idea so I'm not objecting... Perhaps it's more like INTERRUPT_GENERAL_NOTIFY, except that _NOTIFY is already a well known thing, and the procsignal patch introduces INTERRUPT_NOTIFY... It looks like maybeSleepingOnInterrupts replaces maybe_sleeping, and SendInterrupt() would need to read it to suppress needless kill() calls, but doesn't yet, or am I missing something? Hmm, I think there are two kinds of kill() suppression that are missing compared to master: 1. No wakeup is needed if the bit is already set: SendInterrupt(InterruptType reason, ProcNumber pgprocno) { PGPROC *proc; + uint32 old_interrupts; Assert(pgprocno != INVALID_PROC_NUMBER); Assert(pgprocno >= 0); Assert(pgprocno < ProcGlobal->allProcCount); proc = &ProcGlobal->allProcs[pgprocno]; - pg_atomic_fetch_or_u32(&proc->pendingInterrupts, 1 << reason); - WakeupOtherProc(proc); + old_interrupts = pg_atomic_fetch_or_u32(&proc->pendingInterrupts, 1 << reason); + if ((old_interrupts & (1 << reason)) == 0) + WakeupOtherProc(proc); } That's equivalent to this removed latch code: - /* Quick exit if already set */ - if (latch->is_set) - return; ... except it's atomic, which I find a lot easier to think about. 2. Would it make sense to use a bit in pendingInterrupts as a replacement for the old maybe_sleeping flag? (Similar to typical implementation of mutexes and other such things, where you adjust the lock and atomically know whether you have to wake anyone.) Then we we could easily extend the check above to test that at the same time, with the same simple race-free atomic goodness: + if ((old_interrupts & (WAKEME | (1 << reason))) == WAKEME) + WakeupOtherProc(proc); I think the waiting side would also be tidier and simpler than what you have: you could use atomic_fetch_or to announce you're about to sleep, while atomically reading the interrupt bits already set up to that moment. More soon...
On Mon, Nov 18, 2024 at 11:09 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Names don't match here. I prefer _CONTINUE. As for the general one, > I'm on the fence about INTERRUPT_GENERAL_WAKEUP, since wakeups aren't > necessarily involved, but I don't have a specific better idea so I'm > not objecting... Perhaps it's more like INTERRUPT_GENERAL_NOTIFY, > except that _NOTIFY is already a well known thing, and the procsignal > patch introduces INTERRUPT_NOTIFY... INTERRUPT_GENERAL with no third word isn't out of the question, either. -- Robert Haas EDB: http://www.enterprisedb.com