Re: Interrupts vs signals - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Interrupts vs signals |
Date | |
Msg-id | 3b996d73-2a10-4072-a58d-0386af0cbb78@iki.fi Whole thread Raw |
In response to | Re: Interrupts vs signals (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Interrupts vs signals
Re: Interrupts vs signals |
List | pgsql-hackers |
On 08/07/2024 05:56, Thomas Munro wrote: > Here's an updated version of this patch. > > The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno) > becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending > interrupt global variables and pss_procsignalFlags[] go away, along > with the SIGUSR1 handler. The interrupts are compressed into a single > bitmap. See commit message for more details. > > The patch is failing on Windows CI for reasons I haven't debugged yet, > but I wanted to share what I have so far. Work in progress! > > Here is my attempt to survey the use of signals and write down what I > think we might do about them all so far, to give the context for this > patch: > > https://wiki.postgresql.org/wiki/Signals > > Comments, corrections, edits very welcome. Nice, thanks! > Background worker state notifications are also changed from raw > kill(SIGUSR1) to SetLatch(). That means that SetLatch() is now called > from the postmaster. The main purpose of including that change is to be > able to remove procsignal_sigusr1_handler completely and set SIGUSR1 to > SIG_IGN, and show the system working. > > XXX Do we need to invent SetLatchRobust() that doesn't trust anything in > shared memory, to be able to set latches from the postmaster? The patch actually does both: it still does kill(SIGUSR1) and also sets the latch. I think it would be nice if RegisterDynamicBackgroundWorker() had a "bool notify_me" argument, instead of requiring the caller to set "bgw_notify_pid = MyProcPid" before the call. That's a backwards-compatibility break, but maybe we should bite the bullet and do it. Or we could do this in RegisterDynamicBackgroundWorker(): if (worker->bgw_notify_pid == MyProcPid) worker->bgw_notify_pgprocno = MyProcNumber; I think we can forbid setting pgw_notify_pid to anything else than 0 or MyProcPid. A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it can do any damage if you called it on a pointer to garbage, except if the pointer itself is bogus, then just dereferencing it an cause a segfault. So it would be nice to have a version specifically designed with that in mind. For example, it could assume that the Latch's pid is never legally equal to MyProcPid, because postmaster cannot own any latches. Another approach would be to move the responsibility of background worker state notifications out of postmaster completely. When a new background worker is launched, the worker process itself could send the notification that it has started. And similarly, when a worker exits, it could send the notification just before exiting. There's a little race condition with exiting: if a process is waiting for the bgworker to exit, and launches a new worker immediately when the old one exits, there will be a brief period when the old and new process are alive at the same time. The old worker wouldn't be doing anything interesting anymore since it's exiting, but it still counts towards max_worker_processes, so launching the new process might fail because of hitting the limit. Maybe we should just bump up max_worker_processes. Or postmaster could check PMChildFlags and not count processes that have already deregistered from PMChildFlags towards the limit. > -volatile uint32 InterruptHoldoffCount = 0; > -volatile uint32 QueryCancelHoldoffCount = 0; > -volatile uint32 CritSectionCount = 0; > +uint32 InterruptHoldoffCount = 0; > +uint32 QueryCancelHoldoffCount = 0; > +uint32 CritSectionCount = 0; I wondered if these are used in PG_TRY-CATCH blocks in a way that would still require volatile. I couldn't find any such usage by some quick grepping, so I think we're good, but I thought I'd mention it. > +/* > + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and > + * ProcessInterrupts() handle. These perform work that is safe to run whenever > + * interrupts are not "held". Other kinds of interrupts are only handled at > + * more restricted times. > + */ > +#define INTERRUPT_STANDARD_MASK \ Some interrupts are missing from this mask: - INTERRUPT_PARALLEL_APPLY_MESSAGE - INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT - INTERRUPT_SINVAL_CATCHUP - INTERRUPT_NOTIFY Is that on purpose? > -/* > - * Because backends sitting idle will not be reading sinval events, we > - * need a way to give an idle backend a swift kick in the rear and make > - * it catch up before the sinval queue overflows and forces it to go > - * through a cache reset exercise. This is done by sending > - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind. > - * > - * The signal handler will set an interrupt pending flag and will set the > - * processes latch. Whenever starting to read from the client, or when > - * interrupted while doing so, ProcessClientReadInterrupt() will call > - * ProcessCatchupEvent(). > - */ > -volatile sig_atomic_t catchupInterruptPending = false; Would be nice to move that comment somewhere else rather than remove it completely. > --- a/src/backend/storage/lmgr/proc.c > +++ b/src/backend/storage/lmgr/proc.c > @@ -444,6 +444,10 @@ InitProcess(void) > OwnLatch(&MyProc->procLatch); > SwitchToSharedLatch(); > > + /*We're now ready to accept interrupts from other processes. */ > + pg_atomic_init_u32(&MyProc->pending_interrupts, 0); > + SwitchToSharedInterrupts(); > + > /* now that we have a proc, report wait events to shared memory */ > pgstat_set_wait_event_storage(&MyProc->wait_event_info); > > @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void) > OwnLatch(&MyProc->procLatch); > SwitchToSharedLatch(); > > + /* We're now ready to accept interrupts from other processes. */ > + SwitchToSharedInterrupts(); > + > /* now that we have a proc, report wait events to shared memory */ > pgstat_set_wait_event_storage(&MyProc->wait_event_info); > Is there a reason for the different initialization between a regular backend and aux process? > +/* > + * Switch to shared memory interrupts. Other backends can send interrupts > + * to this one if they know its ProcNumber. > + */ > +void > +SwitchToSharedInterrupts(void) > +{ > + pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts)); > + MyPendingInterrupts = &MyProc->pending_interrupts; > +} Hmm, I think there's a race condition here (and similarly in SwitchToLocalInterrupts), if the signal handler runs between the pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe something like this instead: MyPendingInterrupts = &MyProc->pending_interrupts; pg_memory_barrier(); pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(LocalPendingInterrupts)); And perhaps this should also clear LocalPendingInterrupts, just to be tidy. > @@ -138,7 +139,8 @@ > typedef struct ProcState > { > /* procPid is zero in an inactive ProcState array entry. */ > - pid_t procPid; /* PID of backend, for signaling */ > + pid_t procPid; /* pid of backend */ > + ProcNumber pgprocno; /* for sending interrupts */ > /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */ > int nextMsgNum; /* next message number to read */ > bool resetState; /* backend needs to reset its state */ We can easily remove procPid altogether now that we have pgprocno here. Similarly with the pid/pgprocno in ReplicationSlot and WalSndState. -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: