Re: Make query cancellation keys longer - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Make query cancellation keys longer |
Date | |
Msg-id | 3dceed41-7624-426c-b464-f47b4acb0c8e@iki.fi Whole thread Raw |
In response to | Re: Make query cancellation keys longer (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Make query cancellation keys longer
|
List | pgsql-hackers |
On 24/07/2024 19:12, Heikki Linnakangas wrote: > On 04/07/2024 15:20, Jelte Fennema-Nio wrote: >> On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> We currently don't do any locking on the ProcSignal array. For query >>> cancellations, that's good because a query cancel packet is processed >>> without having a PGPROC entry, so we cannot take LWLocks. We could use >>> spinlocks though. In this patch, I used memory barriers to ensure that >>> we load/store the pid and the cancellation key in a sensible order, so >>> that you cannot e.g. send a cancellation signal to a backend that's just >>> starting up and hasn't advertised its cancellation key in ProcSignal >>> yet. But I think this might be simpler and less error-prone by just >>> adding a spinlock to each ProcSignal slot. That would also fix the >>> existing race condition where we might set the pss_signalFlags flag for >>> a slot, when the process concurrently terminates and the slot is reused >>> for a different process. Because of that, we currently have this caveat: >>> "... all the signals are such that no harm is done if they're mistakenly >>> fired". With a spinlock, we could eliminate that race. >> >> I think a spinlock would make this thing a whole concurrency stuff a >> lot easier to reason about. >> >> + slot->pss_cancel_key_valid = false; >> + slot->pss_cancel_key = 0; >> >> If no spinlock is added, I think these accesses should still be made >> atomic writes. Otherwise data-races on those fields are still >> possible, resulting in undefined behaviour. The memory barriers you >> added don't prevent that afaict. With atomic operations there are >> still race conditions, but no data-races. >> >> Actually it seems like that same argument applies to the already >> existing reading/writing of pss_pid: it's written/read using >> non-atomic operations so data-races can occur and thus undefined >> behaviour too. > > Ok, here's a version with spinlocks. > > I went back and forth on what exactly is protected by the spinlock. I > kept the "volatile sig_atomic_t" type for pss_signalFlags, so that it > can still be safely read without holding the spinlock in > CheckProcSignal, but all the functions that set the flags now hold the > spinlock. That removes the race condition that you might set the flag > for wrong slot, if the target backend exits and the slot is recycled. > The race condition was harmless and there were comments to note it, but > it doesn't occur anymore with the spinlock. > > (Note: Thomas's "Interrupts vs signals" patch will remove ps_signalFlags > altogether. I'm looking forward to that.) > >> - volatile pid_t pss_pid; >> + pid_t pss_pid; >> >> Why remove the volatile modifier? > > Because I introduced a memory barrier to ensure the reads/writes of > pss_pid become visible to other processes in right order. That makes the > 'volatile' unnecessary IIUC. With the spinlock, the point is moot however. I: - fixed a few comments, - fixed a straightforward failure with EXEC_BACKEND (ProcSignal needs to be passed down in BackendParameters now), - put back the snippet to signal the whole process group if supported, which you pointed out earlier and committed this refactoring patch. Thanks for the review! -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: