On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> We currently don't do any locking on the ProcSignal array. For query
> cancellations, that's good because a query cancel packet is processed
> without having a PGPROC entry, so we cannot take LWLocks. We could use
> spinlocks though. In this patch, I used memory barriers to ensure that
> we load/store the pid and the cancellation key in a sensible order, so
> that you cannot e.g. send a cancellation signal to a backend that's just
> starting up and hasn't advertised its cancellation key in ProcSignal
> yet. But I think this might be simpler and less error-prone by just
> adding a spinlock to each ProcSignal slot. That would also fix the
> existing race condition where we might set the pss_signalFlags flag for
> a slot, when the process concurrently terminates and the slot is reused
> for a different process. Because of that, we currently have this caveat:
> "... all the signals are such that no harm is done if they're mistakenly
> fired". With a spinlock, we could eliminate that race.
I think a spinlock would make this thing a whole concurrency stuff a
lot easier to reason about.
+ slot->pss_cancel_key_valid = false;
+ slot->pss_cancel_key = 0;
If no spinlock is added, I think these accesses should still be made
atomic writes. Otherwise data-races on those fields are still
possible, resulting in undefined behaviour. The memory barriers you
added don't prevent that afaict. With atomic operations there are
still race conditions, but no data-races.
Actually it seems like that same argument applies to the already
existing reading/writing of pss_pid: it's written/read using
non-atomic operations so data-races can occur and thus undefined
behaviour too.
- volatile pid_t pss_pid;
+ pid_t pss_pid;
Why remove the volatile modifier?