Re: Make query cancellation keys longer - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Make query cancellation keys longer
Date
Msg-id 3dceed41-7624-426c-b464-f47b4acb0c8e@iki.fi
Whole thread Raw
In response to Re: Make query cancellation keys longer  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Make query cancellation keys longer
List pgsql-hackers
On 24/07/2024 19:12, Heikki Linnakangas wrote:
> On 04/07/2024 15:20, Jelte Fennema-Nio wrote:
>> On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> We currently don't do any locking on the ProcSignal array. For query
>>> cancellations, that's good because a query cancel packet is processed
>>> without having a PGPROC entry, so we cannot take LWLocks. We could use
>>> spinlocks though. In this patch, I used memory barriers to ensure that
>>> we load/store the pid and the cancellation key in a sensible order, so
>>> that you cannot e.g. send a cancellation signal to a backend that's just
>>> starting up and hasn't advertised its cancellation key in ProcSignal
>>> yet. But I think this might be simpler and less error-prone by just
>>> adding a spinlock to each ProcSignal slot. That would also fix the
>>> existing race condition where we might set the pss_signalFlags flag for
>>> a slot, when the process concurrently terminates and the slot is reused
>>> for a different process. Because of that, we currently have this caveat:
>>> "... all the signals are such that no harm is done if they're mistakenly
>>> fired". With a spinlock, we could eliminate that race.
>>
>> I think a spinlock would make this thing a whole concurrency stuff a
>> lot easier to reason about.
>>
>> +   slot->pss_cancel_key_valid = false;
>> +   slot->pss_cancel_key = 0;
>>
>> If no spinlock is added, I think these accesses should still be made
>> atomic writes. Otherwise data-races on those fields are still
>> possible, resulting in undefined behaviour. The memory barriers you
>> added don't prevent that afaict. With atomic operations there are
>> still race conditions, but no data-races.
>>
>> Actually it seems like that same argument applies to the already
>> existing reading/writing of pss_pid: it's written/read using
>> non-atomic operations so data-races can occur and thus undefined
>> behaviour too.
> 
> Ok, here's a version with spinlocks.
> 
> I went back and forth on what exactly is protected by the spinlock. I
> kept the "volatile sig_atomic_t" type for pss_signalFlags, so that it
> can still be safely read without holding the spinlock in
> CheckProcSignal, but all the functions that set the flags now hold the
> spinlock. That removes the race condition that you might set the flag
> for wrong slot, if the target backend exits and the slot is recycled.
> The race condition was harmless and there were comments to note it, but
> it doesn't occur anymore with the spinlock.
> 
> (Note: Thomas's "Interrupts vs signals" patch will remove ps_signalFlags
> altogether. I'm looking forward to that.)
> 
>> -    volatile pid_t pss_pid;
>> +    pid_t        pss_pid;
>>
>> Why remove the volatile modifier?
> 
> Because I introduced a memory barrier to ensure the reads/writes of
> pss_pid become visible to other processes in right order. That makes the
> 'volatile' unnecessary IIUC. With the spinlock, the point is moot however.

I:
- fixed a few comments,
- fixed a straightforward failure with EXEC_BACKEND (ProcSignal needs to 
be passed down in BackendParameters now),
- put back the snippet to signal the whole process group if supported, 
which you pointed out earlier

and committed this refactoring patch.

Thanks for the review!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Next
From: Tom Lane
Date:
Subject: Re: Remove dead code generation tools in src/backend/utils/mb/