Thread: pgsql: Fix double-release of spinlock
Fix double-release of spinlock Commit 9d9b9d46f3 added spinlocks to protect the fields in ProcSignal flags, but in EmitProcSignalBarrier(), the spinlock was released twice. With most spinlock implementations, releasing a lock that's not held is not easy to notice, because most of the time it does nothing, but if the spinlock was concurrently acquired by another process, it could lead to more serious issues. Fortunately, with the --disable-spinlocks emulation implementation, it caused more visible failures. In the passing, fix a type in comment and add an assertion that the procNumber passed to SendProcSignal looks valid. Discussion: https://www.postgresql.org/message-id/b8ce284c-18a2-4a79-afd3-1991a2e7d246@iki.fi Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/0393f542d72c6182271c392d9a83d0fc775113c7 Modified Files -------------- src/backend/storage/ipc/procsignal.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
Heikki Linnakangas <heikki.linnakangas@iki.fi> writes: > Commit 9d9b9d46f3 added spinlocks to protect the fields in ProcSignal > flags, but in EmitProcSignalBarrier(), the spinlock was released > twice. With most spinlock implementations, releasing a lock that's not > held is not easy to notice, because most of the time it does nothing, > but if the spinlock was concurrently acquired by another process, it > could lead to more serious issues. Fortunately, with the > --disable-spinlocks emulation implementation, it caused more visible > failures. There was some recent discussion about getting rid of --disable-spinlocks on the grounds that nobody would use hardware that lacked native spinlocks. But now I wonder if there is a testing/debugging reason to keep it. regards, tom lane
Hi, On 2024-07-29 11:31:56 -0400, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@iki.fi> writes: > > Commit 9d9b9d46f3 added spinlocks to protect the fields in ProcSignal > > flags, but in EmitProcSignalBarrier(), the spinlock was released > > twice. With most spinlock implementations, releasing a lock that's not > > held is not easy to notice, because most of the time it does nothing, > > but if the spinlock was concurrently acquired by another process, it > > could lead to more serious issues. Fortunately, with the > > --disable-spinlocks emulation implementation, it caused more visible > > failures. > > There was some recent discussion about getting rid of > --disable-spinlocks on the grounds that nobody would use > hardware that lacked native spinlocks. But now I wonder > if there is a testing/debugging reason to keep it. Seems it'd be a lot more straightforward to just add an assertion to the x86-64 spinlock implementation verifying that the spinlock isn't already free? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-07-29 11:31:56 -0400, Tom Lane wrote: >> There was some recent discussion about getting rid of >> --disable-spinlocks on the grounds that nobody would use >> hardware that lacked native spinlocks. But now I wonder >> if there is a testing/debugging reason to keep it. > Seems it'd be a lot more straightforward to just add an assertion to the > x86-64 spinlock implementation verifying that the spinlock isn't already free? I dunno, is that the only extra check that the --disable-spinlocks implementation is providing? I'm kind of allergic to putting Asserts into spinlocked code segments, mostly on the grounds that it violates the straight-line-code precept. I suppose it's not really that bad for tests that you don't expect to fail, but still ... regards, tom lane
Hi, On 2024-07-29 12:33:13 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2024-07-29 11:31:56 -0400, Tom Lane wrote: > >> There was some recent discussion about getting rid of > >> --disable-spinlocks on the grounds that nobody would use > >> hardware that lacked native spinlocks. But now I wonder > >> if there is a testing/debugging reason to keep it. > > > Seems it'd be a lot more straightforward to just add an assertion to the > > x86-64 spinlock implementation verifying that the spinlock isn't already free? FWIW, I quickly hacked that up, and it indeed quickly fails with 0393f542d72^ and passes with 0393f542d72. > I dunno, is that the only extra check that the --disable-spinlocks > implementation is providing? I think it also provides the (valuable!) check that spinlocks were actually initialized. But that again seems like something we'd be better off adding more general infrastructure for - nobody runs --disable-spinlocks locally, we shouldn't need to run this on the buildfarm to find problems like this. > I'm kind of allergic to putting Asserts into spinlocked code segments, > mostly on the grounds that it violates the straight-line-code precept. > I suppose it's not really that bad for tests that you don't expect > to fail, but still ... I don't think the spinlock implementation itself is really affected by that rule - after all, the --disable-spinlocks implementation actually consists out of several layers of external function calls (including syscalls in some cases!). Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-07-29 12:33:13 -0400, Tom Lane wrote: >> I dunno, is that the only extra check that the --disable-spinlocks >> implementation is providing? > I think it also provides the (valuable!) check that spinlocks were actually > initialized. But that again seems like something we'd be better off adding > more general infrastructure for - nobody runs --disable-spinlocks locally, we > shouldn't need to run this on the buildfarm to find problems like this. Hmm, but how? One of the things we gave up by nuking HPPA support was that that platform's representation of an initialized, free spinlock was not all-zeroes, so that it'd catch this type of problem. I think all the remaining platforms do use zeroes, so it's hard to see how anything short of valgrind would be likely to catch it. regards, tom lane
On Mon, Jul 29, 2024 at 12:40 PM Andres Freund <andres@anarazel.de> wrote: > I think it also provides the (valuable!) check that spinlocks were actually > initialized. But that again seems like something we'd be better off adding > more general infrastructure for - nobody runs --disable-spinlocks locally, we > shouldn't need to run this on the buildfarm to find problems like this. +1. It sucks to have to do special builds to catch a certain kind of problem. I know I've been guilty of that (ahem, debug_parallel_query f/k/a force_parallel_mode) but I'm not going to put it on my CV as one of my great accomplishments. It's much better if we can find a way for a standard 'make check-world' to tell us about as many things as possible, so that we don't commit and then find out. -- Robert Haas EDB: http://www.enterprisedb.com
On 2024-07-29 12:45:19 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2024-07-29 12:33:13 -0400, Tom Lane wrote: > >> I dunno, is that the only extra check that the --disable-spinlocks > >> implementation is providing? > > > I think it also provides the (valuable!) check that spinlocks were actually > > initialized. But that again seems like something we'd be better off adding > > more general infrastructure for - nobody runs --disable-spinlocks locally, we > > shouldn't need to run this on the buildfarm to find problems like this. > > Hmm, but how? I think there's a few ways: > One of the things we gave up by nuking HPPA support > was that that platform's representation of an initialized, free > spinlock was not all-zeroes, so that it'd catch this type of problem. > I think all the remaining platforms do use zeroes, so it's hard to > see how anything short of valgrind would be likely to catch it. 1) There's nothing forcing us to use 0/1 for most of the spinlock implementations. E.g. for x86-64 we could use 0 for uninitialized, 1 for free and 2 for locked. 2) We could also change the layout of slock_t in assert enabled builds, adding a dedicated 'initialized' field when assertions are enabled. But that might be annoying from an ABI POV? 1) seems preferrable, so I gave it a quick try. Seems to work. There's a *slight* difference in the instruction sequence: old: 41f6: f0 86 10 lock xchg %dl,(%rax) 41f9: 84 d2 test %dl,%dl 41fb: 75 1b jne 4218 <GetRecoveryState+0x38> new: 4216: f0 86 10 lock xchg %dl,(%rax) 4219: 80 fa 02 cmp $0x2,%dl 421c: 74 22 je 4240 <GetRecoveryState+0x40> I.e. the version using 2 as the locked state uses a three byte instruction vs a two byte instruction before. *If* we are worried about this, we could a) Change the representation only for assert enabled builds, but that'd have ABI issues again. b) Instead define the spinlock to have 1 as the unlocked state and 0 as the locked state. That makes it a bit harder to understand that initialization is missing, compared to a dedicated state, as the first use of the spinlock just blocks. To make 1) b) easier to understand it might be worth changing the slock_t typedef to be something like typedef struct slock_t { char is_free; } slock_t; which also might help catch some cases of type confusion - the char typedef is too forgiving imo. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-07-29 12:45:19 -0400, Tom Lane wrote: >> Hmm, but how? > ... > I.e. the version using 2 as the locked state uses a three byte instruction vs > a two byte instruction before. > *If* we are worried about this, we could > a) Change the representation only for assert enabled builds, but that'd have > ABI issues again. Agreed, that would be a very bad idea. It would for example break the case of a non-assert-enabled extension used with an assert-enabled core or vice versa, which is something we've gone out of our way to allow. > b) Instead define the spinlock to have 1 as the unlocked state and 0 as the > locked state. That makes it a bit harder to understand that initialization > is missing, compared to a dedicated state, as the first use of the spinlock > just blocks. This option works for me. > To make 1) b) easier to understand it might be worth changing the slock_t > typedef to be something like > typedef struct slock_t > { > char is_free; > } slock_t; +1 How much of this would we change across platforms, and how much would be x86-only? I think there are enough people developing on ARM (e.g. Mac) now to make it worth covering that, but maybe we don't care so much about anything else. regards, tom lane