Thread: pgsql: Fix double-release of spinlock

pgsql: Fix double-release of spinlock

From
Heikki Linnakangas
Date:
Fix double-release of spinlock

Commit 9d9b9d46f3 added spinlocks to protect the fields in ProcSignal
flags, but in EmitProcSignalBarrier(), the spinlock was released
twice. With most spinlock implementations, releasing a lock that's not
held is not easy to notice, because most of the time it does nothing,
but if the spinlock was concurrently acquired by another process, it
could lead to more serious issues. Fortunately, with the
--disable-spinlocks emulation implementation, it caused more visible
failures.

In the passing, fix a type in comment and add an assertion that the
procNumber passed to SendProcSignal looks valid.

Discussion: https://www.postgresql.org/message-id/b8ce284c-18a2-4a79-afd3-1991a2e7d246@iki.fi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0393f542d72c6182271c392d9a83d0fc775113c7

Modified Files
--------------
src/backend/storage/ipc/procsignal.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)


Re: pgsql: Fix double-release of spinlock

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@iki.fi> writes:
> Commit 9d9b9d46f3 added spinlocks to protect the fields in ProcSignal
> flags, but in EmitProcSignalBarrier(), the spinlock was released
> twice. With most spinlock implementations, releasing a lock that's not
> held is not easy to notice, because most of the time it does nothing,
> but if the spinlock was concurrently acquired by another process, it
> could lead to more serious issues. Fortunately, with the
> --disable-spinlocks emulation implementation, it caused more visible
> failures.

There was some recent discussion about getting rid of
--disable-spinlocks on the grounds that nobody would use
hardware that lacked native spinlocks.  But now I wonder
if there is a testing/debugging reason to keep it.

            regards, tom lane



Re: pgsql: Fix double-release of spinlock

From
Andres Freund
Date:
Hi,

On 2024-07-29 11:31:56 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@iki.fi> writes:
> > Commit 9d9b9d46f3 added spinlocks to protect the fields in ProcSignal
> > flags, but in EmitProcSignalBarrier(), the spinlock was released
> > twice. With most spinlock implementations, releasing a lock that's not
> > held is not easy to notice, because most of the time it does nothing,
> > but if the spinlock was concurrently acquired by another process, it
> > could lead to more serious issues. Fortunately, with the
> > --disable-spinlocks emulation implementation, it caused more visible
> > failures.
> 
> There was some recent discussion about getting rid of
> --disable-spinlocks on the grounds that nobody would use
> hardware that lacked native spinlocks.  But now I wonder
> if there is a testing/debugging reason to keep it.

Seems it'd be a lot more straightforward to just add an assertion to the
x86-64 spinlock implementation verifying that the spinlock isn't already free?

Greetings,

Andres Freund



Re: pgsql: Fix double-release of spinlock

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2024-07-29 11:31:56 -0400, Tom Lane wrote:
>> There was some recent discussion about getting rid of
>> --disable-spinlocks on the grounds that nobody would use
>> hardware that lacked native spinlocks.  But now I wonder
>> if there is a testing/debugging reason to keep it.

> Seems it'd be a lot more straightforward to just add an assertion to the
> x86-64 spinlock implementation verifying that the spinlock isn't already free?

I dunno, is that the only extra check that the --disable-spinlocks
implementation is providing?

I'm kind of allergic to putting Asserts into spinlocked code segments,
mostly on the grounds that it violates the straight-line-code precept.
I suppose it's not really that bad for tests that you don't expect
to fail, but still ...

            regards, tom lane



Re: pgsql: Fix double-release of spinlock

From
Andres Freund
Date:
Hi,

On 2024-07-29 12:33:13 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2024-07-29 11:31:56 -0400, Tom Lane wrote:
> >> There was some recent discussion about getting rid of
> >> --disable-spinlocks on the grounds that nobody would use
> >> hardware that lacked native spinlocks.  But now I wonder
> >> if there is a testing/debugging reason to keep it.
> 
> > Seems it'd be a lot more straightforward to just add an assertion to the
> > x86-64 spinlock implementation verifying that the spinlock isn't already free?

FWIW, I quickly hacked that up, and it indeed quickly fails with 0393f542d72^
and passes with 0393f542d72.


> I dunno, is that the only extra check that the --disable-spinlocks
> implementation is providing?

I think it also provides the (valuable!) check that spinlocks were actually
initialized. But that again seems like something we'd be better off adding
more general infrastructure for - nobody runs --disable-spinlocks locally, we
shouldn't need to run this on the buildfarm to find problems like this.


> I'm kind of allergic to putting Asserts into spinlocked code segments,
> mostly on the grounds that it violates the straight-line-code precept.
> I suppose it's not really that bad for tests that you don't expect
> to fail, but still ...

I don't think the spinlock implementation itself is really affected by that
rule - after all, the --disable-spinlocks implementation actually consists out
of several layers of external function calls (including syscalls in some
cases!).

Greetings,

Andres Freund



Re: pgsql: Fix double-release of spinlock

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2024-07-29 12:33:13 -0400, Tom Lane wrote:
>> I dunno, is that the only extra check that the --disable-spinlocks
>> implementation is providing?

> I think it also provides the (valuable!) check that spinlocks were actually
> initialized. But that again seems like something we'd be better off adding
> more general infrastructure for - nobody runs --disable-spinlocks locally, we
> shouldn't need to run this on the buildfarm to find problems like this.

Hmm, but how?  One of the things we gave up by nuking HPPA support
was that that platform's representation of an initialized, free
spinlock was not all-zeroes, so that it'd catch this type of problem.
I think all the remaining platforms do use zeroes, so it's hard to
see how anything short of valgrind would be likely to catch it.

            regards, tom lane



Re: pgsql: Fix double-release of spinlock

From
Robert Haas
Date:
On Mon, Jul 29, 2024 at 12:40 PM Andres Freund <andres@anarazel.de> wrote:
> I think it also provides the (valuable!) check that spinlocks were actually
> initialized. But that again seems like something we'd be better off adding
> more general infrastructure for - nobody runs --disable-spinlocks locally, we
> shouldn't need to run this on the buildfarm to find problems like this.

+1. It sucks to have to do special builds to catch a certain kind of
problem. I know I've been guilty of that (ahem, debug_parallel_query
f/k/a force_parallel_mode) but I'm not going to put it on my CV as one
of my great accomplishments. It's much better if we can find a way for
a standard 'make check-world' to tell us about as many things as
possible, so that we don't commit and then find out.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Fix double-release of spinlock

From
Andres Freund
Date:
On 2024-07-29 12:45:19 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2024-07-29 12:33:13 -0400, Tom Lane wrote:
> >> I dunno, is that the only extra check that the --disable-spinlocks
> >> implementation is providing?
>
> > I think it also provides the (valuable!) check that spinlocks were actually
> > initialized. But that again seems like something we'd be better off adding
> > more general infrastructure for - nobody runs --disable-spinlocks locally, we
> > shouldn't need to run this on the buildfarm to find problems like this.
>
> Hmm, but how?

I think there's a few ways:

> One of the things we gave up by nuking HPPA support
> was that that platform's representation of an initialized, free
> spinlock was not all-zeroes, so that it'd catch this type of problem.
> I think all the remaining platforms do use zeroes, so it's hard to
> see how anything short of valgrind would be likely to catch it.

1) There's nothing forcing us to use 0/1 for most of the spinlock
implementations. E.g. for x86-64 we could use 0 for uninitialized, 1 for free
and 2 for locked.

2) We could also change the layout of slock_t in assert enabled builds, adding
a dedicated 'initialized' field when assertions are enabled. But that might be
annoying from an ABI POV?


1) seems preferrable, so I gave it a quick try. Seems to work. There's a
*slight* difference in the instruction sequence:

old:
    41f6:    f0 86 10                 lock xchg %dl,(%rax)
    41f9:    84 d2                    test   %dl,%dl
    41fb:    75 1b                    jne    4218 <GetRecoveryState+0x38>

new:
    4216:    f0 86 10                 lock xchg %dl,(%rax)
    4219:    80 fa 02                 cmp    $0x2,%dl
    421c:    74 22                    je     4240 <GetRecoveryState+0x40>

I.e. the version using 2 as the locked state uses a three byte instruction vs
a two byte instruction before.


*If* we are worried about this, we could

a) Change the representation only for assert enabled builds, but that'd have
   ABI issues again.

b) Instead define the spinlock to have 1 as the unlocked state and 0 as the
   locked state. That makes it a bit harder to understand that initialization
   is missing, compared to a dedicated state, as the first use of the spinlock
   just blocks.


To make 1) b) easier to understand it might be worth changing the slock_t
typedef to be something like

typedef struct slock_t
{
        char is_free;
} slock_t;

which also might help catch some cases of type confusion - the char typedef is
too forgiving imo.

Greetings,

Andres Freund



Re: pgsql: Fix double-release of spinlock

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2024-07-29 12:45:19 -0400, Tom Lane wrote:
>> Hmm, but how?

> ...
> I.e. the version using 2 as the locked state uses a three byte instruction vs
> a two byte instruction before.
> *If* we are worried about this, we could
> a) Change the representation only for assert enabled builds, but that'd have
>    ABI issues again.

Agreed, that would be a very bad idea.  It would for example break the
case of a non-assert-enabled extension used with an assert-enabled
core or vice versa, which is something we've gone out of our way to
allow.

> b) Instead define the spinlock to have 1 as the unlocked state and 0 as the
>    locked state. That makes it a bit harder to understand that initialization
>    is missing, compared to a dedicated state, as the first use of the spinlock
>    just blocks.

This option works for me.

> To make 1) b) easier to understand it might be worth changing the slock_t
> typedef to be something like

> typedef struct slock_t
> {
>         char is_free;
> } slock_t;

+1

How much of this would we change across platforms, and how much
would be x86-only?  I think there are enough people developing on
ARM (e.g. Mac) now to make it worth covering that, but maybe we
don't care so much about anything else.

            regards, tom lane