Re: Atomic operations within spinlocks - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Atomic operations within spinlocks
Date
Msg-id 20200606001926.zin5getvvhqppnm2@alap3.anarazel.de
Whole thread Raw
In response to Re: Atomic operations within spinlocks  (Andres Freund <andres@anarazel.de>)
Responses Re: Atomic operations within spinlocks  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Atomic operations within spinlocks  (Andres Freund <andres@anarazel.de>)
Re: Atomic operations within spinlocks  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-06-04 19:33:02 -0700, Andres Freund wrote:
> But it looks like that code is currently buggy (and looks like it always
> has been), because we don't look at the nested argument when
> initializing the semaphore.  So we currently allocate too many
> semaphores, without benefiting from them :(.

I wrote a patch for this, and when I got around to to testing it, I
found that our tests currently don't pass when using both
--disable-spinlocks and --disable-atomics. Turns out to not be related
to the issue above, but the global barrier support added in 13.

That *reads* two 64 bit atomics in a signal handler. Which is normally
fine, but not at all cool when atomics (or just 64 bit atomics) are
backed by spinlocks. Because we can "self interrupt" while already
holding the spinlock.

It looks to me that that's a danger whenever 64bit atomics are backed by
spinlocks, not just when both --disable-spinlocks and --disable-atomics
are used. But I suspect that it's really hard to hit the tiny window of
danger when those options aren't used. While we have buildfarm animals
testing each of those separately, we don't have one that tests both
together...

I'm not really sure what to do about that issue. The easisest thing
would probably be to change the barrier generation to 32bit (which
doesn't have to use locks for reads in any situation).  I tested doing
that, and it fixes the hangs for me.


Randomly noticed while looking at the code:
    uint64        flagbit = UINT64CONST(1) << (uint64) type;

that shouldn't be 64bit, right?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Trouble with hashagg spill I/O pattern and costing
Next
From: Stephen Frost
Date:
Subject: Re: WIP: WAL prefetch (another approach)