Re: postgres has no spinlock support on riscv rv64imafdc - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: postgres has no spinlock support on riscv rv64imafdc
Date
Msg-id CA+hUKGL1m-zJeFvY_yOAJbNUh-aPR5fk_zHM=HuD6KVzYRhCsg@mail.gmail.com
Whole thread Raw
In response to Re: postgres has no spinlock support on riscv rv64imafdc  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Fri, Apr 16, 2021 at 2:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > As a data point: the obvious change builds and passes basic testing
> > with flying colours, on FreeBSD + clang 11 running under qemu.  RISC-V
> > has been "non-experimental" since clang 9.  FWIW I've signed up to the
> > preorder list to try to get my hands on some BeagleV hardware to test
> > this stuff for real ASAP...
>
> Yeah, I'm on that list too ... crickets so far.

I passed this change on to the brave souls who are bringing up the
whole universe of open source software on this new architecture, in
the FreeBSD ports tree.  PostgreSQL's failure to compile was blocking
hundreds of dependent packages so it seemed good to at least try
something at this early stage (other databases are still blockers;
apparently you can trust database hackers to use weird arch-specific
stuff).  It'll be interesting to see if any reports of brokenness come
in from that, but for now I just know that a whole mountain of new
stuff now builds there.

One thing I noticed while worrying about whether it might have
insufficiently strong barriers is that Clang and GCC are generating
different code:

__sync_lock_test_and_set():
 Clang 11: amoswap.w.aqrl  a0,a1,(a0)
 GCC 9:    amoswap.w.aq    a4,a4,(a5)

__sync_lock_release():
 Clang 11: fence   rw,w
           li      a1,0
           sw      a1,0(a0)
 GCC 9:    fence   iorw,ow
           amoswap.w       zero,zero,(a5)

S_LOCK() looks a bit like figure A7 of the manual[1], with at least
acquire semantics (but also release on Clang), and S_UNLOCK() looks a
bit like figure A8, with an explicit release.  I don't immediately
know why acquire and release semantics wouldn't be enough here.
Anyone?  I do see that ARM, Itanium and PowerPC include a full
barrier, though in different places: on ARM,
__sync_lock_test_and_set() on ARM generates ldxr, stxr, dmb (= full
memory barrier) rather than using ldaxr (= acquire) and stlxr (=
release), and this seems to be since [2], which is quite interesting
and might be a clue; on Itanium and PowerPC, it's explicit in
S_UNLOCK().

Some of the questions I have are: can anyone with real hardware access
reproduce the failure reported back in 2016[3], or was that entirely
due to the pre-alpha toolchain?  If it can be reproduced, is it with
GCC only and not Clang, and does adding an initial
__sync_synchronize() help?

[1] https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf
[2] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=a96297a28065d27559d17ebe1e0eda308a05e965
[3] https://www.postgresql.org/message-id/20161120184942.GP11243%40redhat.com



pgsql-bugs by date:

Previous
From: RekGRpth
Date:
Subject: Re: BUG #16974: memory leak
Next
From: Michael Paquier
Date:
Subject: Re: BUG #16972: parameter parallel_leader_participation's category problem