Thread: PostgreSQL cannot be compiled on RISC-V

PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
The full log is here:
https://fedorapeople.org/groups/risc-v/logs/postgresql/9.5.5-1.fc25.0.riscv64/build.log

The extract which fails is:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -DLINUX_OOM_SCORE_ADJ=0 -fpic -I. -I. -I/usr/include/python3.5m
-I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o plpy_cursorobject.o plpy_cursorobject.c
In file included from ../../../src/include/storage/lwlock.h:18:0,
                 from ../../../src/include/storage/lock.h:18,
                 from ../../../src/include/access/genam.h:20,
                 from ../../../src/include/nodes/execnodes.h:17,
                 from ../../../src/include/executor/execdesc.h:18,
                 from ../../../src/include/utils/portal.h:50,
                 from ../../../src/include/executor/spi.h:18,
                 from plpy_planobject.h:8,
                 from plpy_cursorobject.c:18:
../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this
platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor.
Please report this to pgsql-bugs@postgresql.org.
 #error PostgreSQL does not have native spinlock support on this platform.  To continue the compilation, rerun
configure using --disable-spinlocks.  However, performance will be poor.  Please report this to
pgsql-bugs@postgresql.org.
  ^~~~~
../../../src/include/storage/s_lock.h:962:25: error: unknown type name 'slock_t'
 extern int tas(volatile slock_t *lock);  /* in port/.../tas.s, or
                         ^~~~~~~
../../../src/include/storage/s_lock.h:972:8: error: unknown type name 'slock_t'
 extern slock_t dummy_spinlock;
        ^~~~~~~
../../../src/include/storage/s_lock.h:977:28: error: unknown type name 'slock_t'
 extern int s_lock(volatile slock_t *lock, const char *file, int line);
                            ^~~~~~~
In file included from ../../../src/include/storage/lock.h:18:0,
                 from ../../../src/include/access/genam.h:20,
                 from ../../../src/include/nodes/execnodes.h:17,
                 from ../../../src/include/executor/execdesc.h:18,
                 from ../../../src/include/utils/portal.h:50,
                 from ../../../src/include/executor/spi.h:18,
                 from plpy_planobject.h:8,
                 from plpy_cursorobject.c:18:
../../../src/include/storage/lwlock.h:50:2: error: unknown type name 'slock_t'
  slock_t  mutex;   /* Protects LWLock and queue of PGPROCs */
  ^~~~~~~
<builtin>: recipe for target 'plpy_cursorobject.o' failed
make: *** [plpy_cursorobject.o] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.ZVdDip (%build)




--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

Re: PostgreSQL cannot be compiled on RISC-V

From
Tom Lane
Date:
"Richard W.M. Jones" <rjones@redhat.com> writes:
> ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this
> platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor.
> Please report this to pgsql-bugs@postgresql.org.

Hi Richard,

What's a RISC-V, and can you provide some gcc assembler implementing
spinlocks for it?  See commentary and code for other platforms in
src/include/storage/s_lock.h.

            regards, tom lane

Re: PostgreSQL cannot be compiled on RISC-V

From
John R Pierce
Date:
On 11/19/2016 7:08 PM, Tom Lane wrote:
> What's a RISC-V

It was a new term to me, too, so I googled it: it's a new open source RISC
architecture created by UC Berkeley.  See
https://en.wikipedia.org/wiki/RISC-V for a summary.


--
john r pierce, recycling bits in santa cruz

Re: PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
On Sat, Nov 19, 2016 at 10:08:07PM -0500, Tom Lane wrote:
> "Richard W.M. Jones" <rjones@redhat.com> writes:
> > ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this
> > platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor.
> > Please report this to pgsql-bugs@postgresql.org.
>
> Hi Richard,
>
> What's a RISC-V, and can you provide some gcc assembler implementing
> spinlocks for it?  See commentary and code for other platforms in
> src/include/storage/s_lock.h.

The answer to the first question is a lot easier than the second :-)

RISC-V is an open source instruction set architecture.  https://riscv.org/
I'm currently compiling Fedora for RISC-V, and this is the thing
that stops PostgreSQL from being compiled.
https://fedoraproject.org/wiki/Architectures/RISC-V

I looked at the file you pointed to.  I believe it should be possible
to follow the ARM implementation and call __sync_lock_test_and_set (a
GCC builtin).  I will try this out, but note that compiling anything
on RISC-V by hand is currently a very tedious process that can take
many hours.
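
For illustration, here is a minimal sketch of what a builtin-based
implementation might look like, modeled on the ARM section of s_lock.h.
It is not the patch itself: the lower-case s_unlock() helper is a
hypothetical stand-in for the S_UNLOCK macro, and the int-sized slock_t
is an assumption.

```c
/* Sketch only: names follow s_lock.h conventions but this is not the
 * actual PostgreSQL code. slock_t as int is an assumption. */
typedef int slock_t;

/*
 * Atomically store 1 into the lock word and return the previous value.
 * A return of 0 means the caller acquired the lock; nonzero means it
 * was already held by someone else.
 */
static inline int
tas(volatile slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Hypothetical helper: release the lock by writing 0 with release semantics. */
static inline void
s_unlock(volatile slock_t *lock)
{
    __sync_lock_release(lock);
}
```

A caller would spin on tas() until it returns 0 and call s_unlock() when
done.  Whether the builtin expands to a correct atomic sequence on a new
target is a separate question that this sketch does not answer.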

    - - -

Yesterday I added --disable-spinlocks and built PostgreSQL again
overnight, and it compiles fine but fails in the tests:

  https://fedorapeople.org/groups/risc-v/logs/postgresql/9.5.5-1.fc25.0.riscv64/build.log

The errors mostly seem to be:

  ERROR:  unexpected EOF on client connection with an open transaction

This is *probably* because our kernel lacks networking support, but
I didn't look at this in great detail yet.

Rich.


Re: PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
On Sat, Nov 19, 2016 at 10:08:07PM -0500, Tom Lane wrote:
> "Richard W.M. Jones" <rjones@redhat.com> writes:
> > ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this
> > platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor.
> > Please report this to pgsql-bugs@postgresql.org.
>
> Hi Richard,
>
> What's a RISC-V, and can you provide some gcc assembler implementing
> spinlocks for it?  See commentary and code for other platforms in
> src/include/storage/s_lock.h.

The attached patch allows PostgreSQL to compile successfully.  I'm
still examining the test failures, but I think they are irrelevant to
this.

Please note that you also need to update config/config.{sub,guess} to
the latest versions from upstream
(https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html)
since your current versions are too old to understand the riscv{32,64}
architectures.

Rich.


Attachment

Re: PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
On Sun, Nov 20, 2016 at 03:09:05PM +0000, Richard W.M. Jones wrote:
> On Sat, Nov 19, 2016 at 10:08:07PM -0500, Tom Lane wrote:
> > "Richard W.M. Jones" <rjones@redhat.com> writes:
> > > ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on
> > > this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be
> > > poor. Please report this to pgsql-bugs@postgresql.org.
> >
> > Hi Richard,
> >
> > What's a RISC-V, and can you provide some gcc assembler implementing
> > spinlocks for it?  See commentary and code for other platforms in
> > src/include/storage/s_lock.h.
>
> The attached patch allows PostgreSQL to compile successfully.  I'm
> still examining the test failures, but I think they are irrelevant to
> this.
>
> Please note that you also need to update config/config.{sub,guess} to
> the latest versions from upstream
> (https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html)
> since your current versions are too old to understand the riscv{32,64}
> architectures.

I'm happy with the patch I previously attached, and I don't think it
would be controversial to apply it to upstream PostgreSQL now, along
with updating config.guess/config.sub.  I didn't test spinlocks, but
if there was a bug in the GCC builtin, then we would fix it in GCC.
(In fact, why not use the GCC builtin every time PostgreSQL is
compiled with GCC?)

For the sake of full disclosure, the test suite is rather "crashy",
with about half of the tests failing.  Basic database creation,
creating tables, etc all works, but the detailed tests fail with "test
process exited with exit code 2" often.

None of this is surprising as the entire toolchain is pre-alpha.

I found that dialing MAX_CONNECTIONS down to 2 or 3 helps.

What would be helpful is to get more detail on how the tests fail.  I
don't even know if it is the client or server side which fails
(although I assume the server, because `psql' will exit with code 2 if
it loses the network connection).  Is there some way to run tests with
lots of extra verbosity?

Rich.


Re: PostgreSQL cannot be compiled on RISC-V

From
Tom Lane
Date:
"Richard W.M. Jones" <rjones@redhat.com> writes:
> I'm happy with the patch I previously attached, and I don't think it
> would be controversial to apply it to upstream PostgreSQL now, along
> with updating config.guess/config.sub.  I didn't test spinlocks, but
> if there was a bug in the GCC builtin, then we would fix it in GCC.
> (In fact, why not use the GCC builtin every time PostgreSQL is
> compiled with GCC?)

Well, to be blunt, our experience with the GCC builtins has not been
good.  The quality of implementation seems to vary drastically across
architectures, with, e.g., some word widths not working or having
very poor performance compared to others.  So while we will take an
s_lock implementation based on GCC builtins, we view it as needing
to be validated for the given architecture just as much as any other
method would be.  For platforms where we have already-debugged assembly
code, there's no visible upside in switching to the builtins, either.

> For the sake of full disclosure, the test suite is rather "crashy",
> with about half of the tests failing.  Basic database creation,
> creating tables, etc all works, but the detailed tests fail with "test
> process exited with exit code 2" often.
> None of this is surprising as the entire toolchain is pre-alpha.

Yeah, compiler bugs would lead to that sort of thing.

> I found that dialing MAX_CONNECTIONS down to 2 or 3 helps.

Hmm, makes me wonder if the spinlock primitives actually work ...
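
One way to probe that suspicion outside of PostgreSQL would be a small
stress test along the following lines.  It is only a sketch: NPROCS,
NITERS, struct shared and spinlock_stress() are made-up names, and
forked processes sharing an anonymous mapping only approximate how
backends share memory.  If the test-and-set builtin is not really
atomic, increments get lost and the final count comes up short.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROCS 4                /* hypothetical number of contending processes */
#define NITERS 50000            /* hypothetical increments per process */

struct shared
{
    volatile int  lock;         /* spinlock word under test */
    volatile long counter;      /* only touched while holding the lock */
};

/* Returns 0 if every increment was observed, -1 on lost updates or errors. */
static int
spinlock_stress(void)
{
    struct shared *sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int started = 0;
    int ok;

    if (sh == MAP_FAILED)
        return -1;
    sh->lock = 0;
    sh->counter = 0;

    for (int i = 0; i < NPROCS; i++)
    {
        pid_t pid = fork();

        if (pid < 0)
            break;
        if (pid == 0)
        {
            for (int j = 0; j < NITERS; j++)
            {
                while (__sync_lock_test_and_set(&sh->lock, 1))
                    ;           /* spin until the previous value was 0 */
                sh->counter++;
                __sync_lock_release(&sh->lock);
            }
            _exit(0);
        }
        started++;
    }

    ok = (started == NPROCS);
    for (int i = 0; i < started; i++)
    {
        int status;

        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    if (sh->counter != (long) NPROCS * NITERS)
        ok = 0;

    munmap(sh, sizeof(*sh));
    return ok ? 0 : -1;
}
```

A passing run does not prove the primitives are correct, but a failing
one on real multi-processor hardware or an SMP emulator would be fairly
strong evidence that they are not.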

> What would be helpful is to get more detail on how the tests fail.  I
> don't even know if it is the client or server side which fails
> (although I assume the server, because `psql' will exit with code 2 if
> it loses the network connection).  Is there some way to run tests with
> lots of extra verbosity?

Not directly, but I'd guess that the server processes are crashing and
leaving core dumps behind (or would be if you run under a suitable
ulimit).  Assuming that you've got working core dump support and gdb,
getting stack traces from some of the crashes would be useful info.
Also, if you can't tell from the server logs which core is which,
"p debug_query_string" is a good way to see the current SQL command
that a crashed process was working on.

Given that you seem to be pretty early in this process (ie a long
way from production grade), my feeling is that we should apply RISC-V
related fixes to HEAD only, meaning that they'd reach the field with
Postgres v10 next fall.  For your own purposes, you could carry the
fixes as patches against 9.6.x, or work with snapshots of our master
branch.

            regards, tom lane

Re: PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
On Sun, Nov 20, 2016 at 01:02:51PM -0500, Tom Lane wrote:
> > I found that dialing MAX_CONNECTIONS down to 2 or 3 helps.
>
> Hmm, makes me wonder if the spinlock primitives actually work ...

Yes, my thought too.

With MAX_CONNECTIONS=1 only 5 tests fail, and reliably too:

     opr_sanity               ... FAILED
test errors                   ... FAILED
     psql_crosstab            ... FAILED
     select_views             ... FAILED
     largeobject              ... FAILED (test process exited with exit code 2)

========================
 5 of 168 tests failed.
========================

> > What would be helpful is to get more detail on how the tests fail.  I
> > don't even know if it is the client or server side which fails
> > (although I assume the server, because `psql' will exit with code 2 if
> > it loses the network connection).  Is there some way to run tests with
> > lots of extra verbosity?
>
> Not directly, but I'd guess that the server processes are crashing and
> leaving core dumps behind (or would be if you run under a suitable
> ulimit).

I checked this already and I don't think the server process is
crashing, or if it is then it's not leaving coredumps around even
though /proc/sys/kernel/core_pattern and the 'ulimit -c unlimited'
ought to allow them.  Maybe the tests or server process is adjusting
ulimit?
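
The limit a process actually sees can be checked from C with
getrlimit(), since a child inherits whatever its parent left in place
rather than what the login shell reports.  A hedged sketch follows;
core_dumps_allowed() and raise_core_limit() are hypothetical helper
names, not PostgreSQL functions.

```c
#include <sys/resource.h>

/*
 * Returns 1 if the soft RLIMIT_CORE is nonzero (the rlimit itself does
 * not forbid core files), 0 if it is zero, and -1 if getrlimit() fails.
 */
static int
core_dumps_allowed(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/*
 * Raise the soft core limit to the hard limit, which an unprivileged
 * process is always allowed to do. Returns 0 on success, -1 on failure.
 */
static int
raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Even with a nonzero limit, core files can still be suppressed by other
things (an unwritable working directory, or a core_pattern that pipes to
a helper), so this only rules out one cause.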

> Assuming that you've got working core dump support and gdb,
> getting stack traces from some of the crashes would be useful info.

Agreed.  Unfortunately there's no gdb yet, and as above no core dumps
in any case.

> Also, if you can't tell from the server logs which core is which,
> "p debug_query_string" is a good way to see the current SQL command
> that a crashed process was working on.

OK I will keep that in mind if we get gdb working.

> Given that you seem to be pretty early in this process (ie a long
> way from production grade), my feeling is that we should apply RISC-V
> related fixes to HEAD only, meaning that they'd reach the field with
> Postgres v10 next fall.  For your own purposes, you could carry the
> fixes as patches against 9.6.x, or work with snapshots of our master
> branch.

Yes no problems at all.

I really wouldn't want anyone to entrust precious data to PostgreSQL
on RISC-V at this point, so long schedules are fine :-/

Rich.


Re: PostgreSQL cannot be compiled on RISC-V

From
John R Pierce
Date:
On 11/20/2016 2:36 AM, Richard W.M. Jones wrote:
> This is *probably* because our kernel lacks networking support, but
> I didn't look at this in great detail yet.

wait, what??   postgres can't function at all without basic networking
support, including at least named pipes and tcp over localhost.



Re: PostgreSQL cannot be compiled on RISC-V

From
Greg Stark
Date:
On Nov 20, 2016 6:57 PM, "John R Pierce" <pierce@hogranch.com> wrote:
>
> On 11/20/2016 2:36 AM, Richard W.M. Jones wrote:
>>
>> This is *probably* because our kernel lacks networking support, but I
>> didn't look at this in great detail yet.
>
> wait, what??   postgres can't function at all without basic networking
> support, including at least named pipes and tcp over localhost.

I think you mean the named UNIX domain socket in /tmp. Technically not the
same thing as named pipes.

I don't think we require tcp whether loopback or not. You can certainly
configure the server not to accept tcp at all and only the UNIX domain
connections which is presumably good enough if it's a kernel issue.

Re: PostgreSQL cannot be compiled on RISC-V

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> On Nov 20, 2016 6:57 PM, "John R Pierce" <pierce@hogranch.com> wrote:
>> wait, what??   postgres can't function at all without basic networking
>> support, including at least named pipes and tcp over localhost.

> I think you mean the named UNIX domain socket in /tmp. Technically not the
> same thing as named pipes.

Right, we would need either UNIX-domain or TCP sockets for server
connections to work at all.

> I don't think we require tcp whether loopback or not.

We require UDP packet loopback to work for the stats collector to work.
If it weren't working, certain regression tests would probably fail,
and definitely the "stats" test would fail.  Richard's last message
shows that as having passed.

So I think his kernel must have more networking ability than he
supposes.
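
A quick standalone check of the same capability might look like the
sketch below.  It binds a UDP socket to 127.0.0.1, connects the socket
to its own address, and sends itself one byte, which is roughly the kind
of self-test the stats collector setup depends on.
udp_loopback_works() is a hypothetical name, and the 2-second receive
timeout is an arbitrary assumption.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if a UDP datagram sent to 127.0.0.1 comes back, else 0. */
static int
udp_loopback_works(void)
{
    struct sockaddr_in addr;
    socklen_t       len = sizeof(addr);
    struct timeval  tv = {2, 0};    /* assumed timeout: 2 seconds */
    char            out = 'x';
    char            in = 0;
    int             ok;
    int             fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */

    /* Bind, learn the assigned port, then connect the socket to itself. */
    ok = bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0
        && getsockname(fd, (struct sockaddr *) &addr, &len) == 0
        && connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0
        && setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0
        && send(fd, &out, 1, 0) == 1
        && recv(fd, &in, 1, 0) == 1
        && in == out;

    close(fd);
    return ok;
}
```

A return of 0 here would point at the kernel or its configuration rather
than at PostgreSQL.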

            regards, tom lane

Re: PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
(Sorry about the threading and quoting, I am not subscribed to this list)

> > This is *probably* because our kernel lacks networking support, but
> > I didn't look at this in great detail yet.
>
> wait, what??   postgres can't function at all without basic networking
> support, including at least named pipes and tcp over localhost.

What I said wasn't very precise.  The emulators and hardware that we
use for RISC-V lack any external networking.  As you can probably
imagine, this makes them very frustrating to use.

In addition, there are bugs in the kernel network stack on RISC-V
which seem to cause problems with localhost connections and Unix
domain sockets in some circumstances.  These particularly cause test
failures in our autobuilder when the tests (of any package, not just
PostgreSQL) use the network.  I have looked into this, but so far not
been able to narrow it down or make a simple reproducer.  It could be
a syscall ABI problem.

The Linux kernel we are using does have networking enabled.  If you
are interested in this more, then you can see our CONFIG_* settings
here:

  https://github.com/rwmjones/fedora-riscv-kernel

I hope that clears things up.

Rich.


Re: PostgreSQL cannot be compiled on RISC-V

From
Tom Lane
Date:
"Richard W.M. Jones" <rjones@redhat.com> writes:
> On Sun, Nov 20, 2016 at 01:02:51PM -0500, Tom Lane wrote:
>> Hmm, makes me wonder if the spinlock primitives actually work ...

> Yes, my thought too.

> With MAX_CONNECTIONS=1 only 5 tests fail, and reliably too:
>      opr_sanity               ... FAILED
> test errors                   ... FAILED
>      psql_crosstab            ... FAILED
>      select_views             ... FAILED
>      largeobject              ... FAILED (test process exited with exit code 2)

That's a smoking gun then.  Your GCC builtins don't work.

>> Assuming that you've got working core dump support and gdb,
>> getting stack traces from some of the crashes would be useful info.

> Agreed.  Unfortunately there's no gdb yet, and as above no core dumps
> in any case.

Hm.  You could look at the regression.diffs file to get more info
about what's happening.  But TBH, if you don't have GDB working
yet, it doesn't seem like making Postgres work is a higher priority
problem than that.

Once you've got GDB working, I'd be willing to help investigate,
if you can give me ssh access to a RISC-V box with a devel
environment installed.  Before that, it seems like a pretty
back-burner problem.

            regards, tom lane

Re: PostgreSQL cannot be compiled on RISC-V

From
John R Pierce
Date:
On 11/20/2016 11:35 AM, Tom Lane wrote:
> Greg Stark <stark@mit.edu> writes:
>> On Nov 20, 2016 6:57 PM, "John R Pierce" <pierce@hogranch.com> wrote:
>>> wait, what??   postgres can't function at all without basic networking
>>> support, including at least named pipes and tcp over localhost.
>> I think you mean the named UNIX domain socket in /tmp. Technically not the
>> same thing as named pipes.

err, yeah, that's what I meant, all right.  typing on my first cup of
coffee, hah!

> Right, we would need either UNIX-domain or TCP sockets for server
> connections to work at all.
>
>> I don't think we require tcp whether loopback or not.
> We require UDP packet loopback to work for the stats collector to work.
> If it weren't working, certain regression tests would probably fail,
> and definitely the "stats" test would fail.  Richard's last message
> shows that as having passed.

ahh, yeah, I knew there was something like that, I just couldn't
remember what.  I'm vaguely remembering someone on this list a year
or so ago who was upset that localhost was hard coded to 127.0.0.1
because he was trying to do something weirdly 'clever' with it.




Re: PostgreSQL cannot be compiled on RISC-V

From
"Richard W.M. Jones"
Date:
On Sun, Nov 20, 2016 at 02:45:18PM -0500, Tom Lane wrote:
> "Richard W.M. Jones" <rjones@redhat.com> writes:
> > On Sun, Nov 20, 2016 at 01:02:51PM -0500, Tom Lane wrote:
> >> Hmm, makes me wonder if the spinlock primitives actually work ...
>
> > Yes, my thought too.
>
> > With MAX_CONNECTIONS=1 only 5 tests fail, and reliably too:
> >      opr_sanity               ... FAILED
> > test errors                   ... FAILED
> >      psql_crosstab            ... FAILED
> >      select_views             ... FAILED
> >      largeobject              ... FAILED (test process exited with exit code 2)
>
> That's a smoking gun then.  Your GCC builtins don't work.

Interesting .. I will look at this further.  In any case it's
a problem to be fixed in GCC.

> >> Assuming that you've got working core dump support and gdb,
> >> getting stack traces from some of the crashes would be useful info.
>
> > Agreed.  Unfortunately there's no gdb yet, and as above no core dumps
> > in any case.
>
> Hm.  You could look at the regression.diffs file to get more info
> about what's happening.  But TBH, if you don't have GDB working
> yet, it doesn't seem like making Postgres work is a higher priority
> problem than that.
>
> Once you've got GDB working, I'd be willing to help investigate,
> if you can give me ssh access to a RISC-V box with a devel
> environment installed.  Before that, it seems like a pretty
> back-burner problem.

No network => no ssh :-(

However if you really want to pursue this then we provide a qemu
emulator which can run Fedora/RISC-V:

  https://fedoraproject.org/wiki/Architectures/RISC-V/Disk_images

Unfortunately (also because of the lack of networking) you either have
to download the whole set of RPMs (27GB [sic]) or resolve RPM
dependencies by hand, so it's not especially practical at the moment.

We are hoping to have ASICs next year.

Rich.
