Thread: PostgreSQL cannot be compiled on RISC-V

The full log is here:

https://fedorapeople.org/groups/risc-v/logs/postgresql/9.5.5-1.fc25.0.riscv64/build.log

The extract which fails is:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -DLINUX_OOM_SCORE_ADJ=0 -fpic -I. -I. -I/usr/include/python3.5m -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o plpy_cursorobject.o plpy_cursorobject.c
In file included from ../../../src/include/storage/lwlock.h:18:0,
                 from ../../../src/include/storage/lock.h:18,
                 from ../../../src/include/access/genam.h:20,
                 from ../../../src/include/nodes/execnodes.h:17,
                 from ../../../src/include/executor/execdesc.h:18,
                 from ../../../src/include/utils/portal.h:50,
                 from ../../../src/include/executor/spi.h:18,
                 from plpy_planobject.h:8,
                 from plpy_cursorobject.c:18:
../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor. Please report this to pgsql-bugs@postgresql.org.
 #error PostgreSQL does not have native spinlock support on this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor. Please report this to pgsql-bugs@postgresql.org.
  ^~~~~
../../../src/include/storage/s_lock.h:962:25: error: unknown type name 'slock_t'
 extern int tas(volatile slock_t *lock); /* in port/.../tas.s, or
                         ^~~~~~~
../../../src/include/storage/s_lock.h:972:8: error: unknown type name 'slock_t'
 extern slock_t dummy_spinlock;
        ^~~~~~~
../../../src/include/storage/s_lock.h:977:28: error: unknown type name 'slock_t'
 extern int s_lock(volatile slock_t *lock, const char *file, int line);
                            ^~~~~~~
In file included from ../../../src/include/storage/lock.h:18:0,
                 from ../../../src/include/access/genam.h:20,
                 from ../../../src/include/nodes/execnodes.h:17,
                 from ../../../src/include/executor/execdesc.h:18,
                 from ../../../src/include/utils/portal.h:50,
                 from ../../../src/include/executor/spi.h:18,
                 from plpy_planobject.h:8,
                 from plpy_cursorobject.c:18:
../../../src/include/storage/lwlock.h:50:2: error: unknown type name 'slock_t'
  slock_t mutex; /* Protects LWLock and queue of PGPROCs */
  ^~~~~~~
<builtin>: recipe for target 'plpy_cursorobject.o' failed
make: *** [plpy_cursorobject.o] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.ZVdDip (%build)

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

"Richard W.M. Jones" <rjones@redhat.com> writes:
> ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor. Please report this to pgsql-bugs@postgresql.org.

Hi Richard,

What's a RISC-V, and can you provide some gcc assembler implementing
spinlocks for it? See commentary and code for other platforms in
src/include/storage/s_lock.h.

regards, tom lane

On 11/19/2016 7:08 PM, Tom Lane wrote:
> What's a RISC-V

That was a new term to me, too, so I googled it. It's a new open source RISC architecture created by UC Berkeley. See https://en.wikipedia.org/wiki/RISC-V for a summary.

--
john r pierce, recycling bits in santa cruz

On Sat, Nov 19, 2016 at 10:08:07PM -0500, Tom Lane wrote:
> "Richard W.M. Jones" <rjones@redhat.com> writes:
> > ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor. Please report this to pgsql-bugs@postgresql.org.
>
> Hi Richard,
>
> What's a RISC-V, and can you provide some gcc assembler implementing
> spinlocks for it? See commentary and code for other platforms in
> src/include/storage/s_lock.h.

The answer to the first question is a lot easier than the second :-)

RISC-V is an open source instruction set architecture.
https://riscv.org/

I'm currently compiling Fedora for RISC-V and this was the thing that stops PostgreSQL from being compiled.
https://fedoraproject.org/wiki/Architectures/RISC-V

I looked at the file you pointed to. I believe it should be possible to follow the ARM implementation and call __sync_lock_test_and_set (a GCC builtin). I will try this out, but note that compiling anything on RISC-V by hand is currently a very tedious process that can take many hours.

- - -

Yesterday I added --disable-spinlocks and built PostgreSQL again overnight, and it compiles fine but fails in the tests:

https://fedorapeople.org/groups/risc-v/logs/postgresql/9.5.5-1.fc25.0.riscv64/build.log

The errors mostly seem to be:

ERROR: unexpected EOF on client connection with an open transaction

This is *probably* because our kernel lacks networking support, but I didn't look at this in great detail yet.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

On Sat, Nov 19, 2016 at 10:08:07PM -0500, Tom Lane wrote:
> "Richard W.M. Jones" <rjones@redhat.com> writes:
> > ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor. Please report this to pgsql-bugs@postgresql.org.
>
> Hi Richard,
>
> What's a RISC-V, and can you provide some gcc assembler implementing
> spinlocks for it? See commentary and code for other platforms in
> src/include/storage/s_lock.h.

That attached patch allows PostgreSQL to compile successfully. I'm still examining the test failures, but I think they are irrelevant to this.

Please note that you also need to update config/config.{sub,guess} to the latest versions from upstream (https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html) since your current versions are too old to understand the riscv{32,64} architectures.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
Attachment
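
The patch itself is not reproduced in this thread. As a rough illustration of the approach Richard describes (following the ARM section of s_lock.h and calling the GCC builtin), a stanza along the following lines would be one way to do it. This is only a sketch under that assumption, not the attached patch; the architecture guard mentioned in the comment is likewise an assumption about how it would be wired into the header.

```c
/*
 * Hypothetical s_lock.h-style stanza built on GCC's __sync builtins.
 * In the real header this would sit inside an architecture check such as
 * "#if defined(__riscv)"; the guard is omitted here so that the sketch
 * compiles on any GCC target.
 */
#define HAS_TEST_AND_SET

typedef int slock_t;

/*
 * Atomically store 1 into *lock and return the previous value:
 * 0 means the caller took the lock, nonzero means it was already held.
 */
static inline int
tas(volatile slock_t *lock)
{
	return __sync_lock_test_and_set(lock, 1);
}

#define TAS(lock) tas(lock)

/* Release the lock by storing 0 with release semantics. */
#define S_UNLOCK(lock) __sync_lock_release(lock)
```

Whether the builtin expands to a correct atomic sequence is entirely up to the compiler for the target in question, which is why it needs to be tested on the architecture itself.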

On Sun, Nov 20, 2016 at 03:09:05PM +0000, Richard W.M. Jones wrote:
> On Sat, Nov 19, 2016 at 10:08:07PM -0500, Tom Lane wrote:
> > "Richard W.M. Jones" <rjones@redhat.com> writes:
> > > ../../../src/include/storage/s_lock.h:890:2: error: #error PostgreSQL does not have native spinlock support on this platform. To continue the compilation, rerun configure using --disable-spinlocks. However, performance will be poor. Please report this to pgsql-bugs@postgresql.org.
> >
> > Hi Richard,
> >
> > What's a RISC-V, and can you provide some gcc assembler implementing
> > spinlocks for it? See commentary and code for other platforms in
> > src/include/storage/s_lock.h.
>
> That attached patch allows PostgreSQL to compile successfully. I'm
> still examining the test failures, but I think they are irrelevant to
> this.
>
> Please note that you also need to update config/config.{sub,guess} to
> the latest versions from upstream
> (https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html)
> since your current versions are too old to understand the riscv{32,64}
> architectures.

I'm happy with the patch I previously attached, and I don't think it would be controversial to apply it to upstream PostgreSQL now, along with updating config.guess/config.sub. I didn't test spinlocks, but if there was a bug in the GCC builtin, then we would fix it in GCC. (In fact, why not use the GCC builtin every time PostgreSQL is compiled with GCC?)

For the sake of full disclosure, the test suite is rather "crashy", with about half of the tests failing. Basic database creation, creating tables, etc all works, but the detailed tests fail with "test process exited with exit code 2" often.

None of this is surprising as the entire toolchain is pre-alpha.

I found that dialing MAX_CONNECTIONS down to 2 or 3 helps.

What would be helpful is to get more detail on how the tests fail. I don't even know if it is the client or server side which fails (although I assume the server, because `psql' will exit with code 2 if it loses the network connection). Is there some way to run tests with lots of extra verbosity?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

"Richard W.M. Jones" <rjones@redhat.com> writes:
> I'm happy with the patch I previously attached, and I don't think it
> would be controversial to apply it to upstream PostgreSQL now, along
> with updating config.guess/config.sub. I didn't test spinlocks, but
> if there was a bug in the GCC builtin, then we would fix it in GCC.
> (In fact, why not use the GCC builtin every time PostgreSQL is
> compiled with GCC?)

Well, to be blunt, our experience with the GCC builtins has not been good. The quality of implementation seems to vary drastically across architectures, with, e.g., some word widths not working or having very poor performance compared to others. So while we will take an s_lock implementation based on GCC builtins, we view it as needing to be validated for the given architecture just as much as any other method would be. For platforms where we have already-debugged assembly code, there's no visible upside in switching to the builtins, either.

> For the sake of full disclosure, the test suite is rather "crashy",
> with about half of the tests failing. Basic database creation,
> creating tables, etc all works, but the detailed tests fail with "test
> process exited with exit code 2" often.
> None of this is surprising as the entire toolchain is pre-alpha.

Yeah, compiler bugs would lead to that sort of thing.

> I found that dialing MAX_CONNECTIONS down to 2 or 3 helps.

Hmm, makes me wonder if the spinlock primitives actually work ...

> What would be helpful is to get more detail on how the tests fail. I
> don't even know if it is the client or server side which fails
> (although I assume the server, because `psql' will exit with code 2 if
> it loses the network connection). Is there some way to run tests with
> lots of extra verbosity?

Not directly, but I'd guess that the server processes are crashing and leaving core dumps behind (or would be if you run under a suitable ulimit). Assuming that you've got working core dump support and gdb, getting stack traces from some of the crashes would be useful info. Also, if you can't tell from the server logs which core is which, "p debug_query_string" is a good way to see the current SQL command that a crashed process was working on.

Given that you seem to be pretty early in this process (ie a long way from production grade), my feeling is that we should apply RISC-V related fixes to HEAD only, meaning that they'd reach the field with Postgres v10 next fall. For your own purposes, you could carry the fixes as patches against 9.6.x, or work with snapshots of our master branch.

regards, tom lane
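
One way to check a test-and-set primitive on a new architecture without going through the whole regression suite is a small contention test. The sketch below is an assumption-laden illustration, not anything taken from the PostgreSQL tree: `stress_spinlock` and `struct shared_state` are hypothetical names, and it uses fork() with a shared anonymous mapping only because that loosely resembles a multi-process server. Each child increments a shared counter while holding a lock built on the same GCC builtins discussed above; if mutual exclusion is broken, increments are likely to be lost.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE			/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int slock_t;

/* Hypothetical shared area: one lock word and the counter it protects. */
struct shared_state
{
	volatile slock_t lock;
	volatile long counter;
};

static void
spin_acquire(volatile slock_t *lock)
{
	/* Spin until the previous value was 0, i.e. the lock was free. */
	while (__sync_lock_test_and_set(lock, 1))
		;
}

static void
spin_release(volatile slock_t *lock)
{
	__sync_lock_release(lock);
}

/*
 * Fork nprocs children that each do iters locked increments.
 * Returns the final counter value, or -1 if setup failed.
 * With a working lock the result is nprocs * iters.
 */
long
stress_spinlock(int nprocs, long iters)
{
	struct shared_state *st;
	int			started = 0;
	long		result;

	st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (st == MAP_FAILED)
		return -1;
	st->lock = 0;
	st->counter = 0;

	for (int i = 0; i < nprocs; i++)
	{
		pid_t		pid = fork();

		if (pid < 0)
			break;
		if (pid == 0)
		{
			for (long n = 0; n < iters; n++)
			{
				spin_acquire(&st->lock);
				st->counter++;
				spin_release(&st->lock);
			}
			_exit(0);
		}
		started++;
	}

	/* Reap every child before reading the counter. */
	while (wait(NULL) > 0)
		;

	result = (started == nprocs) ? st->counter : -1;
	munmap(st, sizeof(*st));
	return result;
}
```

A call such as `stress_spinlock(4, 100000)` returning anything other than 400000 would point at the primitive (or the emulator). A correct total proves little on its own, particularly on a single-CPU emulator where real contention is rare.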

On Sun, Nov 20, 2016 at 01:02:51PM -0500, Tom Lane wrote:
> > I found that dialing MAX_CONNECTIONS down to 2 or 3 helps.
>
> Hmm, makes me wonder if the spinlock primitives actually work ...

Yes, my thought too.

With MAX_CONNECTIONS=1 only 5 tests fail, and reliably too:

opr_sanity ... FAILED
test errors ... FAILED
psql_crosstab ... FAILED
select_views ... FAILED
largeobject ... FAILED (test process exited with exit code 2)

========================
 5 of 168 tests failed.
========================

> > What would be helpful is to get more detail on how the tests fail. I
> > don't even know if it is the client or server side which fails
> > (although I assume the server, because `psql' will exit with code 2 if
> > it loses the network connection). Is there some way to run tests with
> > lots of extra verbosity?
>
> Not directly, but I'd guess that the server processes are crashing and
> leaving core dumps behind (or would be if you run under a suitable
> ulimit).

I checked this already and I don't think the server process is crashing, or if it is then it's not leaving coredumps around even though /proc/sys/kernel/core_pattern and the 'ulimit -c unlimited' ought to allow them. Maybe the tests or server process is adjusting ulimit?

> Assuming that you've got working core dump support and gdb,
> getting stack traces from some of the crashes would be useful info.

Agreed. Unfortunately there's no gdb yet, and as above no core dumps in any case.

> Also, if you can't tell from the server logs which core is which,
> "p debug_query_string" is a good way to see the current SQL command
> that a crashed process was working on.

OK I will keep that in mind if we get gdb working.

> Given that you seem to be pretty early in this process (ie a long
> way from production grade), my feeling is that we should apply RISC-V
> related fixes to HEAD only, meaning that they'd reach the field with
> Postgres v10 next fall. For your own purposes, you could carry the
> fixes as patches against 9.6.x, or work with snapshots of our master
> branch.

Yes no problems at all. I really wouldn't want anyone to entrust precious data to PostgreSQL on RISC-V at this point, so long schedules are fine :-/

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
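
On the question of whether something in the chain is changing the core limit: a process can report, and raise, its own limit with getrlimit()/setrlimit(). The helper names below are hypothetical and this is only a sketch of the check; it says nothing about what the PostgreSQL test harness actually does with the limit.

```c
#include <sys/resource.h>

/*
 * Returns 1 if the soft RLIMIT_CORE is nonzero (core files allowed),
 * 0 if it is zero (core files suppressed), or -1 on error.
 */
int
core_dumps_enabled(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;
	return rl.rlim_cur != 0;
}

/*
 * Raise the soft core limit as far as the hard limit permits.
 * Returns 0 on success, -1 on error. An unprivileged process
 * cannot go beyond the hard limit.
 */
int
raise_core_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```

Logging the result of `core_dumps_enabled()` from a small helper started the same way as the server would show whether the limit set in the shell survives down to that point. If the hard limit itself is 0, `raise_core_limit()` succeeds but core files stay disabled.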

On 11/20/2016 2:36 AM, Richard W.M. Jones wrote:
> This is *probably* because our kernel lacks networking support, but
> I didn't look at this in great detail yet.

wait, what?? postgres can't function at all without basic networking support, including at least named pipes and tcp over localhost.

--
john r pierce, recycling bits in santa cruz

On Nov 20, 2016 6:57 PM, "John R Pierce" <pierce@hogranch.com> wrote:
>
> On 11/20/2016 2:36 AM, Richard W.M. Jones wrote:
>>
>> This is *probably* because our kernel lacks networking support, but I didn't look at this in great detail yet.
>
> wait, what?? postgres can't function at all without basic networking support, including at least named pipes and tcp over localhost.

I think you mean the named UNIX domain socket in /tmp. Technically not the same thing as named pipes.

I don't think we require tcp whether loopback or not. You can certainly configure the server not to accept tcp at all and only the UNIX domain connections which is presumably good enough if it's a kernel issue.

Greg Stark <stark@mit.edu> writes:
> On Nov 20, 2016 6:57 PM, "John R Pierce" <pierce@hogranch.com> wrote:
>> wait, what?? postgres can't function at all without basic networking
>> support, including at least named pipes and tcp over localhost.

> I think you mean the named UNIX domain socket in /tmp. Technically not the
> same thing as named pipes.

Right, we would need either UNIX-domain or TCP sockets for server connections to work at all.

> I don't think we require tcp whether loopback or not.

We require UDP packet loopback to work for the stats collector to work. If it weren't working, certain regression tests would probably fail, and definitely the "stats" test would fail. Richard's last message shows that as having passed. So I think his kernel must have more networking ability than he supposes.

regards, tom lane
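
The UDP loopback requirement can be probed in isolation with a few lines of C. The function below is a hypothetical helper, not code from the stats collector: it binds a datagram socket to 127.0.0.1 on a kernel-chosen port, sends one byte to itself, and waits up to two seconds for it to come back.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Returns 1 if a one-byte datagram sent to our own socket on 127.0.0.1
 * is received intact, 0 on any failure or timeout.
 */
int
udp_loopback_works(void)
{
	struct sockaddr_in addr;
	socklen_t	len = sizeof(addr);
	struct timeval tv = {.tv_sec = 2, .tv_usec = 0};
	char		out = 'x';
	char		in = 0;
	int			ok = 0;
	int			fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
		/* Learn which port was assigned so we can send to ourselves. */
		getsockname(fd, (struct sockaddr *) &addr, &len) == 0 &&
		/* Do not block forever if the datagram never arrives. */
		setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
		sendto(fd, &out, 1, 0, (struct sockaddr *) &addr, sizeof(addr)) == 1 &&
		recv(fd, &in, 1, 0) == 1 &&
		in == out)
		ok = 1;

	close(fd);
	return ok;
}
```

A return value of 0 on the RISC-V guest would suggest the problem lies in the kernel or emulator rather than in PostgreSQL, although a single successful round trip does not rule out intermittent failures.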

(Sorry about the threading and quoting, I am not subscribed to this list)

> > This is *probably* because our kernel lacks networking support, but
> > I didn't look at this in great detail yet.
>
> wait, what?? postgres can't function at all without basic networking
> support, including at least named pipes and tcp over localhost.

What I said wasn't very precise. The emulators and hardware that we use for RISC-V lack any external networking. As you can probably imagine, this makes them very frustrating to use.

In addition, there are bugs in the kernel network stack on RISC-V which seem to cause problems with localhost connections and Unix domain sockets in some circumstances. These particularly cause test failures in our autobuilder when the tests (of any package, not just PostgreSQL) use the network. I have looked into this, but so far not been able to narrow it down or make a simple reproducer. It could be a syscall ABI problem.

The Linux kernel we are using does have networking enabled. If you are interested in this more, then you can see our CONFIG_* settings here:

https://github.com/rwmjones/fedora-riscv-kernel

I hope that clears things up.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

"Richard W.M. Jones" <rjones@redhat.com> writes:
> On Sun, Nov 20, 2016 at 01:02:51PM -0500, Tom Lane wrote:
>> Hmm, makes me wonder if the spinlock primitives actually work ...

> Yes, my thought too.

> With MAX_CONNECTIONS=1 only 5 tests fail, and reliably too:
> opr_sanity ... FAILED
> test errors ... FAILED
> psql_crosstab ... FAILED
> select_views ... FAILED
> largeobject ... FAILED (test process exited with exit code 2)

That's a smoking gun then. Your GCC builtins don't work.

>> Assuming that you've got working core dump support and gdb,
>> getting stack traces from some of the crashes would be useful info.

> Agreed. Unfortunately there's no gdb yet, and as above no core dumps
> in any case.

Hm. You could look at the regression.diffs file to get more info about what's happening. But TBH, if you don't have GDB working yet, it doesn't seem like making Postgres work is a higher priority problem than that.

Once you've got GDB working, I'd be willing to help investigate, if you can give me ssh access to a RISC-V box with a devel environment installed. Before that, it seems like a pretty back-burner problem.

regards, tom lane

On 11/20/2016 11:35 AM, Tom Lane wrote:
> Greg Stark <stark@mit.edu> writes:
>> On Nov 20, 2016 6:57 PM, "John R Pierce" <pierce@hogranch.com> wrote:
>>> wait, what?? postgres can't function at all without basic networking
>>> support, including at least named pipes and tcp over localhost.
>> I think you mean the named UNIX domain socket in /tmp. Technically not the
>> same thing as named pipes.

err, yeah, that's what I meant, all right. typing on my first cup of coffee, hah!

> Right, we would need either UNIX-domain or TCP sockets for server
> connections to work at all.
>
>> I don't think we require tcp whether loopback or not.
> We require UDP packet loopback to work for the stats collector to work.
> If it weren't working, certain regression tests would probably fail,
> and definitely the "stats" test would fail. Richard's last message
> shows that as having passed.

ahh, yeah, I knew there was something like that, I just couldn't remember what. I'm vaguely remembering someone on this list a year or so ago who was upset that localhost was hard coded to 127.0.0.1 because he was trying to do something weirdly 'clever' with it.

--
john r pierce, recycling bits in santa cruz

On Sun, Nov 20, 2016 at 02:45:18PM -0500, Tom Lane wrote:
> "Richard W.M. Jones" <rjones@redhat.com> writes:
> > On Sun, Nov 20, 2016 at 01:02:51PM -0500, Tom Lane wrote:
> >> Hmm, makes me wonder if the spinlock primitives actually work ...
>
> > Yes, my thought too.
>
> > With MAX_CONNECTIONS=1 only 5 tests fail, and reliably too:
> > opr_sanity ... FAILED
> > test errors ... FAILED
> > psql_crosstab ... FAILED
> > select_views ... FAILED
> > largeobject ... FAILED (test process exited with exit code 2)
>
> That's a smoking gun then. Your GCC builtins don't work.

Interesting .. I will look at this further. In any case it's a problem to be fixed in GCC.

> >> Assuming that you've got working core dump support and gdb,
> >> getting stack traces from some of the crashes would be useful info.
>
> > Agreed. Unfortunately there's no gdb yet, and as above no core dumps
> > in any case.
>
> Hm. You could look at the regression.diffs file to get more info
> about what's happening. But TBH, if you don't have GDB working
> yet, it doesn't seem like making Postgres work is a higher priority
> problem than that.
>
> Once you've got GDB working, I'd be willing to help investigate,
> if you can give me ssh access to a RISC-V box with a devel
> environment installed. Before that, it seems like a pretty
> back-burner problem.

No network => no ssh :-(

However if you really want to pursue this then we provide a qemu emulator which can run Fedora/RISC-V:

https://fedoraproject.org/wiki/Architectures/RISC-V/Disk_images

Unfortunately (also because of the lack of networking) you either have to download the whole set of RPMs (27GB [sic]) or resolve RPM dependencies by hand, so it's not especially practical at the moment.

We are hoping to have ASICs next year.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top