Thread: master make check fails on Solaris 10
Hello, hackers! I got a permanent failure of master (commit ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. Regression output and diffs are attached. I used the following commands: ./configure CC="ccache gcc" CFLAGS="-m64 -I/opt/csw/include" LDFLAGS="-L/opt/csw/lib/sparcv9 -L/usr/local/lib/64" --enable-cassert --enable-debug --enable-nls --with-perl --with-tcl --with-python --with-gssapi --with-openssl --with-ldap --with-libxml --with-libxslt gmake > make_results.txt gmake check About the system: SunOS, Release 5.10, KernelID Generic_141444-09. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > Hello, hackers! I got a permanent failure of master (commit > ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. > Regression output and diffs are attached. Hm, buildfarm member protosciurus is running a similar configuration without problems. Looking at its configuration, maybe you need to fool with LD_LIBRARY_PATH and/or LDFLAGS_SL? regards, tom lane
Hi, On 2018-01-11 20:21:11 +0300, Marina Polyakova wrote: > Hello, hackers! I got a permanent failure of master (commit > ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. Did this use to work? If so, could you check whether it worked before 69c3936a1499b772a749ae629fc59b2d72722332? Greetings, Andres Freund
On 11-01-2018 20:34, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> Hello, hackers! I got a permanent failure of master (commit >> ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. >> Regression output and diffs are attached. > > Hm, buildfarm member protosciurus is running a similar configuration > without problems. Looking at its configuration, maybe you need to > fool with LD_LIBRARY_PATH and/or LDFLAGS_SL? I added these parameters with the same values in configure (LDFLAGS_SL="-m64" LD_LIBRARY_PATH="/lib/64:/usr/lib/64:/usr/sfw/lib/64:/usr/local/lib"), there're the same failures :( (see the attached regression diffs and output) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 11-01-2018 20:39, Andres Freund wrote: > Hi, > > On 2018-01-11 20:21:11 +0300, Marina Polyakova wrote: >> Hello, hackers! I got a permanent failure of master (commit >> ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. > > Did this use to work? It always fails if you have asked about this.. > If so, could you check whether it worked before > 69c3936a1499b772a749ae629fc59b2d72722332? - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) the same failures occur (see the attached regression diffs and output); - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world passes. I'll try to find out from what commit it started.. Don't you have any suspicions?) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Marina Polyakova wrote: > - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) the same > failures occur (see the attached regression diffs and output); > - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world > passes. > I'll try to find out from what commit it started.. Don't you have any > suspicions?) Perhaps you can use "git bisect". -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12-01-2018 18:12, Alvaro Herrera wrote: > Marina Polyakova wrote: > >> - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) >> the same >> failures occur (see the attached regression diffs and output); >> - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world >> passes. >> I'll try to find out from what commit it started.. Don't you have any >> suspicions?) > > Perhaps you can use "git bisect". Thanks, I'm doing the same thing :) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 12-01-2018 14:05, Marina Polyakova wrote: > - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) > the same failures occur (see the attached regression diffs and > output); > - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world > passes. > I'll try to find out from what commit it started.. Binary search has shown that all these failures begin with commit 7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from requiring more than MAXALIGN alignment.). On the previous commit (91aec93e6089a5ba49cce0aca3bf7f7022d62ea4) make check-world passes. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > On 12-01-2018 14:05, Marina Polyakova wrote: >> - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) >> the same failures occur (see the attached regression diffs and >> output); >> - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world >> passes. >> I'll try to find out from what commit it started.. > Binary search has shown that all these failures begin with commit > 7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from requiring > more than MAXALIGN alignment.). Hm ... so apparently, that compiler has bugs in handling nondefault alignment specs. You said upthread it was gcc, but what version exactly? regards, tom lane
On 12-01-2018 21:00, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> ... >> Binary search has shown that all these failures begin with commit >> 7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from >> requiring >> more than MAXALIGN alignment.). > > Hm ... so apparently, that compiler has bugs in handling nondefault > alignment specs. You said upthread it was gcc, but what version > exactly? This is 5.2.0: $ gcc -v Reading specs from /opt/csw/lib/gcc/sparc-sun-solaris2.10/5.2.0/specs COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/opt/csw/libexec/gcc/sparc-sun-solaris2.10/5.2.0/lto-wrapper Target: sparc-sun-solaris2.10 Configured with: /home/dam/mgar/pkg/gcc5/trunk/work/solaris10-sparc/build-isa-sparcv8plus/gcc-5.2.0/configure --prefix=/opt/csw --exec_prefix=/opt/csw --bindir=/opt/csw/bin --sbindir=/opt/csw/sbin --libexecdir=/opt/csw/libexec --datadir=/opt/csw/share --sysconfdir=/etc/opt/csw --sharedstatedir=/opt/csw/share --localstatedir=/var/opt/csw --libdir=/opt/csw/lib --infodir=/opt/csw/share/info --includedir=/opt/csw/include --mandir=/opt/csw/share/man --enable-cloog-backend=isl --enable-java-awt=xlib --enable-languages=ada,c,c++,fortran,go,java,objc --enable-libada --enable-libssp --enable-nls --enable-objc-gc --enable-threads=posix --program-suffix=-5.2 --with-cloog=/opt/csw --with-gmp=/opt/csw --with-included-gettext --with-ld=/usr/ccs/bin/ld --without-gnu-ld --with-libiconv-prefix=/opt/csw --with-mpfr=/opt/csw --with-ppl=/opt/csw --with-system-zlib=/opt/csw --with-as=/usr/ccs/bin/as --without-gnu-as Thread model: posix gcc version 5.2.0 (GCC) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > On 12-01-2018 21:00, Tom Lane wrote: >> Hm ... so apparently, that compiler has bugs in handling nondefault >> alignment specs. You said upthread it was gcc, but what version >> exactly? > This is 5.2.0: Ugh ... protosciurus has 3.4.3, but I see that configure detects that as *not* having __int128. Probably what's happening on your machine is that gcc knows __int128 but generates buggy code for it when an alignment spec is given. So that's unfortunate, but it's not really a regression from 3.4.3. I'm not sure there's much we can do about this. Dropping the use of the alignment spec isn't a workable option. If there were a simple way for configure to detect that the compiler generates bad code for that, we could have it do so and reject use of __int128, but it'd be up to you to come up with a workable test. In the end this might just be an instance of the old saw about avoiding dot-zero releases. Have you tried a newer gcc? (Digging in their bugzilla finds quite a number of __int128 bugs fixed in 5.4.x, though none look to be specifically about misaligned data.) Also, if it still happens with current gcc on that hardware, there'd be grounds for a new bug report to them. regards, tom lane
Thank you very much! On 13-01-2018 21:10, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> On 12-01-2018 21:00, Tom Lane wrote: >>> Hm ... so apparently, that compiler has bugs in handling nondefault >>> alignment specs. You said upthread it was gcc, but what version >>> exactly? > >> This is 5.2.0: > > Ugh ... protosciurus has 3.4.3, but I see that configure detects that > as *not* having __int128. Probably what's happening on your machine > is that gcc knows __int128 but generates buggy code for it when an > alignment spec is given. So that's unfortunate, but it's not really > a regression from 3.4.3. > > I'm not sure there's much we can do about this. Dropping the use > of the alignment spec isn't a workable option. If there were a > simple way for configure to detect that the compiler generates bad > code for that, we could have it do so and reject use of __int128, > but it'd be up to you to come up with a workable test. I'll think about it.. > In the end this might just be an instance of the old saw about > avoiding dot-zero releases. Have you tried a newer gcc? > (Digging in their bugzilla finds quite a number of __int128 bugs > fixed in 5.4.x, though none look to be specifically about > misaligned data.) As I was told offlist, 5.2.0 is already a fairly new version of gcc for this system.. > Also, if it still happens with current gcc on that hardware, > there'd be grounds for a new bug report to them. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 13-01-2018 21:10, Tom Lane wrote:
>> I'm not sure there's much we can do about this. Dropping the use
>> of the alignment spec isn't a workable option. If there were a
>> simple way for configure to detect that the compiler generates bad
>> code for that, we could have it do so and reject use of __int128,
>> but it'd be up to you to come up with a workable test.

> I'll think about it..

Attached is a possible test program. I can confirm it passes on a
machine with working __int128, but I have no idea whether it will
detect the problem on yours. If not, maybe you can tweak it?

regards, tom lane

#include <stddef.h>
#include <stdio.h>

/* GCC, Sunpro and XLC support aligned */
#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#endif

typedef __int128 int128a
#if defined(pg_attribute_aligned)
pg_attribute_aligned(8)
#endif
;

/*
 * These are globals to discourage the compiler from folding all the
 * arithmetic tests down to compile-time constants. We do not have
 * convenient support for 128bit literals at this point...
 */
struct glob128
{
    __int128 start;
    char pad;
    int128a a;
    int128a b;
    int128a c;
    int128a d;
} g = {0, 'p', 48828125, 97656255, 0, 0};

int
main()
{
    if (offsetof(struct glob128, a) < 17 ||
        offsetof(struct glob128, a) > 24)
    {
        printf("wrong alignment, %d\n", (int) offsetof(struct glob128, a));
        return 1;
    }
    g.a = (g.a << 12) + 1; /* 200000000001 */
    g.b = (g.b << 12) + 5; /* 400000000005 */
    /* use the most relevant arithmetic ops */
    g.c = g.a * g.b;
    g.d = (g.c + g.b) / g.b;
    /* return different values, to prevent optimizations */
    if (g.d != g.a + 1)
    {
        printf("wrong arithmetic result\n");
        return 1;
    }
    printf("A-OK!\n");
    return 0;
}
[I added Victor Wagner as co-researcher of this problem]

On 13-01-2018 21:10, Tom Lane wrote:
> In the end this might just be an instance of the old saw about
> avoiding dot-zero releases. Have you tried a newer gcc?
> (Digging in their bugzilla finds quite a number of __int128 bugs
> fixed in 5.4.x, though none look to be specifically about
> misaligned data.)

gcc 5.5.0 (from [1]) did not fix the problem..

On 16-01-2018 2:41, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 13-01-2018 21:10, Tom Lane wrote:
>>> I'm not sure there's much we can do about this. Dropping the use
>>> of the alignment spec isn't a workable option. If there were a
>>> simple way for configure to detect that the compiler generates bad
>>> code for that, we could have it do so and reject use of __int128,
>>> but it'd be up to you to come up with a workable test.
>
>> I'll think about it..
>
> Attached is a possible test program. I can confirm it passes on a
> machine with working __int128, but I have no idea whether it will
> detect the problem on yours. If not, maybe you can tweak it?

Thank you! Using gcc 5.5.0 it prints that everything is ok. But,
investigating the regression diffs, we found out that the error occurs
when we pass int128 as not the first argument to the function (perhaps
its value is replaced by the value of some address):

-- Use queries from random.sql
SELECT count(*) FROM onek; -- Everything is ok
...
SELECT random, count(random) FROM RANDOM_TBL
GROUP BY random HAVING count(random) > 3; -- Everything is ok

postgres=# SELECT * FROM RANDOM_TBL ORDER BY random; -- Print current data
 random
--------
     78
     86
     98
     98
(4 rows)

postgres=# SELECT AVG(random) FROM RANDOM_TBL
postgres-# HAVING AVG(random) NOT BETWEEN 80 AND 120; -- Oops!
              avg
-------------------------------
 79446934848446476698976780288
(1 row)

Debug output from the last query (see attached diff.patch, it is based
on commit 9c7d06d60680c7f00d931233873dee81fdb311c6 of master):

makeInt128AggState
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 00000000000000000000000000000062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 86
int8_avg_accum val_int128 as 2 x int64: 0 86
int8_avg_accum val_int128 bytes: 00000000000000000000000000000056
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 00000000000000000000000000000062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 78
int8_avg_accum val_int128 as 2 x int64: 0 78
int8_avg_accum val_int128 bytes: 0000000000000000000000000000004E
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
numeric_poly_avg
int128_to_numericvar
int128_to_numericvar int128 val as 2 x int64: 17227307872 0
int128_to_numericvar int128 val bytes: 0000000402D3DB600000000000000000

(val_int128 in the function int8_avg_accum is correct, but newval in
the function do_int128_accum is not equal to it. val in the function
int128_to_numericvar is (4 * 4306826968).)

Based on this, we modified the test program (see attached). Here is its
output on Solaris 10 for different alignments requirements for int128
(on my machine where make check-world passes everything is OK)
(ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10):

$ gcc -D PG_ALIGN_128=16 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 OK
pass uint 16 OK
pass int 32 OK
pass int 64 OK
pass int 128 OK
$ gcc -D PG_ALIGN_128=8 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 FAILED
pass uint 16 FAILED
pass int 32 FAILED
pass int 64 FAILED
pass int 128 OK

Maybe some pass test from int128test2.c can be used to test __int128?

P.S. I suppose, g.b should be 97656250 to get 400000000005:

> struct glob128
> {
> __int128 start;
> char pad;
> int128a a;
> int128a b;
> int128a c;
> int128a d;
> } g = {0, 'p', 48828125, 97656255, 0, 0};
> ...
> g.b = (g.b << 12) + 5; /* 400000000005 */

[1] https://www.opencsw.org

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > investigating the regression diffs, we found out that the error occurs > when we pass int128 as not the first argument to the function (perhaps > its value is replaced by the value of some address): > ... > Based on this, we modified the test program (see attached). Here is its > output on Solaris 10 for different alignments requirements for int128 > (on my machine where make check-world passes everything is OK) > (ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10): Excellent. This fails the same way on gcc 5.2.0 and 5.5.0? > Maybe some pass test from int128test2.c can be used to test __int128? Yeah, I can work with this. What I propose to do is use a somewhat stripped-down version of this test as an AC_RUN_IFELSE test normally, but if cross-compiling, fall back to just seeing if we can link. Thanks for investigating! regards, tom lane
Sorry, diff.patch is attached now.

On 17-01-2018 18:02, Marina Polyakova wrote:
> [I added Victor Wagner as co-researcher of this problem]
> ...
> Debug output from the last query (see attached diff.patch, it is based
> on commit 9c7d06d60680c7f00d931233873dee81fdb311c6 of master):
> ...

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On Wed, 17 Jan 2018 18:02:26 +0300
Marina Polyakova <m.polyakova@postgrespro.ru> wrote:

> > Attached is a possible test program. I can confirm it passes on a
> > machine with working __int128, but I have no idea whether it will
> > detect the problem on yours. If not, maybe you can tweak it?
>
> Thank you! Using gcc 5.5.0 it prints that everything is ok. But,
> investigating the regression diffs, we found out that the error
> occurs when we pass int128 as not the first argument to the function
> (perhaps its value is replaced by the value of some address):

I'm attaching a stripped-down version of the test program, which
demonstrates the problem, and two assembler listings produced from this
C source using alignment 8 and 16. Maybe this stripped-down version can
be used as a base for the configure test.

As it turns out, GCC on SPARC passes function arguments via the register
ring (the SPARC register windows), which is referenced as %oN in the
calling code and as %iN in the called function.

And somehow it happens that the alignment attribute of the typedef
affects how the arguments are accessed inside the function, but doesn't
affect how the register ring is filled before the call. This looks like
a bug in GCC.

Unfortunately, we have only one SPARC machine and started our
investigation by upgrading GCC 5.2.0 to GCC 5.5.0, so it is hard to
downgrade and test with the older GCC.

--
Attachment
BTW, now that you've demonstrated that the bug exists in a current gcc release, you should definitely file a bug at https://gcc.gnu.org/bugzilla/ I think you can just give them int128test2.c as-is as a test case. Please do that and let me know the PR number --- I think it would be good to cite the bug specifically in the comments for our configure code. regards, tom lane
On Wed, 17 Jan 2018 10:07:37 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: > Yeah, I can work with this. What I propose to do is use a somewhat > stripped-down version of this test as an AC_RUN_IFELSE test normally, > but if cross-compiling, fall back to just seeing if we can link. I'd suggest to add a configure option to switch off 128-bit support (--disable-int128), especially for these cross-compile cases where link test cannot give us enough information to decide automatically. --
Victor Wagner <vitus@wagner.pp.ru> writes: > On Wed, 17 Jan 2018 10:07:37 -0500 > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I can work with this. What I propose to do is use a somewhat >> stripped-down version of this test as an AC_RUN_IFELSE test normally, >> but if cross-compiling, fall back to just seeing if we can link. > I'd suggest to add a configure option to switch off 128-bit support > (--disable-int128), especially for these cross-compile cases where link > test cannot give us enough information to decide automatically. I don't want to go there without some evidence that the problem is much more widespread than it appears now. A disable switch will be a permanent documentation and maintenance overhead, plus anyone who puts it into their build scripts will probably never remember to remove it :-(. And how many people will be cross-compiling to Solaris/SPARC anyway? (If there are any, they can always manually change pg_config.h ...) regards, tom lane
Victor Wagner <vitus@wagner.pp.ru> writes: > I'm attaching stripped-down version of test program, which demonstrate > the problem and two assembler listings produced with this C source using > alignment 8 and 16. May be this stripped-down version can be used as > base for configure test. Ah, thanks, this will be easier to work with. > Unfortunately, we have only one Sparc machine and started our > investigation by upgrading GCC 5.2.0 to GCC 5.5.0, so it is hard to > downgrade and test with older GCC. OK. Well, the fact that you get the same failures in PG itself seems like good evidence that the compiler's behavior hasn't changed here. regards, tom lane
On 17-01-2018 18:07, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> investigating the regression diffs, we found out that the error occurs >> when we pass int128 as not the first argument to the function (perhaps >> its value is replaced by the value of some address): >> ... >> Based on this, we modified the test program (see attached). Here is >> its >> output on Solaris 10 for different alignments requirements for int128 >> (on my machine where make check-world passes everything is OK) >> (ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10): > > Excellent. This fails the same way on gcc 5.2.0 and 5.5.0? As Victor answered in [1]: > Unfortunately, we have only one Sparc machine and started our > investigation by upgrading GCC 5.2.0 to GCC 5.5.0, so it is hard to > downgrade and test with older GCC. >> Maybe some pass test from int128test2.c can be used to test __int128? > > Yeah, I can work with this. What I propose to do is use a somewhat > stripped-down version of this test as an AC_RUN_IFELSE test normally, > but if cross-compiling, fall back to just seeing if we can link. Thanks, I'll try to do this.. And Victor attached a stripped-down version of this in [1]. > Thanks for investigating! Thank you! :) [1] https://www.postgresql.org/message-id/20180117181359.3a6cc06c%40fafnir.local.vm -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > On 17-01-2018 18:07, Tom Lane wrote: >> Yeah, I can work with this. What I propose to do is use a somewhat >> stripped-down version of this test as an AC_RUN_IFELSE test normally, >> but if cross-compiling, fall back to just seeing if we can link. > Thanks, I'll try to do this.. And Victor attached a stripped-down > version of this in [1]. Oh, I was already working on a patch, thanks. regards, tom lane
On 17-01-2018 18:28, Tom Lane wrote: > BTW, now that you've demonstrated that the bug exists in a current > gcc release, you should definitely file a bug at > https://gcc.gnu.org/bugzilla/ > I think you can just give them int128test2.c as-is as a test case. > > Please do that and let me know the PR number --- I think it would be > good to cite the bug specifically in the comments for our configure > code. Thanks, I'll try to do it. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached is a draft patch to incorporate Victor's slimmed-down test into configure. If you have a chance, could you confirm it does the right thing on your Sparc machine? BTW, it would be a good idea to set up a buildfarm member on that machine, if you care about whether that configuration continues to work in the future. regards, tom lane diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 076656c..cbfccf6 100644 *** a/config/c-compiler.m4 --- b/config/c-compiler.m4 *************** AC_DEFUN([PGAC_TYPE_128BIT_INT], *** 108,136 **** [AC_CACHE_CHECK([for __int128], [pgac_cv__128bit_int], [AC_LINK_IFELSE([AC_LANG_PROGRAM([ /* * These are globals to discourage the compiler from folding all the * arithmetic tests down to compile-time constants. We do not have ! * convenient support for 64bit literals at this point... */ __int128 a = 48828125; ! __int128 b = 97656255; ],[ __int128 c,d; a = (a << 12) + 1; /* 200000000001 */ b = (b << 12) + 5; /* 400000000005 */ ! /* use the most relevant arithmetic ops */ c = a * b; d = (c + b) / b; ! /* return different values, to prevent optimizations */ if (d != a+1) ! return 0; ! return 1; ])], [pgac_cv__128bit_int=yes], [pgac_cv__128bit_int=no])]) if test x"$pgac_cv__128bit_int" = xyes ; then ! AC_DEFINE(PG_INT128_TYPE, __int128, [Define to the name of a signed 128-bit integer type.]) ! AC_CHECK_ALIGNOF(PG_INT128_TYPE) fi])# PGAC_TYPE_128BIT_INT --- 108,166 ---- [AC_CACHE_CHECK([for __int128], [pgac_cv__128bit_int], [AC_LINK_IFELSE([AC_LANG_PROGRAM([ /* + * We don't actually run this test, just link it to verify that any support + * functions needed for __int128 are present. + * * These are globals to discourage the compiler from folding all the * arithmetic tests down to compile-time constants. We do not have ! * convenient support for 128bit literals at this point... */ __int128 a = 48828125; ! __int128 b = 97656250; ],[ __int128 c,d; a = (a << 12) + 1; /* 200000000001 */ b = (b << 12) + 5; /* 400000000005 */ ! 
/* try the most relevant arithmetic ops */ c = a * b; d = (c + b) / b; ! /* must use the results, else compiler may optimize arithmetic away */ if (d != a+1) ! return 1; ])], [pgac_cv__128bit_int=yes], [pgac_cv__128bit_int=no])]) if test x"$pgac_cv__128bit_int" = xyes ; then ! # Some versions of gcc have problems passing __int128 function arguments ! # when using non-default alignment. Test that, if not cross-compiling. ! AC_CACHE_CHECK([for __int128 alignment bug], [pgac_cv__128bit_int_bug], ! [AC_RUN_IFELSE([AC_LANG_PROGRAM([ ! /* This must match the corresponding code in c.h: */ ! #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__) ! #define pg_attribute_aligned(a) __attribute__((aligned(a))) ! #endif ! typedef __int128 int128a ! #if defined(pg_attribute_aligned) ! pg_attribute_aligned(8) ! #endif ! ; ! int128a holder; ! void pass_by_val(void *buffer, int128a par) { holder = par; } ! ],[ ! long int i64 = 97656225L << 12; ! int128a q; ! pass_by_val(main, (int128a) i64); ! q = (int128a) i64; ! if (q != holder) ! return 1; ! ])], ! [pgac_cv__128bit_int_bug=ok], ! [pgac_cv__128bit_int_bug=broken], ! [pgac_cv__128bit_int_bug=assuming-ok])]) ! if test x"$pgac_cv__128bit_int_bug" != xbroken ; then ! AC_DEFINE(PG_INT128_TYPE, __int128, [Define to the name of a signed 128-bit integer type.]) ! AC_CHECK_ALIGNOF(PG_INT128_TYPE) ! fi fi])# PGAC_TYPE_128BIT_INT diff --git a/configure b/configure index 45221e1..6eaed45 100755 *** a/configure --- b/configure *************** else *** 14996,15007 **** /* end confdefs.h. */ /* * These are globals to discourage the compiler from folding all the * arithmetic tests down to compile-time constants. We do not have ! * convenient support for 64bit literals at this point... */ __int128 a = 48828125; ! __int128 b = 97656255; int main () --- 14996,15010 ---- /* end confdefs.h. */ /* + * We don't actually run this test, just link it to verify that any support + * functions needed for __int128 are present. 
+ * * These are globals to discourage the compiler from folding all the * arithmetic tests down to compile-time constants. We do not have ! * convenient support for 128bit literals at this point... */ __int128 a = 48828125; ! __int128 b = 97656250; int main () *************** main () *** 15010,15022 **** __int128 c,d; a = (a << 12) + 1; /* 200000000001 */ b = (b << 12) + 5; /* 400000000005 */ ! /* use the most relevant arithmetic ops */ c = a * b; d = (c + b) / b; ! /* return different values, to prevent optimizations */ if (d != a+1) ! return 0; ! return 1; ; return 0; --- 15013,15024 ---- __int128 c,d; a = (a << 12) + 1; /* 200000000001 */ b = (b << 12) + 5; /* 400000000005 */ ! /* try the most relevant arithmetic ops */ c = a * b; d = (c + b) / b; ! /* must use the results, else compiler may optimize arithmetic away */ if (d != a+1) ! return 1; ; return 0; *************** fi *** 15033,15042 **** { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__128bit_int" >&5 $as_echo "$pgac_cv__128bit_int" >&6; } if test x"$pgac_cv__128bit_int" = xyes ; then $as_echo "#define PG_INT128_TYPE __int128" >>confdefs.h ! # The cast to long int works around a bug in the HP C Compiler, # see AC_CHECK_SIZEOF for more information. { $as_echo "$as_me:${as_lineno-$LINENO}: checking alignment of PG_INT128_TYPE" >&5 $as_echo_n "checking alignment of PG_INT128_TYPE... " >&6; } --- 15035,15097 ---- { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__128bit_int" >&5 $as_echo "$pgac_cv__128bit_int" >&6; } if test x"$pgac_cv__128bit_int" = xyes ; then + # Some versions of gcc have problems passing __int128 function arguments + # when using non-default alignment. Test that, if not cross-compiling. + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __int128 alignment bug" >&5 + $as_echo_n "checking for __int128 alignment bug... 
" >&6; } + if ${pgac_cv__128bit_int_bug+:} false; then : + $as_echo_n "(cached) " >&6 + else + if test "$cross_compiling" = yes; then : + pgac_cv__128bit_int_bug=assuming-ok + else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext + /* end confdefs.h. */ + + /* This must match the corresponding code in c.h: */ + #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__) + #define pg_attribute_aligned(a) __attribute__((aligned(a))) + #endif + typedef __int128 int128a + #if defined(pg_attribute_aligned) + pg_attribute_aligned(8) + #endif + ; + int128a holder; + void pass_by_val(void *buffer, int128a par) { holder = par; } + + int + main () + { + + long int i64 = 97656225L << 12; + int128a q; + pass_by_val(main, (int128a) i64); + q = (int128a) i64; + if (q != holder) + return 1; + + ; + return 0; + } + _ACEOF + if ac_fn_c_try_run "$LINENO"; then : + pgac_cv__128bit_int_bug=ok + else + pgac_cv__128bit_int_bug=broken + fi + rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ + conftest.$ac_objext conftest.beam conftest.$ac_ext + fi + + fi + { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__128bit_int_bug" >&5 + $as_echo "$pgac_cv__128bit_int_bug" >&6; } + if test x"$pgac_cv__128bit_int_bug" != xbroken ; then $as_echo "#define PG_INT128_TYPE __int128" >>confdefs.h ! # The cast to long int works around a bug in the HP C Compiler, # see AC_CHECK_SIZEOF for more information. { $as_echo "$as_me:${as_lineno-$LINENO}: checking alignment of PG_INT128_TYPE" >&5 $as_echo_n "checking alignment of PG_INT128_TYPE... " >&6; } *************** cat >>confdefs.h <<_ACEOF *** 15071,15076 **** --- 15126,15132 ---- _ACEOF + fi fi # Check for various atomic operations now that we have checked how to declare
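Editorial note: the run-time check that the patch embeds in configure can also be tried on its own. The following is a minimal sketch of that check as a standalone C function, assuming gcc (or a compatible compiler) with __int128 support. The function name int128_alignment_check and the dummy buffer are illustrative, not part of the patch; long long replaces the patch's long int so the shift does not depend on -m64.

```c
/*
 * Standalone sketch of the run-time test from the patch above.
 * Assumes a compiler that supports __int128 and __attribute__((aligned)).
 */

/* As in the patch: an __int128 typedef with alignment reduced to 8 bytes. */
typedef __int128 int128a __attribute__((aligned(8)));

int128a holder;

/* Pass a reduced-alignment __int128 by value after a pointer argument. */
void
pass_by_val(void *buffer, int128a par)
{
    (void) buffer;
    holder = par;
}

/*
 * Returns 0 if the value survives the call unchanged, 1 if it arrives
 * corrupted (the result configure reports as "broken").
 */
int
int128_alignment_check(void)
{
    static char dummy;          /* illustrative stand-in for the patch's "main" argument */
    long long i64 = 97656225LL << 12;
    int128a q;

    pass_by_val(&dummy, (int128a) i64);
    q = (int128a) i64;
    return (q != holder) ? 1 : 0;
}
```

Calling int128_alignment_check() from a small main() and returning its result reproduces the exit-status convention that AC_RUN_IFELSE relies on: zero means "ok", nonzero means "broken".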
On Wed, 17 Jan 2018 11:33:09 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Attached is a draft patch to incorporate Victor's slimmed-down test > into configure. If you have a chance, could you confirm it does > the right thing on your Sparc machine? Definitely. As soon as next work day begins in Moscow. > BTW, it would be a good idea to set up a buildfarm member on that > machine, if you care about whether that configuration continues > to work in the future. Really we already have buildfarm member on this machine. It is just member of PostgresPro private buildfarm, not of big community buildfarm. So, I'll register it in the big buildfarm as soon as I figure out how to distribute limited resources of this machine between two buildfarms. -- Victor Wagner <vitus@wagner.pp.ru>
On Wed, 17 Jan 2018 11:33:09 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Attached is a draft patch to incorporate Victor's slimmed-down test > into configure. If you have a chance, could you confirm it does > the right thing on your Sparc machine? It seems that what it does is not exactly a right thing. I've applied it to commit 9c7d06d60680 in master and see following $ ./configure CC="gcc -m64" [skip] checking for __int128... yes checking for __int128 alignment bug... ok checking alignment of PG_INT128_TYPE... 16 checking for builtin __sync char locking functions... yes [skip] As far as I understand your patch, there should be: checking for __int128 alignment bug... broken Then in the pg_config.h I see /* The normal alignment of `PG_INT128_TYPE', in bytes. */ #define ALIGNOF_PG_INT128_TYPE 16 /* Define to the name of a signed 128-bit integer type. */ #define PG_INT128_TYPE __int128 However, make check passes. There are two things which puzzle me: 1. Why doesn't the test program detect the bug? If I cut'n'paste it from configure, compile it with the flags cut'n'pasted from config.log and run it, it returns 1. But configure reports that all is ok. 2. If the bug exists and is not detected by configure, why does make check pass? We, Marina and I would continue investigation.
Victor Wagner <vitus@wagner.pp.ru> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Attached is a draft patch to incorporate Victor's slimmed-down test >> into configure. If you have a chance, could you confirm it does >> the right thing on your Sparc machine? > It seems that what it does is not exactly a right thing. > I've applied it to commit 9c7d06d60680 in master and see following > checking for __int128 alignment bug... ok > As far as I understand your patch, there should be: > checking for __int128 alignment bug... broken Yes, that's what I expected to happen. > Then in the pg_config.h I see > /* The normal alignment of `PG_INT128_TYPE', in bytes. */ > #define ALIGNOF_PG_INT128_TYPE 16 > /* Define to the name of a signed 128-bit integer type. */ > #define PG_INT128_TYPE __int128 That's what I'd expect if configure thinks all's well :-( > However, make check passes. Uh ... how could that be? If the output of configure is exactly the same as before the patch, how could the end result be different? > We, Marina and I would continue investigation. I look forward to some results ... but I'm going to bed now ... regards, tom lane
Thank you! On 17-01-2018 19:33, Tom Lane wrote: > Attached is a draft patch to incorporate Victor's slimmed-down test > into configure. If you have a chance, could you confirm it does > the right thing on your Sparc machine? > Victor Wagner <vitus(at)wagner(dot)pp(dot)ru> writes: >> It seems that what it does is not exactly a right thing. >> I've applied it to commit 9c7d06d60680 in master and see following >> checking for __int128 alignment bug... ok >> As far as I understand your patch, there should be: >> checking for __int128 alignment bug... broken > ... >> Then in the pg_config.h I see >> /* The normal alignment of `PG_INT128_TYPE', in bytes. */ >> #define ALIGNOF_PG_INT128_TYPE 16 >> /* Define to the name of a signed 128-bit integer type. */ >> #define PG_INT128_TYPE __int128 > ... >> However, make check passes. > Uh ... how could that be? If the output of configure is exactly > the same as before the patch, how could the end result be different? Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d and using the configuration command from [1], I got: checking for __int128... yes checking for __int128 alignment bug... broken Nothing is defined for int128 in pg_config.h: /* The normal alignment of `PG_INT128_TYPE', in bytes. */ /* #undef ALIGNOF_PG_INT128_TYPE */ ... /* Define to the name of a signed 128-bit integer type. */ /* #undef PG_INT128_TYPE */ And make check-world passes. Victor said that he used a much simpler configuration command, and I'm trying to figure out what's changed.. > BTW, it would be a good idea to set up a buildfarm member on that > machine, if you care about whether that configuration continues > to work in the future. Victor answered this in [2]: > Really we already have buildfarm member on this machine. It is just > member of PostgresPro private buildfarm, not of big community > buildfarm. 
> > So, I'll register it in the big buildfarm as soon as I figure out how > to distribute limited resources of this machine between two buildfarms. P.S. I found the trailing whitespace in line 80: ! int128a q; [1] https://www.postgresql.org/message-id/0d3a9fa264cebe1cb9966f37b7c06e86%40postgrespro.ru [2] https://www.postgresql.org/message-id/20180117203648.2626d97a%40wagner.wagner.home -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 17-01-2018 18:50, Marina Polyakova wrote: > On 17-01-2018 18:28, Tom Lane wrote: >> BTW, now that you've demonstrated that the bug exists in a current >> gcc release, you should definitely file a bug at >> https://gcc.gnu.org/bugzilla/ >> I think you can just give them int128test2.c as-is as a test case. >> >> Please do that and let me know the PR number --- I think it would be >> good to cite the bug specifically in the comments for our configure >> code. > > Thanks, I'll try to do it. If I understand correctly, its PR number is 83925 (see [1]). [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Thu, 18 Jan 2018 01:47:46 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Victor Wagner <vitus@wagner.pp.ru> writes: > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > checking for __int128 alignment bug... ok > > As far as I understand your patch, there should be: > > checking for __int128 alignment bug... broken > > Yes, that's what I expected to happen. It seems that I've made some unreproducible mistake last night applying your patch. Marina reapplied it later and everything works. I've cherry-picked it as far back as 9.5, and with these versions it works too.
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d > and using the configuration command from [1], I got: > checking for __int128... yes > checking for __int128 alignment bug... broken > ... > And make check-world passes. Victor said that he used a much simpler > configuration command, and I'm trying to figure out what's changed.. Weird. Maybe the gcc bug only manifests with certain optimization flags? That's not what I'd have expected from Victor's theory about why the code is wrong, but if it only shows up some of the time, it's hard to think of another explanation. > P.S. I found the trailing whitespace in line 80: > ! int128a q; Ah, thanks. Probably git would've whined about that when I went to commit, but it's good to catch sooner. regards, tom lane
On Thu, 18 Jan 2018 09:56:48 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: > > Applying your patch on commit > > f033462d8f77c40b7d6b33c5116e50118fb4699d and using the > > configuration command from [1], I got: checking for __int128... yes > > checking for __int128 alignment bug... broken > > ... > > And make check-world passes. Victor said that he used a much > > simpler configuration command, and I'm trying to figure out what's > > changed.. > > Weird. Maybe the gcc bug only manifests with certain optimization > flags? That's not what I'd have expected from Victor's theory about No. I've compiled the test program without any optimization flags. Just -m64, which tells the compiler to generate 64-bit code (in 32-bit mode there is no __int128, so the problem wouldn't manifest itself). On the other hand, when I tried to resolve the issue with the test that did not work, I copied all gcc flags from config.log, and the test program returned 1 with exactly the same flags. Probably I should have regenerated configure with autoconf instead of applying the patch to configure.
Victor Wagner <vitus@wagner.pp.ru> writes: > It seems that I've made some unreproducible mistake last night > applying your patch. Marina reapplied it later and everything works. Ah, thanks for following up. I'll adjust the comment to include the gcc PR and push it shortly. regards, tom lane
On 18-01-2018 17:56, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d >> and using the configuration command from [1], I got: >> checking for __int128... yes >> checking for __int128 alignment bug... broken >> ... >> And make check-world passes. Victor said that he used a much simpler >> configuration command, and I'm trying to figure out what's changed.. > > Weird. Maybe the gcc bug only manifests with certain optimization > flags? That's not what I'd have expected from Victor's theory about > why the code is wrong, but if it only shows up some of the time, > it's hard to think of another explanation. Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit 9c7d06d60680 with your patch, I got this: checking for __int128... yes checking for __int128 alignment bug... ok checking alignment of PG_INT128_TYPE... 16 In pg_config.h: /* The normal alignment of `PG_INT128_TYPE', in bytes. */ #define ALIGNOF_PG_INT128_TYPE 16 ... /* Define to the name of a signed 128-bit integer type. */ #define PG_INT128_TYPE __int128 But make check got the same failures, and I see the same debug output as in [1].. P.S. As I understand it, this comment on bugzilla [2] is also about this. [1] https://www.postgresql.org/message-id/90ab676392c8f9c84431976147097cf0%40postgrespro.ru [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > On 18-01-2018 17:56, Tom Lane wrote: >> Weird. Maybe the gcc bug only manifests with certain optimization >> flags? That's not what I'd have expected from Victor's theory about >> why the code is wrong, but if it only shows up some of the time, >> it's hard to think of another explanation. > Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit > 9c7d06d60680 with your patch, I got this: > [ configure check passes ] > But make check got the same failures, and I see the same debug output as > in [1].. Interesting. Maybe the parameter-passing misbehavior that Victor's test is looking for isn't the only associated bug. > P.S. As I understand it, this comment on bugzilla [2] is also about > this. > [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6 Even more interesting, see c7 that was just posted there: >> Eric Botcazou 2018-01-18 16:22:48 UTC >> 128-bit types requite 128-bit alignment on SPARC64 so we cannot support that. So basically, we're outta luck and we have to consider __int128 as unsupportable on SPARC. I'm inclined to mechanize that as a test on $host_cpu. At least that means we don't need an AC_RUN test ;-) regards, tom lane
On 18-01-2018 19:53, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> On 18-01-2018 17:56, Tom Lane wrote: >>> Weird. Maybe the gcc bug only manifests with certain optimization >>> flags? That's not what I'd have expected from Victor's theory about >>> why the code is wrong, but if it only shows up some of the time, >>> it's hard to think of another explanation. > >> Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit >> 9c7d06d60680 with your patch, I got this: >> [ configure check passes ] >> But make check got the same failures, and I see the same debug output >> as >> in [1].. > > Interesting. Maybe the parameter-passing misbehavior that Victor's > test is looking for isn't the only associated bug. > >> P.S. As I understand it, this comment on bugzilla [2] is also about >> this. >> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6 > > Even more interesting, see c7 that was just posted there: > >>> Eric Botcazou 2018-01-18 16:22:48 UTC >>> 128-bit types requite 128-bit alignment on SPARC64 so we cannot >>> support that. > > So basically, we're outta luck and we have to consider __int128 as > unsupportable on SPARC. I'm inclined to mechanize that as a test on > $host_cpu. At least that means we don't need an AC_RUN test ;-) %-)) :-) Can I do something else about this problem?.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Marina Polyakova <m.polyakova@postgrespro.ru> writes: > On 18-01-2018 19:53, Tom Lane wrote: >> So basically, we're outta luck and we have to consider __int128 as >> unsupportable on SPARC. I'm inclined to mechanize that as a test on >> $host_cpu. At least that means we don't need an AC_RUN test ;-) > %-)) :-) > Can I do something else about this problem?.. I don't see any other workable alternative. The box we're in as far as the interaction with MAXALIGN goes is still the same as it was a month ago: raising MAXALIGN is impractical, and so is allowing some datatypes to have more-than-MAXALIGN alignment specs. I suppose you could imagine declaring int128s that are in any sort of palloc'd storage as, in effect, char[16], and always memcpy'ing to and from local variables that're declared int128 whenever you want to do arithmetic with them. But ugh. I can't see taking that sort of notational and performance hit for just one non-mainstream architecture. Really, this is something that the compiler ought to do for us, IMO. If the gcc guys don't want to be bothered, OK, but that tells you more about the priority they place on SPARC support than anything else. regards, tom lane
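Editorial note: the char[16]-plus-memcpy idea described above can be sketched as follows. This is only an illustration of the technique, not code from PostgreSQL; the type Int128Bytes and the helper names are hypothetical, and plain malloc stands in for palloc. It also makes the notational cost visible: every arithmetic step is bracketed by a load and a store.

```c
#include <stdlib.h>
#include <string.h>

typedef __int128 int128;

/* Storage form: 16 raw bytes, which carry no alignment requirement. */
typedef struct Int128Bytes
{
    char        data[16];
} Int128Bytes;

/* Copy the stored bytes into a properly aligned local variable. */
static int128
int128_load(const Int128Bytes *src)
{
    int128      v;

    memcpy(&v, src->data, sizeof(v));
    return v;
}

/* Copy a local value back into the byte-array storage. */
static void
int128_store(Int128Bytes *dst, int128 v)
{
    memcpy(dst->data, &v, sizeof(v));
}

/* Add newval to an accumulator that is kept in byte-array form. */
void
int128_accum(Int128Bytes *state, long long newval)
{
    int128      sum = int128_load(state);

    sum += newval;
    int128_store(state, sum);
}

/*
 * Demo: accumulate 5 + 7 in a slot placed 8 bytes into an allocated
 * chunk, so it is typically 8-byte but not 16-byte aligned.
 * Returns the accumulated value, or -1 if allocation fails.
 */
long long
int128_accum_demo(void)
{
    char       *chunk = malloc(32);
    Int128Bytes *slot;
    long long   result;

    if (chunk == NULL)
        return -1;
    slot = (Int128Bytes *) (chunk + 8);
    int128_store(slot, 0);
    int128_accum(slot, 5);
    int128_accum(slot, 7);
    result = (long long) int128_load(slot);
    free(chunk);
    return result;
}
```

Compilers commonly turn a fixed-size memcpy like this into ordinary loads and stores where the target allows it, but that is not guaranteed, which is the performance concern raised in the message.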
I wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> On 18-01-2018 19:53, Tom Lane wrote: >>> So basically, we're outta luck and we have to consider __int128 as >>> unsupportable on SPARC. I'm inclined to mechanize that as a test on >>> $host_cpu. At least that means we don't need an AC_RUN test ;-) >> %-)) :-) >> Can I do something else about this problem?.. > I don't see any other workable alternative. But ... let's not panic, but wait and see the final result of the discussion on the gcc PR. Jakub at least seems to think it ought to be a supportable case. What you could do in the meantime is work on finding a variation of Victor's test that will detect the bug regardless of -O level. If we do have hope that future gcc versions will handle this correctly, we'll need a better test rather than just summarily dismissing host_cpu = sparc. regards, tom lane
On 18-01-2018 20:24, Tom Lane wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> On 18-01-2018 19:53, Tom Lane wrote: >>> So basically, we're outta luck and we have to consider __int128 as >>> unsupportable on SPARC. I'm inclined to mechanize that as a test on >>> $host_cpu. At least that means we don't need an AC_RUN test ;-) > >> %-)) :-) >> Can I do something else about this problem?.. > > I don't see any other workable alternative. The box we're in as far > as the interaction with MAXALIGN goes is still the same as it was > a month ago: raising MAXALIGN is impractical, and so is allowing > some datatypes to have more-than-MAXALIGN alignment specs. > > I suppose you could imagine declaring int128s that are in any sort > of palloc'd storage as, in effect, char[16], and always memcpy'ing > to and from local variables that're declared int128 whenever you > want to do arithmetic with them. But ugh. I can't see taking that > sort of notational and performance hit for just one non-mainstream > architecture. > > Really, this is something that the compiler ought to do for us, IMO. > If the gcc guys don't want to be bothered, OK, but that tells you more > about the priority they place on SPARC support than anything else. Thank you very much for your explanations! So I'll go to all of your comments to my patch about stable functions when the next work day begins in Moscow) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Thu, Jan 18, 2018 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marina Polyakova <m.polyakova@postgrespro.ru> writes: >> On 18-01-2018 19:53, Tom Lane wrote: >>> So basically, we're outta luck and we have to consider __int128 as >>> unsupportable on SPARC. I'm inclined to mechanize that as a test on >>> $host_cpu. At least that means we don't need an AC_RUN test ;-) > >> %-)) :-) >> Can I do something else about this problem?.. > > I don't see any other workable alternative. The box we're in as far > as the interaction with MAXALIGN goes is still the same as it was > a month ago: raising MAXALIGN is impractical, and so is allowing > some datatypes to have more-than-MAXALIGN alignment specs. > > I suppose you could imagine declaring int128s that are in any sort > of palloc'd storage as, in effect, char[16], and always memcpy'ing > to and from local variables that're declared int128 whenever you > want to do arithmetic with them. But ugh. I can't see taking that > sort of notational and performance hit for just one non-mainstream > architecture. It's not like we'd have to take the performance hit everywhere; we could do the expensive things only on platforms that need them. The trick would be to avoid too much notation. But it's not like we don't live with a lot of DatumGetThing and ThingGetDatum notation already. > Really, this is something that the compiler ought to do for us, IMO. > If the gcc guys don't want to be bothered, OK, but that tells you more > about the priority they place on SPARC support than anything else. Of course, the same accusation could be leveled at us. We don't require int128 support for correctness; we just use it for performance where it's available and works the way we want. Prolly, that means mainstream platforms. If we wanted to work harder, we could get it working in other places too. Or some other fix that delivers much of the same performance benefit. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
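Editorial note: a rough sketch of what "expensive things only on platforms that need them" might look like, in the spirit of the DatumGetThing/ThingGetDatum accessors mentioned above. Everything here is hypothetical: the macro HYPOTHETICAL_INT128_NEEDS_COPY, the accessor names Int128Fetch and Int128Put, and the idea that configure would set the macro. No such patch exists in this thread.

```c
#include <stdlib.h>
#include <string.h>

typedef __int128 int128;

/* Hypothetical switch; a real build would derive it from a configure test. */
/* #define HYPOTHETICAL_INT128_NEEDS_COPY 1 */

#ifdef HYPOTHETICAL_INT128_NEEDS_COPY

/* Strict-alignment platforms: always go through memcpy. */
static int128
Int128Fetch(const void *ptr)
{
    int128      v;

    memcpy(&v, ptr, sizeof(v));
    return v;
}

static void
Int128Put(void *ptr, int128 v)
{
    memcpy(ptr, &v, sizeof(v));
}

#else

/* Elsewhere: direct access through an 8-byte-aligned typedef. */
typedef __int128 int128a __attribute__((aligned(8)));

static int128
Int128Fetch(const void *ptr)
{
    return *(const int128a *) ptr;
}

static void
Int128Put(void *ptr, int128 v)
{
    *(int128a *) ptr = v;
}

#endif

/*
 * Demo: round-trip a value through a slot 8 bytes into an allocated
 * chunk.  Returns 1 if the value read back matches, 0 otherwise.
 */
int
int128_roundtrip_demo(void)
{
    char       *chunk = malloc(32);
    int128      in = ((int128) 1 << 100) + 42;
    int         ok;

    if (chunk == NULL)
        return 0;
    Int128Put(chunk + 8, in);
    ok = (Int128Fetch(chunk + 8) == in);
    free(chunk);
    return ok;
}
```

Callers use the same two accessors on every platform, so the extra notation is confined to the points where a value moves between storage and a local variable.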
On 18-01-2018 20:34, Tom Lane wrote: > I wrote: > ... > But ... let's not panic, but wait and see the final result of the > discussion on the gcc PR. Jakub at least seems to think it ought > to be a supportable case. > > What you could do in the meantime is work on finding a variation of > Victor's test that will detect the bug regardless of -O level. > If we do have hope that future gcc versions will handle this correctly, > we'll need a better test rather than just summarily dismissing > host_cpu = sparc. Thanks, I'll try.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 18, 2018 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Really, this is something that the compiler ought to do for us, IMO. >> If the gcc guys don't want to be bothered, OK, but that tells you more >> about the priority they place on SPARC support than anything else. > Of course, the same accusation could be leveled at us. We don't > require int128 support for correctness; we just use it for performance > where it's available and works the way we want. Prolly, that means > mainstream platforms. If we wanted to work harder, we could get it > working in other places too. Or some other fix that delivers much of > the same performance benefit. Sure. Part of the equation here is that (IMO anyway) int128 isn't sufficiently performance-critical to us to justify putting enormous amounts of work into trying to make it go on non-mainstream platforms. It's possible that that could change in future ... but if part of the cost is notational changes that make it harder and more bug-prone to use int128 at all, then I daresay int128 will never become that performance-critical, because it would always remain a niche thing. regards, tom lane
On Thu, Jan 18, 2018 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sure. Part of the equation here is that (IMO anyway) int128 isn't > sufficiently performance-critical to us to justify putting enormous > amounts of work into trying to make it go on non-mainstream platforms. > It's possible that that could change in future ... but if part of the > cost is notational changes that make it harder and more bug-prone > to use int128 at all, then I daresay int128 will never become that > performance-critical, because it would always remain a niche thing. That's possible. On the other hand, we lived for many years with painful workarounds for systems without working 64-bit integers, and those eventually became mainstream enough that we made them mandatory - and then ripped out some of the notational changes that we'd introduced to cope with platforms that didn't support them. So, the same thing might happen here, whatever we decide about this. Then again, 64 bit counters are already so large that it's hard to imagine ever having one overflow, so perhaps 128-bit values will never catch on in quite the same way. On the third hand, 640kB ought to be enough for anybody. Anyway, that's really an academic debate. My real point is: I do not think we should reject out of hand the idea that a patch introducing some new notation to deal with this might be acceptable. I am not volunteering to write such a patch, and anyone who tries should be aware that there is a chance that it will be rejected on grounds of ugliness. However, if they decide to try anyway, we should read the patch and see how ugly it really is. Maybe it's not that bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Anyway, that's really an academic debate. My real point is: I do not > think we should reject out of hand the idea that a patch introducing > some new notation to deal with this might be acceptable. I am not > volunteering to write such a patch, and anyone who tries should be > aware that there is a chance that it will be rejected on grounds of > ugliness. However, if they decide to try anyway, we should read the > patch and see how ugly it really is. Maybe it's not that bad. Sure. I'm not intending to write such a patch either. regards, tom lane
On 18-01-2018 20:49, Marina Polyakova wrote: > On 18-01-2018 20:34, Tom Lane wrote: >> ... >> What you could do in the meantime is work on finding a variation of >> Victor's test that will detect the bug regardless of -O level. >> If we do have hope that future gcc versions will handle this >> correctly, >> we'll need a better test rather than just summarily dismissing >> host_cpu = sparc. > > Thanks, I'll try.. I tried different options of gcc but it did not help.. Perhaps searching in the source code of gcc will clarify something, but I'm sorry that I'm now too busy for this.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company