Thread: master make check fails on Solaris 10

master make check fails on Solaris 10

From
Marina Polyakova
Date:
Hello, hackers! I got a permanent failure of master (commit 
ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. 
Regression output and diffs are attached.

I used the following commands:
./configure CC="ccache gcc" CFLAGS="-m64 -I/opt/csw/include" \
    LDFLAGS="-L/opt/csw/lib/sparcv9 -L/usr/local/lib/64" --enable-cassert \
    --enable-debug --enable-nls --with-perl --with-tcl --with-python \
    --with-gssapi --with-openssl --with-ldap --with-libxml --with-libxslt
gmake > make_results.txt
gmake check

About the system: SunOS, Release 5.10, KernelID Generic_141444-09.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> Hello, hackers! I got a permanent failure of master (commit 
> ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10. 
> Regression output and diffs are attached.

Hm, buildfarm member protosciurus is running a similar configuration
without problems.  Looking at its configuration, maybe you need to
fool with LD_LIBRARY_PATH and/or LDFLAGS_SL?

            regards, tom lane


Re: master make check fails on Solaris 10

From
Andres Freund
Date:
Hi,

On 2018-01-11 20:21:11 +0300, Marina Polyakova wrote:
> Hello, hackers! I got a permanent failure of master (commit
> ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10.

Did this use to work? If so, could you check whether it worked before
69c3936a1499b772a749ae629fc59b2d72722332?

Greetings,

Andres Freund


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 11-01-2018 20:34, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> Hello, hackers! I got a permanent failure of master (commit
>> ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10.
>> Regression output and diffs are attached.
> 
> Hm, buildfarm member protosciurus is running a similar configuration
> without problems.  Looking at its configuration, maybe you need to
> fool with LD_LIBRARY_PATH and/or LDFLAGS_SL?

I added these parameters to configure with the same values 
(LDFLAGS_SL="-m64" 
LD_LIBRARY_PATH="/lib/64:/usr/lib/64:/usr/sfw/lib/64:/usr/local/lib"), 
but the same failures occur :( (see the attached regression diffs and 
output).

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 11-01-2018 20:39, Andres Freund wrote:
> Hi,
> 
> On 2018-01-11 20:21:11 +0300, Marina Polyakova wrote:
>> Hello, hackers! I got a permanent failure of master (commit
>> ca454b9bd34c75995eda4d07c9858f7c22890c2b) make check on Solaris 10.
> 
> Did this use to work?

It fails every time, if that is what you are asking.

> If so, could you check whether it worked before
> 69c3936a1499b772a749ae629fc59b2d72722332?

- on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) the 
same failures occur (see the attached regression diffs and output);
- on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world 
passes.
I'll try to find out from what commit it started.. Do you have any 
suspicions?

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: master make check fails on Solaris 10

From
Alvaro Herrera
Date:
Marina Polyakova wrote:

> - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) the same
> failures occur (see the attached regression diffs and output);
> - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world
> passes.
> I'll try to find out from what commit it started.. Do you have any
> suspicions?

Perhaps you can use "git bisect".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 12-01-2018 18:12, Alvaro Herrera wrote:
> Marina Polyakova wrote:
> 
>> - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) 
>> the same
>> failures occur (see the attached regression diffs and output);
>> - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world
>> passes.
>> I'll try to find out from what commit it started.. Do you have any
>> suspicions?
> 
> Perhaps you can use "git bisect".

Thanks, that is what I'm doing :)

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 12-01-2018 14:05, Marina Polyakova wrote:
> - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935)
> the same failures occur (see the attached regression diffs and
> output);
> - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world 
> passes.
> I'll try to find out from what commit it started..

Binary search has shown that all these failures begin with commit 
7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from requiring 
more than MAXALIGN alignment.). On the previous commit 
(91aec93e6089a5ba49cce0aca3bf7f7022d62ea4) make check-world passes.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 12-01-2018 14:05, Marina Polyakova wrote:
>> - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935)
>> the same failures occur (see the attached regression diffs and
>> output);
>> - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world 
>> passes.
>> I'll try to find out from what commit it started..

> Binary search has shown that all these failures begin with commit 
> 7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from requiring 
> more than MAXALIGN alignment.).

Hm ... so apparently, that compiler has bugs in handling nondefault
alignment specs.  You said upthread it was gcc, but what version
exactly?

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 12-01-2018 21:00, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> ...
>> Binary search has shown that all these failures begin with commit
>> 7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from 
>> requiring
>> more than MAXALIGN alignment.).
> 
> Hm ... so apparently, that compiler has bugs in handling nondefault
> alignment specs.  You said upthread it was gcc, but what version
> exactly?

This is 5.2.0:

$ gcc -v
Reading specs from /opt/csw/lib/gcc/sparc-sun-solaris2.10/5.2.0/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/csw/libexec/gcc/sparc-sun-solaris2.10/5.2.0/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with: 
/home/dam/mgar/pkg/gcc5/trunk/work/solaris10-sparc/build-isa-sparcv8plus/gcc-5.2.0/configure 
--prefix=/opt/csw --exec_prefix=/opt/csw --bindir=/opt/csw/bin 
--sbindir=/opt/csw/sbin --libexecdir=/opt/csw/libexec 
--datadir=/opt/csw/share --sysconfdir=/etc/opt/csw 
--sharedstatedir=/opt/csw/share --localstatedir=/var/opt/csw 
--libdir=/opt/csw/lib --infodir=/opt/csw/share/info 
--includedir=/opt/csw/include --mandir=/opt/csw/share/man 
--enable-cloog-backend=isl --enable-java-awt=xlib 
--enable-languages=ada,c,c++,fortran,go,java,objc --enable-libada 
--enable-libssp --enable-nls --enable-objc-gc --enable-threads=posix 
--program-suffix=-5.2 --with-cloog=/opt/csw --with-gmp=/opt/csw 
--with-included-gettext --with-ld=/usr/ccs/bin/ld --without-gnu-ld 
--with-libiconv-prefix=/opt/csw --with-mpfr=/opt/csw --with-ppl=/opt/csw 
--with-system-zlib=/opt/csw --with-as=/usr/ccs/bin/as --without-gnu-as
Thread model: posix
gcc version 5.2.0 (GCC)

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 12-01-2018 21:00, Tom Lane wrote:
>> Hm ... so apparently, that compiler has bugs in handling nondefault
>> alignment specs.  You said upthread it was gcc, but what version
>> exactly?

> This is 5.2.0:

Ugh ... protosciurus has 3.4.3, but I see that configure detects that
as *not* having __int128.  Probably what's happening on your machine
is that gcc knows __int128 but generates buggy code for it when an
alignment spec is given.  So that's unfortunate, but it's not really
a regression from 3.4.3.

I'm not sure there's much we can do about this.  Dropping the use
of the alignment spec isn't a workable option.  If there were a
simple way for configure to detect that the compiler generates bad
code for that, we could have it do so and reject use of __int128,
but it'd be up to you to come up with a workable test.

In the end this might just be an instance of the old saw about
avoiding dot-zero releases.  Have you tried a newer gcc?
(Digging in their bugzilla finds quite a number of __int128 bugs
fixed in 5.4.x, though none look to be specifically about
misaligned data.)

Also, if it still happens with current gcc on that hardware,
there'd be grounds for a new bug report to them.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
Thank you very much!

On 13-01-2018 21:10, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 12-01-2018 21:00, Tom Lane wrote:
>>> Hm ... so apparently, that compiler has bugs in handling nondefault
>>> alignment specs.  You said upthread it was gcc, but what version
>>> exactly?
> 
>> This is 5.2.0:
> 
> Ugh ... protosciurus has 3.4.3, but I see that configure detects that
> as *not* having __int128.  Probably what's happening on your machine
> is that gcc knows __int128 but generates buggy code for it when an
> alignment spec is given.  So that's unfortunate, but it's not really
> a regression from 3.4.3.
> 
> I'm not sure there's much we can do about this.  Dropping the use
> of the alignment spec isn't a workable option.  If there were a
> simple way for configure to detect that the compiler generates bad
> code for that, we could have it do so and reject use of __int128,
> but it'd be up to you to come up with a workable test.

I'll think about it..

> In the end this might just be an instance of the old saw about
> avoiding dot-zero releases.  Have you tried a newer gcc?
> (Digging in their bugzilla finds quite a number of __int128 bugs
> fixed in 5.4.x, though none look to be specifically about
> misaligned data.)

As I was told offlist, 5.2.0 is already a fairly new version of gcc for 
this system..

> Also, if it still happens with current gcc on that hardware,
> there'd be grounds for a new bug report to them.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 13-01-2018 21:10, Tom Lane wrote:
>> I'm not sure there's much we can do about this.  Dropping the use
>> of the alignment spec isn't a workable option.  If there were a
>> simple way for configure to detect that the compiler generates bad
>> code for that, we could have it do so and reject use of __int128,
>> but it'd be up to you to come up with a workable test.

> I'll think about it..

Attached is a possible test program.  I can confirm it passes on a
machine with working __int128, but I have no idea whether it will
detect the problem on yours.  If not, maybe you can tweak it?

            regards, tom lane

#include <stddef.h>
#include <stdio.h>

/* GCC, Sunpro and XLC support aligned */
#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#endif

typedef __int128 int128a
#if defined(pg_attribute_aligned)
pg_attribute_aligned(8)
#endif
;

/*
 * These are globals to discourage the compiler from folding all the
 * arithmetic tests down to compile-time constants.  We do not have
 * convenient support for 128bit literals at this point...
 */
struct glob128
{
    __int128    start;
    char        pad;
    int128a        a;
    int128a        b;
    int128a        c;
    int128a        d;
} g = {0, 'p', 48828125, 97656255, 0, 0};

int
main()
{
    if (offsetof(struct glob128, a) < 17 ||
        offsetof(struct glob128, a) > 24)
    {
        printf("wrong alignment, %d\n", (int) offsetof(struct glob128, a));
        return 1;
    }
    g.a = (g.a << 12) + 1;        /* 200000000001 */
    g.b = (g.b << 12) + 5;        /* 400000000005 */
    /* use the most relevant arithmetic ops */
    g.c = g.a * g.b;
    g.d = (g.c + g.b) / g.b;
    /* return different values, to prevent optimizations */
    if (g.d != g.a + 1)
    {
        printf("wrong arithmetic result\n");
        return 1;
    }
    printf("A-OK!\n");
    return 0;
}

Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
[I added Victor Wagner as co-researcher of this problem]

On 13-01-2018 21:10, Tom Lane wrote:
> In the end this might just be an instance of the old saw about
> avoiding dot-zero releases.  Have you tried a newer gcc?
> (Digging in their bugzilla finds quite a number of __int128 bugs
> fixed in 5.4.x, though none look to be specifically about
> misaligned data.)

gcc 5.5.0 (from [1]) did not fix the problem..

On 16-01-2018 2:41, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 13-01-2018 21:10, Tom Lane wrote:
>>> I'm not sure there's much we can do about this.  Dropping the use
>>> of the alignment spec isn't a workable option.  If there were a
>>> simple way for configure to detect that the compiler generates bad
>>> code for that, we could have it do so and reject use of __int128,
>>> but it'd be up to you to come up with a workable test.
> 
>> I'll think about it..
> 
> Attached is a possible test program.  I can confirm it passes on a
> machine with working __int128, but I have no idea whether it will
> detect the problem on yours.  If not, maybe you can tweak it?

Thank you! With gcc 5.5.0 it prints that everything is OK. But, 
investigating the regression diffs, we found that the error occurs 
when an int128 is passed to a function as anything other than the first 
argument (perhaps its value is replaced by the value of some address):

-- Use queries from random.sql
SELECT count(*) FROM onek; -- Everything is ok
...
SELECT random, count(random) FROM RANDOM_TBL
   GROUP BY random HAVING count(random) > 3; -- Everything is ok

postgres=# SELECT * FROM RANDOM_TBL ORDER BY random; -- Print current 
data
  random
--------
      78
      86
      98
      98
(4 rows)

postgres=# SELECT AVG(random) FROM RANDOM_TBL
postgres-#   HAVING AVG(random) NOT BETWEEN 80 AND 120; -- Oops!
               avg
-------------------------------
  79446934848446476698976780288
(1 row)

Debug output from the last query (see attached diff.patch, it is based 
on commit 9c7d06d60680c7f00d931233873dee81fdb311c6 of master):

makeInt128AggState
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 00000000000000000000000000000062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 86
int8_avg_accum val_int128 as 2 x int64: 0 86
int8_avg_accum val_int128 bytes: 00000000000000000000000000000056
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 00000000000000000000000000000062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 78
int8_avg_accum val_int128 as 2 x int64: 0 78
int8_avg_accum val_int128 bytes: 0000000000000000000000000000004E
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 0000000100B4F6D80000000000000000
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
numeric_poly_avg
int128_to_numericvar
int128_to_numericvar int128 val as 2 x int64: 17227307872 0
int128_to_numericvar int128 val bytes: 0000000402D3DB600000000000000000

(val_int128 in the function int8_avg_accum is correct, but newval in the 
function do_int128_accum is not equal to it. val in the function 
int128_to_numericvar is (4 * 4306826968).)
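
The numbers in this log can be cross-checked with a small sketch. It is
not part of diff.patch or of the attached test program. The helper names
bogus_sum and u128_to_dec are made up for illustration, and it assumes
it is built with a compiler whose unsigned __int128 works correctly. It
rebuilds the sum from four copies of the corrupted newval (high half
4306826968, low half 0) and formats the average in decimal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add up "count" copies of a corrupted newval whose
 * high 64 bits are "high" and whose low 64 bits are zero, as in the log.
 */
unsigned __int128
bogus_sum(uint64_t high, unsigned count)
{
    unsigned __int128 newval = (unsigned __int128) high << 64;
    unsigned __int128 sum = 0;
    unsigned    i;

    for (i = 0; i < count; i++)
        sum += newval;
    return sum;
}

/* Hypothetical helper: write v in decimal into out (needs >= 40 bytes). */
void
u128_to_dec(unsigned __int128 v, char *out, size_t outlen)
{
    char        tmp[40];        /* 2^128 - 1 has 39 decimal digits */
    size_t      n = 0;
    size_t      i = 0;

    assert(outlen >= sizeof(tmp));
    do
    {
        tmp[n++] = (char) ('0' + (int) (v % 10));
        v /= 10;
    } while (v != 0);
    /* digits come out least significant first, so reverse them */
    while (n > 0)
        out[i++] = tmp[--n];
    out[i] = '\0';
}
```

With high = 4306826968 and count = 4, the high half of the sum is
17227307872, matching the int128_to_numericvar line, and the sum divided
by 4 formats as 79446934848446476698976780288, the value the AVG query
printed. That is consistent with every newval carrying the same
address-like value in its high half.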

Based on this, we modified the test program (see attached). Here is its 
output on Solaris 10 for different alignment requirements for int128 
(ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10; on my machine, where make 
check-world passes, everything is OK):

$ gcc -D PG_ALIGN_128=16 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 OK
pass uint 16 OK
pass int 32 OK
pass int 64 OK
pass int 128 OK
$ gcc -D PG_ALIGN_128=8 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 FAILED
pass uint 16 FAILED
pass int 32 FAILED
pass int 64 FAILED
pass int 128 OK

Maybe some pass test from int128test2.c can be used to test __int128?

P.S. I suppose g.b should be initialized to 97656250 to get 400000000005:

> struct glob128
> {
>     __int128    start;
>     char        pad;
>     int128a        a;
>     int128a        b;
>     int128a        c;
>     int128a        d;
> } g = {0, 'p', 48828125, 97656255, 0, 0};
> ...
> g.b = (g.b << 12) + 5;        /* 400000000005 */
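
The arithmetic behind this remark can be checked with a minimal sketch.
The helper names shift12_plus and check_passes are hypothetical and do
not appear in the test program; the second helper assumes a compiler
with working __int128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring "g.x = (g.x << 12) + k" from the test. */
int64_t
shift12_plus(int64_t start, int64_t k)
{
    /* keep the shifted value well inside the int64 range */
    assert(start >= 0 && start < (INT64_C(1) << 50));
    return (start << 12) + k;
}

/* Hypothetical helper: the test's final check, d == a + 1, for given seeds. */
int
check_passes(int64_t a0, int64_t b0)
{
    __int128    a = shift12_plus(a0, 1);
    __int128    b = shift12_plus(b0, 5);
    __int128    c = a * b;
    __int128    d = (c + b) / b;

    return d == a + 1;
}
```

shift12_plus(97656250, 5) gives 400000000005, while
shift12_plus(97656255, 5) gives 400000020485, so only the comment in the
test program is off. The final check passes with either initializer,
because (a * b + b) / b equals a + 1 for any positive b.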

[1] https://www.opencsw.org

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> investigating the regression diffs, we found that the error occurs 
> when an int128 is passed to a function as anything other than the first 
> argument (perhaps its value is replaced by the value of some address):
> ...
> Based on this, we modified the test program (see attached). Here is its 
> output on Solaris 10 for different alignment requirements for int128 
> (ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10; on my machine, where make 
> check-world passes, everything is OK):

Excellent.  This fails the same way on gcc 5.2.0 and 5.5.0?

> Maybe some pass test from int128test2.c can be used to test __int128?

Yeah, I can work with this.  What I propose to do is use a somewhat
stripped-down version of this test as an AC_RUN_IFELSE test normally,
but if cross-compiling, fall back to just seeing if we can link.

Thanks for investigating!

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
Sorry, diff.patch is attached now.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: master make check fails on Solaris 10

From
Victor Wagner
Date:
On Wed, 17 Jan 2018 18:02:26 +0300
Marina Polyakova <m.polyakova@postgrespro.ru> wrote:


> > Attached is a possible test program.  I can confirm it passes on a
> > machine with working __int128, but I have no idea whether it will
> > detect the problem on yours.  If not, maybe you can tweak it?
> 
> Thank you! With gcc 5.5.0 it prints that everything is OK. But, 
> investigating the regression diffs, we found that the error occurs
> when an int128 is passed to a function as anything other than the first
> argument (perhaps its value is replaced by the value of some address):

I'm attaching a stripped-down version of the test program, which
demonstrates the problem, and two assembler listings produced from this
C source using alignments 8 and 16. Maybe this stripped-down version can
be used as a base for the configure test.

As it turns out, GCC on SPARC passes function arguments through the
register window, whose registers are referenced as %oN in the calling
code and as %iN in the called function.

Somehow the alignment attribute of the typedef affects how the arguments
are accessed inside the function, but not how the registers are filled
before the call. This looks like a bug in GCC.
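
The failure mode described above can be sketched as follows. This is
not the attached test program, only an illustration with hypothetical
names (holder, pass_second, second_arg_survives), and it assumes
GCC-style __attribute__ syntax. The int128 is deliberately passed as the
second argument, and the callee stores what it received so the caller
can compare.

```c
#include <assert.h>

/*
 * Same typedef shape as in the test program upthread: __int128 with its
 * alignment lowered to 8 (GCC-style attribute syntax is assumed).
 */
typedef __int128 int128a __attribute__((aligned(8)));

/* Global sink so the callee's view of the argument can be inspected. */
int128a     holder;

/* Hypothetical callee: the int128 is deliberately NOT the first argument. */
__attribute__((noinline)) void
pass_second(void *dummy, int128a par)
{
    (void) dummy;
    holder = par;
}

/* Returns 1 if the callee saw the value the caller passed, else 0. */
int
second_arg_survives(long long v)
{
    int128a     expected = (int128a) v << 12;

    assert(v >= 0);
    pass_second(&holder, expected);
    return holder == expected;
}
```

With a compiler that handles the reduced alignment correctly,
second_arg_survives() returns 1. On the affected SPARC setup one would
expect 0, but this sketch has not been run there.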

Unfortunately, we have only one Sparc machine and started our
investigation by upgrading GCC 5.2.0 to GCC 5.5.0, so it is hard to
downgrade and test with older GCC.

--

Re: master make check fails on Solaris 10

From
Tom Lane
Date:
BTW, now that you've demonstrated that the bug exists in a current
gcc release, you should definitely file a bug at
https://gcc.gnu.org/bugzilla/
I think you can just give them int128test2.c as-is as a test case.

Please do that and let me know the PR number --- I think it would be
good to cite the bug specifically in the comments for our configure code.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Victor Wagner
Date:
On Wed, 17 Jan 2018 10:07:37 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Marina Polyakova <m.polyakova@postgrespro.ru> writes:

> Yeah, I can work with this.  What I propose to do is use a somewhat
> stripped-down version of this test as an AC_RUN_IFELSE test normally,
> but if cross-compiling, fall back to just seeing if we can link.

I'd suggest adding a configure option to switch off 128-bit support 
(--disable-int128), especially for these cross-compile cases where a link
test cannot give us enough information to decide automatically.

--


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Victor Wagner <vitus@wagner.pp.ru> writes:
> On Wed, 17 Jan 2018 10:07:37 -0500
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I can work with this.  What I propose to do is use a somewhat
>> stripped-down version of this test as an AC_RUN_IFELSE test normally,
>> but if cross-compiling, fall back to just seeing if we can link.

> I'd suggest adding a configure option to switch off 128-bit support 
> (--disable-int128), especially for these cross-compile cases where a link
> test cannot give us enough information to decide automatically.

I don't want to go there without some evidence that the problem is much
more widespread than it appears now.  A disable switch will be a permanent
documentation and maintenance overhead, plus anyone who puts it into
their build scripts will probably never remember to remove it :-(.
And how many people will be cross-compiling to Solaris/SPARC anyway?
(If there are any, they can always manually change pg_config.h ...)

            regards, tom lane


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Victor Wagner <vitus@wagner.pp.ru> writes:
> I'm attaching a stripped-down version of the test program, which
> demonstrates the problem, and two assembler listings produced from this
> C source using alignments 8 and 16. Maybe this stripped-down version can
> be used as a base for the configure test.

Ah, thanks, this will be easier to work with.

> Unfortunately, we have only one Sparc machine and started our
> investigation by upgrading GCC 5.2.0 to GCC 5.5.0, so it is hard to
> downgrade and test with older GCC.

OK.  Well, the fact that you get the same failures in PG itself seems
like good evidence that the compiler's behavior hasn't changed here.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 17-01-2018 18:07, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> investigating the regression diffs, we found that the error occurs
>> when an int128 is passed to a function as anything other than the first
>> argument (perhaps its value is replaced by the value of some address):
>> ...
>> Based on this, we modified the test program (see attached). Here is its
>> output on Solaris 10 for different alignment requirements for int128
>> (ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10; on my machine, where make
>> check-world passes, everything is OK):
> 
> Excellent.  This fails the same way on gcc 5.2.0 and 5.5.0?

As Victor answered in [1]:
> Unfortunately, we have only one Sparc machine and started our
> investigation by upgrading GCC 5.2.0 to GCC 5.5.0, so it is hard to
> downgrade and test with older GCC.

>> Maybe some pass test from int128test2.c can be used to test __int128?
> 
> Yeah, I can work with this.  What I propose to do is use a somewhat
> stripped-down version of this test as an AC_RUN_IFELSE test normally,
> but if cross-compiling, fall back to just seeing if we can link.

Thanks, I'll try to do this.. And Victor attached a stripped-down 
version of this in [1].

> Thanks for investigating!

Thank you! :)

[1] 
https://www.postgresql.org/message-id/20180117181359.3a6cc06c%40fafnir.local.vm

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 17-01-2018 18:07, Tom Lane wrote:
>> Yeah, I can work with this.  What I propose to do is use a somewhat
>> stripped-down version of this test as an AC_RUN_IFELSE test normally,
>> but if cross-compiling, fall back to just seeing if we can link.

> Thanks, I'll try to do this.. And Victor attached a stripped-down 
> version of this in [1].

Oh, I was already working on a patch, thanks.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 17-01-2018 18:28, Tom Lane wrote:
> BTW, now that you've demonstrated that the bug exists in a current
> gcc release, you should definitely file a bug at
> https://gcc.gnu.org/bugzilla/
> I think you can just give them int128test2.c as-is as a test case.
> 
> Please do that and let me know the PR number --- I think it would be
> good to cite the bug specifically in the comments for our configure 
> code.

Thanks, I'll try to do it.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Attached is a draft patch to incorporate Victor's slimmed-down test
into configure.  If you have a chance, could you confirm it does
the right thing on your Sparc machine?

BTW, it would be a good idea to set up a buildfarm member on that
machine, if you care about whether that configuration continues
to work in the future.

            regards, tom lane

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 076656c..cbfccf6 100644
*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
*************** AC_DEFUN([PGAC_TYPE_128BIT_INT],
*** 108,136 ****
  [AC_CACHE_CHECK([for __int128], [pgac_cv__128bit_int],
  [AC_LINK_IFELSE([AC_LANG_PROGRAM([
  /*
   * These are globals to discourage the compiler from folding all the
   * arithmetic tests down to compile-time constants.  We do not have
!  * convenient support for 64bit literals at this point...
   */
  __int128 a = 48828125;
! __int128 b = 97656255;
  ],[
  __int128 c,d;
  a = (a << 12) + 1; /* 200000000001 */
  b = (b << 12) + 5; /* 400000000005 */
! /* use the most relevant arithmetic ops */
  c = a * b;
  d = (c + b) / b;
! /* return different values, to prevent optimizations */
  if (d != a+1)
!   return 0;
! return 1;
  ])],
  [pgac_cv__128bit_int=yes],
  [pgac_cv__128bit_int=no])])
  if test x"$pgac_cv__128bit_int" = xyes ; then
!   AC_DEFINE(PG_INT128_TYPE, __int128, [Define to the name of a signed 128-bit integer type.])
!   AC_CHECK_ALIGNOF(PG_INT128_TYPE)
  fi])# PGAC_TYPE_128BIT_INT


--- 108,166 ----
  [AC_CACHE_CHECK([for __int128], [pgac_cv__128bit_int],
  [AC_LINK_IFELSE([AC_LANG_PROGRAM([
  /*
+  * We don't actually run this test, just link it to verify that any support
+  * functions needed for __int128 are present.
+  *
   * These are globals to discourage the compiler from folding all the
   * arithmetic tests down to compile-time constants.  We do not have
!  * convenient support for 128bit literals at this point...
   */
  __int128 a = 48828125;
! __int128 b = 97656250;
  ],[
  __int128 c,d;
  a = (a << 12) + 1; /* 200000000001 */
  b = (b << 12) + 5; /* 400000000005 */
! /* try the most relevant arithmetic ops */
  c = a * b;
  d = (c + b) / b;
! /* must use the results, else compiler may optimize arithmetic away */
  if (d != a+1)
!   return 1;
  ])],
  [pgac_cv__128bit_int=yes],
  [pgac_cv__128bit_int=no])])
  if test x"$pgac_cv__128bit_int" = xyes ; then
!   # Some versions of gcc have problems passing __int128 function arguments
!   # when using non-default alignment.  Test that, if not cross-compiling.
!   AC_CACHE_CHECK([for __int128 alignment bug], [pgac_cv__128bit_int_bug],
!   [AC_RUN_IFELSE([AC_LANG_PROGRAM([
! /* This must match the corresponding code in c.h: */
! #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
! #define pg_attribute_aligned(a) __attribute__((aligned(a)))
! #endif
! typedef __int128 int128a
! #if defined(pg_attribute_aligned)
! pg_attribute_aligned(8)
! #endif
! ;
! int128a holder;
! void pass_by_val(void *buffer, int128a par) { holder = par; }
! ],[
! long int i64 = 97656225L << 12;
! int128a q;
! pass_by_val(main, (int128a) i64);
! q = (int128a) i64;
! if (q != holder)
!   return 1;
! ])],
!   [pgac_cv__128bit_int_bug=ok],
!   [pgac_cv__128bit_int_bug=broken],
!   [pgac_cv__128bit_int_bug=assuming-ok])])
!   if test x"$pgac_cv__128bit_int_bug" != xbroken ; then
!     AC_DEFINE(PG_INT128_TYPE, __int128, [Define to the name of a signed 128-bit integer type.])
!     AC_CHECK_ALIGNOF(PG_INT128_TYPE)
!   fi
  fi])# PGAC_TYPE_128BIT_INT


diff --git a/configure b/configure
index 45221e1..6eaed45 100755
*** a/configure
--- b/configure
*************** else
*** 14996,15007 ****
  /* end confdefs.h.  */

  /*
   * These are globals to discourage the compiler from folding all the
   * arithmetic tests down to compile-time constants.  We do not have
!  * convenient support for 64bit literals at this point...
   */
  __int128 a = 48828125;
! __int128 b = 97656255;

  int
  main ()
--- 14996,15010 ----
  /* end confdefs.h.  */

  /*
+  * We don't actually run this test, just link it to verify that any support
+  * functions needed for __int128 are present.
+  *
   * These are globals to discourage the compiler from folding all the
   * arithmetic tests down to compile-time constants.  We do not have
!  * convenient support for 128bit literals at this point...
   */
  __int128 a = 48828125;
! __int128 b = 97656250;

  int
  main ()
*************** main ()
*** 15010,15022 ****
  __int128 c,d;
  a = (a << 12) + 1; /* 200000000001 */
  b = (b << 12) + 5; /* 400000000005 */
! /* use the most relevant arithmetic ops */
  c = a * b;
  d = (c + b) / b;
! /* return different values, to prevent optimizations */
  if (d != a+1)
!   return 0;
! return 1;

    ;
    return 0;
--- 15013,15024 ----
  __int128 c,d;
  a = (a << 12) + 1; /* 200000000001 */
  b = (b << 12) + 5; /* 400000000005 */
! /* try the most relevant arithmetic ops */
  c = a * b;
  d = (c + b) / b;
! /* must use the results, else compiler may optimize arithmetic away */
  if (d != a+1)
!   return 1;

    ;
    return 0;
*************** fi
*** 15033,15042 ****
  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__128bit_int" >&5
  $as_echo "$pgac_cv__128bit_int" >&6; }
  if test x"$pgac_cv__128bit_int" = xyes ; then

  $as_echo "#define PG_INT128_TYPE __int128" >>confdefs.h

!   # The cast to long int works around a bug in the HP C Compiler,
  # see AC_CHECK_SIZEOF for more information.
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking alignment of PG_INT128_TYPE" >&5
  $as_echo_n "checking alignment of PG_INT128_TYPE... " >&6; }
--- 15035,15097 ----
  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__128bit_int" >&5
  $as_echo "$pgac_cv__128bit_int" >&6; }
  if test x"$pgac_cv__128bit_int" = xyes ; then
+   # Some versions of gcc have problems passing __int128 function arguments
+   # when using non-default alignment.  Test that, if not cross-compiling.
+   { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __int128 alignment bug" >&5
+ $as_echo_n "checking for __int128 alignment bug... " >&6; }
+ if ${pgac_cv__128bit_int_bug+:} false; then :
+   $as_echo_n "(cached) " >&6
+ else
+   if test "$cross_compiling" = yes; then :
+   pgac_cv__128bit_int_bug=assuming-ok
+ else
+   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+ /* end confdefs.h.  */
+
+ /* This must match the corresponding code in c.h: */
+ #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
+ #define pg_attribute_aligned(a) __attribute__((aligned(a)))
+ #endif
+ typedef __int128 int128a
+ #if defined(pg_attribute_aligned)
+ pg_attribute_aligned(8)
+ #endif
+ ;
+ int128a holder;
+ void pass_by_val(void *buffer, int128a par) { holder = par; }
+
+ int
+ main ()
+ {
+
+ long int i64 = 97656225L << 12;
+ int128a q;
+ pass_by_val(main, (int128a) i64);
+ q = (int128a) i64;
+ if (q != holder)
+   return 1;
+
+   ;
+   return 0;
+ }
+ _ACEOF
+ if ac_fn_c_try_run "$LINENO"; then :
+   pgac_cv__128bit_int_bug=ok
+ else
+   pgac_cv__128bit_int_bug=broken
+ fi
+ rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+   conftest.$ac_objext conftest.beam conftest.$ac_ext
+ fi
+
+ fi
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__128bit_int_bug" >&5
+ $as_echo "$pgac_cv__128bit_int_bug" >&6; }
+   if test x"$pgac_cv__128bit_int_bug" != xbroken ; then

  $as_echo "#define PG_INT128_TYPE __int128" >>confdefs.h

!     # The cast to long int works around a bug in the HP C Compiler,
  # see AC_CHECK_SIZEOF for more information.
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking alignment of PG_INT128_TYPE" >&5
  $as_echo_n "checking alignment of PG_INT128_TYPE... " >&6; }
*************** cat >>confdefs.h <<_ACEOF
*** 15071,15076 ****
--- 15126,15132 ----
  _ACEOF


+   fi
  fi

  # Check for various atomic operations now that we have checked how to declare

Re: master make check fails on Solaris 10

From
Victor Wagner
Date:
On Wed, 17 Jan 2018 11:33:09 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Attached is a draft patch to incorporate Victor's slimmed-down test
> into configure.  If you have a chance, could you confirm it does
> the right thing on your Sparc machine?
>
Definitely. As soon as the next work day begins in Moscow.

> BTW, it would be a good idea to set up a buildfarm member on that
> machine, if you care about whether that configuration continues
> to work in the future.

Actually, we already have a buildfarm member on this machine. It is just
a member of the PostgresPro private buildfarm, not of the big community
buildfarm.

So, I'll register it in the big buildfarm as soon as I figure out how
to distribute the limited resources of this machine between the two buildfarms.



--
                                   Victor Wagner <vitus@wagner.pp.ru>


Re: master make check fails on Solaris 10

From
Victor Wagner
Date:
On Wed, 17 Jan 2018 11:33:09 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Attached is a draft patch to incorporate Victor's slimmed-down test
> into configure.  If you have a chance, could you confirm it does
> the right thing on your Sparc machine?

It seems that what it does is not exactly the right thing.
I've applied it to commit 9c7d06d60680 in master and see the following:

$ ./configure CC="gcc -m64"
[skip]
checking for __int128... yes
checking for __int128 alignment bug... ok
checking alignment of PG_INT128_TYPE... 16
checking for builtin __sync char locking functions... yes
[skip]

As far as I understand your patch, there should be:

checking for __int128 alignment bug... broken

Then in the pg_config.h I see


/* The normal alignment of `PG_INT128_TYPE', in bytes. */
#define ALIGNOF_PG_INT128_TYPE 16

/* Define to the name of a signed 128-bit integer type. */
#define PG_INT128_TYPE __int128

However, make check passes. 

There are two things that puzzle me:
1. Why the test program doesn't detect the bug.
If I cut'n'paste it from configure, compile it with the flags cut'n'pasted
from config.log, and run it, it returns 1. But configure reports that all is
ok.
2. If the bug exists and is not detected by configure, why make check passes.

Marina and I will continue the investigation.




Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Victor Wagner <vitus@wagner.pp.ru> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attached is a draft patch to incorporate Victor's slimmed-down test
>> into configure.  If you have a chance, could you confirm it does
>> the right thing on your Sparc machine?

> It seems that what it does is not exactly a right thing.
> I've applied it to commit 9c7d06d60680 in master and see following
> checking for __int128 alignment bug... ok
> As far as I understand your patch, there should be:
> checking for __int128 alignment bug... broken

Yes, that's what I expected to happen.

> Then in the pg_config.h I see
> /* The normal alignment of `PG_INT128_TYPE', in bytes. */
> #define ALIGNOF_PG_INT128_TYPE 16
> /* Define to the name of a signed 128-bit integer type. */
> #define PG_INT128_TYPE __int128

That's what I'd expect if configure thinks all's well :-(

> However, make check passes. 

Uh ... how could that be?  If the output of configure is exactly
the same as before the patch, how could the end result be different?

> We, Marina and I would continue investigation.

I look forward to some results ... but I'm going to bed now ...

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
Thank you!

On 17-01-2018 19:33, Tom Lane wrote:
> Attached is a draft patch to incorporate Victor's slimmed-down test
> into configure.  If you have a chance, could you confirm it does
> the right thing on your Sparc machine?

> Victor Wagner <vitus(at)wagner(dot)pp(dot)ru> writes:
>> It seems that what it does is not exactly a right thing.
>> I've applied it to commit 9c7d06d60680 in master and see following
>> checking for __int128 alignment bug... ok
>> As far as I understand your patch, there should be:
>> checking for __int128 alignment bug... broken
> ...
>> Then in the pg_config.h I see
>> /* The normal alignment of `PG_INT128_TYPE', in bytes. */
>> #define ALIGNOF_PG_INT128_TYPE 16
>> /* Define to the name of a signed 128-bit integer type. */
>> #define PG_INT128_TYPE __int128
> ...
>> However, make check passes.
> Uh ... how could that be?  If the output of configure is exactly
> the same as before the patch, how could the end result be different?

Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d 
and using the configuration command from [1], I got:
checking for __int128... yes
checking for __int128 alignment bug... broken

Nothing is defined for int128 in pg_config.h:
/* The normal alignment of `PG_INT128_TYPE', in bytes. */
/* #undef ALIGNOF_PG_INT128_TYPE */
...
/* Define to the name of a signed 128-bit integer type. */
/* #undef PG_INT128_TYPE */

And make check-world passes. Victor said that he used a much simpler 
configuration command, and I'm trying to figure out what's changed..

> BTW, it would be a good idea to set up a buildfarm member on that
> machine, if you care about whether that configuration continues
> to work in the future.

Victor answered this in [2]:
> Actually, we already have a buildfarm member on this machine. It is just
> a member of the PostgresPro private buildfarm, not of the big community
> buildfarm.
> 
> So, I'll register it in the big buildfarm as soon as I figure out how
> to distribute limited resources of this machine between two buildfarms.

P.S. I found trailing whitespace in line 80:
! int128a q;

[1] 
https://www.postgresql.org/message-id/0d3a9fa264cebe1cb9966f37b7c06e86%40postgrespro.ru
[2] 
https://www.postgresql.org/message-id/20180117203648.2626d97a%40wagner.wagner.home

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 17-01-2018 18:50, Marina Polyakova wrote:
> On 17-01-2018 18:28, Tom Lane wrote:
>> BTW, now that you've demonstrated that the bug exists in a current
>> gcc release, you should definitely file a bug at
>> https://gcc.gnu.org/bugzilla/
>> I think you can just give them int128test2.c as-is as a test case.
>> 
>> Please do that and let me know the PR number --- I think it would be
>> good to cite the bug specifically in the comments for our configure 
>> code.
> 
> Thanks, I'll try to do it.

If I understand correctly, its PR number is 83925 (see [1]).

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Victor Wagner
Date:
On Thu, 18 Jan 2018 01:47:46 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Victor Wagner <vitus@wagner.pp.ru> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:  
> > checking for __int128 alignment bug... ok
> > As far as I understand your patch, there should be:
> > checking for __int128 alignment bug... broken  
> 
> Yes, that's what I expected to happen.

It seems that I made some unreproducible mistake last night when
applying your patch. Marina reapplied it later and everything works.

I've cherry-picked it as far back as 9.5, and with these versions it
works too.




Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d 
> and using the configuration command from [1], I got:
> checking for __int128... yes
> checking for __int128 alignment bug... broken
> ...
> And make check-world passes. Victor said that he used a much simpler 
> configuration command, and I'm trying to figure out what's changed..

Weird.  Maybe the gcc bug only manifests with certain optimization
flags?  That's not what I'd have expected from Victor's theory about
why the code is wrong, but if it only shows up some of the time,
it's hard to think of another explanation.

> P.S. I found the trailing whitespace in line 80:
> ! int128a q;

Ah, thanks.  Probably git would've whined about that when I went
to commit, but it's good to catch sooner.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Victor Wagner
Date:
On Thu, 18 Jan 2018 09:56:48 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> > Applying your patch on commit
> > f033462d8f77c40b7d6b33c5116e50118fb4699d and using the
> > configuration command from [1], I got: checking for __int128... yes
> > checking for __int128 alignment bug... broken
> > ...
> > And make check-world passes. Victor said that he used a much
> > simpler configuration command, and I'm trying to figure out what's
> > changed..
>
> Weird.  Maybe the gcc bug only manifests with certain optimization
> flags?  That's not what I'd have expected from Victor's theory about

No. I compiled the test program without any optimization flags,
just -m64, which tells the compiler to generate 64-bit code.
(In 32-bit mode there is no __int128, so the problem wouldn't manifest
itself.)

On the other hand, when I tried to resolve the issue with the test that
didn't work, I copied all the gcc flags from config.log, and the test program
returned 1 with exactly the same flags.

Probably I should have regenerated configure with autoconf instead
of applying the patch to configure.

--



Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Victor Wagner <vitus@wagner.pp.ru> writes:
> It seems that I've made some unreproducible mistake last night 
> applying your patch. Marina reapplied it later and everything works.

Ah, thanks for following up.  I'll adjust the comment to include the
gcc PR and push it shortly.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 18-01-2018 17:56, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d
>> and using the configuration command from [1], I got:
>> checking for __int128... yes
>> checking for __int128 alignment bug... broken
>> ...
>> And make check-world passes. Victor said that he used a much simpler
>> configuration command, and I'm trying to figure out what's changed..
> 
> Weird.  Maybe the gcc bug only manifests with certain optimization
> flags?  That's not what I'd have expected from Victor's theory about
> why the code is wrong, but if it only shows up some of the time,
> it's hard to think of another explanation.

Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit 
9c7d06d60680 with your patch, I got this:
checking for __int128... yes
checking for __int128 alignment bug... ok
checking alignment of PG_INT128_TYPE... 16

In pg_config.h:
/* The normal alignment of `PG_INT128_TYPE', in bytes. */
#define ALIGNOF_PG_INT128_TYPE 16
...
/* Define to the name of a signed 128-bit integer type. */
#define PG_INT128_TYPE __int128

But make check got the same failures, and I see the same debug output as 
in [1]..

P.S. As I understand it, this comment on bugzilla [2] is also about 
this.

[1] 
https://www.postgresql.org/message-id/90ab676392c8f9c84431976147097cf0%40postgrespro.ru
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 18-01-2018 17:56, Tom Lane wrote:
>> Weird.  Maybe the gcc bug only manifests with certain optimization
>> flags?  That's not what I'd have expected from Victor's theory about
>> why the code is wrong, but if it only shows up some of the time,
>> it's hard to think of another explanation.

> Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit
> 9c7d06d60680 with your patch, I got this:
> [ configure check passes ]
> But make check got the same failures, and I see the same debug output as
> in [1]..

Interesting.  Maybe the parameter-passing misbehavior that Victor's
test is looking for isn't the only associated bug.

> P.S. As I understand it, this comment on bugzilla [2] is also about
> this.
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6

Even more interesting, see c7 that was just posted there:

>> Eric Botcazou 2018-01-18 16:22:48 UTC
>> 128-bit types require 128-bit alignment on SPARC64 so we cannot support that.

So basically, we're outta luck and we have to consider __int128 as
unsupportable on SPARC.  I'm inclined to mechanize that as a test on
$host_cpu.  At least that means we don't need an AC_RUN test ;-)
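
For illustration only, here is a compile-time analog of the idea above.
The message proposes a configure-level test on $host_cpu; this sketch
instead uses the compiler's predefined macros. USE_INT128 and
int128_usable() are hypothetical names that do not appear in the thread
or in the patch.

```c
/*
 * Hypothetical guard: treat __int128 as unusable on SPARC targets.
 * __SIZEOF_INT128__ is predefined by gcc and clang when __int128 exists;
 * __sparc__ and __sparc are the usual SPARC target macros.
 * USE_INT128 is an illustrative name, not something PostgreSQL defines.
 */
#if defined(__SIZEOF_INT128__) && !defined(__sparc__) && !defined(__sparc)
#define USE_INT128 1
#else
#define USE_INT128 0
#endif

/* Returns 1 if this build would use __int128, 0 otherwise. */
static int
int128_usable(void)
{
    return USE_INT128;
}
```

A configure-time check has the advantage that it is decided once for the
whole build and can be overridden, which a preprocessor test like this
one cannot easily offer.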

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 18-01-2018 19:53, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 18-01-2018 17:56, Tom Lane wrote:
>>> Weird.  Maybe the gcc bug only manifests with certain optimization
>>> flags?  That's not what I'd have expected from Victor's theory about
>>> why the code is wrong, but if it only shows up some of the time,
>>> it's hard to think of another explanation.
> 
>> Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit
>> 9c7d06d60680 with your patch, I got this:
>> [ configure check passes ]
>> But make check got the same failures, and I see the same debug output 
>> as
>> in [1]..
> 
> Interesting.  Maybe the parameter-passing misbehavior that Victor's
> test is looking for isn't the only associated bug.
> 
>> P.S. As I understand it, this comment on bugzilla [2] is also about
>> this.
>> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6
> 
> Even more interesting, see c7 that was just posted there:
> 
>>> Eric Botcazou 2018-01-18 16:22:48 UTC
>>> 128-bit types require 128-bit alignment on SPARC64 so we cannot 
>>> support that.
> 
> So basically, we're outta luck and we have to consider __int128 as
> unsupportable on SPARC.  I'm inclined to mechanize that as a test on
> $host_cpu.  At least that means we don't need an AC_RUN test ;-)

%-)) :-)
Can I do something else about this problem?..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 18-01-2018 19:53, Tom Lane wrote:
>> So basically, we're outta luck and we have to consider __int128 as
>> unsupportable on SPARC.  I'm inclined to mechanize that as a test on
>> $host_cpu.  At least that means we don't need an AC_RUN test ;-)

> %-)) :-)
> Can I do something else about this problem?..

I don't see any other workable alternative.  The box we're in as far
as the interaction with MAXALIGN goes is still the same as it was
a month ago: raising MAXALIGN is impractical, and so is allowing
some datatypes to have more-than-MAXALIGN alignment specs.

I suppose you could imagine declaring int128s that are in any sort
of palloc'd storage as, in effect, char[16], and always memcpy'ing
to and from local variables that're declared int128 whenever you
want to do arithmetic with them.  But ugh.  I can't see taking that
sort of notational and performance hit for just one non-mainstream
architecture.
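
A minimal sketch of the char[16]-plus-memcpy idea described above, to
make the notational cost concrete. It assumes a compiler that provides
__int128; the names int128_fetch, int128_store and int128_accum_add are
hypothetical and are not taken from the PostgreSQL sources.

```c
#include <string.h>

typedef __int128 int128;    /* assumes the compiler provides __int128 */

/* Copy a 16-byte value out of storage that may be only 8-byte aligned. */
static int128
int128_fetch(const char *src)
{
    int128      v;

    memcpy(&v, src, sizeof(v));
    return v;
}

/* Copy a 16-byte value back into possibly under-aligned storage. */
static void
int128_store(char *dst, int128 v)
{
    memcpy(dst, &v, sizeof(v));
}

/* Example: add to an accumulator that lives in char[16] storage. */
static void
int128_accum_add(char *acc, long long delta)
{
    int128      tmp = int128_fetch(acc);    /* properly aligned local */

    tmp += delta;
    int128_store(acc, tmp);
}
```

Every arithmetic step on stored values then needs a fetch and a store
around it, which is the notational and performance hit the paragraph
above objects to.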

Really, this is something that the compiler ought to do for us, IMO.
If the gcc guys don't want to be bothered, OK, but that tells you more
about the priority they place on SPARC support than anything else.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
I wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 18-01-2018 19:53, Tom Lane wrote:
>>> So basically, we're outta luck and we have to consider __int128 as
>>> unsupportable on SPARC.  I'm inclined to mechanize that as a test on
>>> $host_cpu.  At least that means we don't need an AC_RUN test ;-)

>> %-)) :-)
>> Can I do something else about this problem?..

> I don't see any other workable alternative.

But ... let's not panic, but wait and see the final result of the
discussion on the gcc PR.  Jakub at least seems to think it ought
to be a supportable case.

What you could do in the meantime is work on finding a variation of
Victor's test that will detect the bug regardless of -O level.
If we do have hope that future gcc versions will handle this correctly,
we'll need a better test rather than just summarily dismissing
host_cpu = sparc.
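
One direction that could be explored is sketched below: the same check
as in the draft patch, with the callee marked noinline and the input
made volatile so that a real call with a non-constant argument remains
at any -O level. This is an assumption, not something established in
the thread; whether it exposes the problem at -O1 on SPARC is untested.
The function name int128_pass_check is hypothetical, and long long is
used here in place of the patch's long int.

```c
/* Same under-aligned typedef that the configure test uses. */
typedef __int128 int128a __attribute__((aligned(8)));

int128a     holder;

/*
 * noinline is meant to keep a genuine function call, so the argument
 * goes through the ABI's parameter-passing path (an assumption).
 */
__attribute__((noinline)) void
pass_by_val(void *buffer, int128a par)
{
    (void) buffer;
    holder = par;
}

/* Returns 0 if the value survives the call, 1 otherwise. */
int
int128_pass_check(void)
{
    /* volatile keeps the compiler from folding the value at compile time */
    volatile long long i64 = 97656225LL << 12;
    int128a     q = (int128a) i64;

    pass_by_val(&holder, (int128a) i64);
    return (q != holder) ? 1 : 0;
}
```

On a platform without the bug the check is expected to return 0.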

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 18-01-2018 20:24, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 18-01-2018 19:53, Tom Lane wrote:
>>> So basically, we're outta luck and we have to consider __int128 as
>>> unsupportable on SPARC.  I'm inclined to mechanize that as a test on
>>> $host_cpu.  At least that means we don't need an AC_RUN test ;-)
> 
>> %-)) :-)
>> Can I do something else about this problem?..
> 
> I don't see any other workable alternative.  The box we're in as far
> as the interaction with MAXALIGN goes is still the same as it was
> a month ago: raising MAXALIGN is impractical, and so is allowing
> some datatypes to have more-than-MAXALIGN alignment specs.
> 
> I suppose you could imagine declaring int128s that are in any sort
> of palloc'd storage as, in effect, char[16], and always memcpy'ing
> to and from local variables that're declared int128 whenever you
> want to do arithmetic with them.  But ugh.  I can't see taking that
> sort of notational and performance hit for just one non-mainstream
> architecture.
> 
> Really, this is something that the compiler ought to do for us, IMO.
> If the gcc guys don't want to be bothered, OK, but that tells you more
> about the priority they place on SPARC support than anything else.

Thank you very much for your explanations!
So I'll go through all of your comments on my patch about stable functions 
when the next work day begins in Moscow :)

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Robert Haas
Date:
On Thu, Jan 18, 2018 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 18-01-2018 19:53, Tom Lane wrote:
>>> So basically, we're outta luck and we have to consider __int128 as
>>> unsupportable on SPARC.  I'm inclined to mechanize that as a test on
>>> $host_cpu.  At least that means we don't need an AC_RUN test ;-)
>
>> %-)) :-)
>> Can I do something else about this problem?..
>
> I don't see any other workable alternative.  The box we're in as far
> as the interaction with MAXALIGN goes is still the same as it was
> a month ago: raising MAXALIGN is impractical, and so is allowing
> some datatypes to have more-than-MAXALIGN alignment specs.
>
> I suppose you could imagine declaring int128s that are in any sort
> of palloc'd storage as, in effect, char[16], and always memcpy'ing
> to and from local variables that're declared int128 whenever you
> want to do arithmetic with them.  But ugh.  I can't see taking that
> sort of notational and performance hit for just one non-mainstream
> architecture.

It's not like we'd have to take the performance hit everywhere; we
could do the expensive things only on platforms that need them.  The
trick would be to avoid too much notation.  But it's not like we don't
live with a lot of DatumGetThing and ThingGetDatum notation already.

> Really, this is something that the compiler ought to do for us, IMO.
> If the gcc guys don't want to be bothered, OK, but that tells you more
> about the priority they place on SPARC support than anything else.

Of course, the same accusation could be leveled at us.  We don't
require int128 support for correctness; we just use it for performance
where it's available and works the way we want.  Prolly, that means
mainstream platforms.  If we wanted to work harder, we could get it
working in other places too.  Or some other fix that delivers much of
the same performance benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 18-01-2018 20:34, Tom Lane wrote:
> I wrote:
> ...
> But ... let's not panic, but wait and see the final result of the
> discussion on the gcc PR.  Jakub at least seems to think it ought
> to be a supportable case.
> 
> What you could do in the meantime is work on finding a variation of
> Victor's test that will detect the bug regardless of -O level.
> If we do have hope that future gcc versions will handle this correctly,
> we'll need a better test rather than just summarily dismissing
> host_cpu = sparc.

Thanks, I'll try..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 18, 2018 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Really, this is something that the compiler ought to do for us, IMO.
>> If the gcc guys don't want to be bothered, OK, but that tells you more
>> about the priority they place on SPARC support than anything else.

> Of course, the same accusation could be leveled at us.  We don't
> require int128 support for correctness; we just use it for performance
> where it's available and works the way we want.  Prolly, that means
> mainstream platforms.  If we wanted to work harder, we could get it
> working in other places too.  Or some other fix that delivers much of
> the same performance benefit.

Sure.  Part of the equation here is that (IMO anyway) int128 isn't
sufficiently performance-critical to us to justify putting enormous
amounts of work into trying to make it go on non-mainstream platforms.
It's possible that that could change in future ... but if part of the
cost is notational changes that make it harder and more bug-prone
to use int128 at all, then I daresay int128 will never become that
performance-critical, because it would always remain a niche thing.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Robert Haas
Date:
On Thu, Jan 18, 2018 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sure.  Part of the equation here is that (IMO anyway) int128 isn't
> sufficiently performance-critical to us to justify putting enormous
> amounts of work into trying to make it go on non-mainstream platforms.
> It's possible that that could change in future ... but if part of the
> cost is notational changes that make it harder and more bug-prone
> to use int128 at all, then I daresay int128 will never become that
> performance-critical, because it would always remain a niche thing.

That's possible.  On the other hand, we lived for many years with
painful workarounds for systems without working 64-bit integers, and
those eventually became mainstream enough that we made them mandatory
- and then ripped out some of the notational changes that we'd
introduced to cope with platforms that didn't support them.  So, the
same thing might happen here, whatever we decide about this.  Then
again, 64 bit counters are already so large that it's hard to imagine
ever having one overflow, so perhaps 128-bit values will never catch
on in quite the same way.  On the third hand, 640kB ought to be enough
for anybody.

Anyway, that's really an academic debate.  My real point is: I do not
think we should reject out of hand the idea that a patch introducing
some new notation to deal with this might be acceptable.  I am not
volunteering to write such a patch, and anyone who tries should be
aware that there is a chance that it will be rejected on grounds of
ugliness.  However, if they decide to try anyway, we should read the
patch and see how ugly it really is.  Maybe it's not that bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: master make check fails on Solaris 10

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Anyway, that's really an academic debate.  My real point is: I do not
> think we should reject out of hand the idea that a patch introducing
> some new notation to deal with this might be acceptable.  I am not
> volunteering to write such a patch, and anyone who tries should be
> aware that there is a chance that it will be rejected on grounds of
> ugliness.  However, if they decide to try anyway, we should read the
> patch and see how ugly it really is.  Maybe it's not that bad.

Sure.  I'm not intending to write such a patch either.

            regards, tom lane


Re: master make check fails on Solaris 10

From
Marina Polyakova
Date:
On 18-01-2018 20:49, Marina Polyakova wrote:
> On 18-01-2018 20:34, Tom Lane wrote:
>> ...
>> What you could do in the meantime is work on finding a variation of
>> Victor's test that will detect the bug regardless of -O level.
>> If we do have hope that future gcc versions will handle this 
>> correctly,
>> we'll need a better test rather than just summarily dismissing
>> host_cpu = sparc.
> 
> Thanks, I'll try..

I tried different gcc options, but it did not help..
Perhaps searching the gcc source code would clarify something, but 
I'm sorry, I'm too busy for that right now..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company