Thread: Crash bug in 8.2.3 on Solaris 10/Sparc

Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zoltan Boszormenyi
Date:
Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:

Datum
timestamptz_send(PG_FUNCTION_ARGS)
{
       TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
       StringInfoData buf;

       pq_begintypsend(&buf);
#ifdef HAVE_INT64_TIMESTAMP
       pq_sendint64(&buf, timestamp);
#else
       pq_sendfloat8(&buf, timestamp);
#endif
       PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

GDB indicates crash at the last line.
No matter how I unrolled the function calls,
the indicated crasher line was always the one
before:
       pq_sendfloat8(&buf, timestamp);

It must be a stack corruption somehow.
I also unrolled pq_sendfloat8() so the function looks like this:

Datum
timestamptz_send(PG_FUNCTION_ARGS)
{
       TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
       StringInfoData buf;
       bytea   *byteap;
       union
       {
               float8  f;
               int64   i;
       }               swap;
       uint32  n32;

       pq_begintypsend(&buf);
#ifdef HAVE_INT64_TIMESTAMP
       pq_sendint64(&buf, timestamp);
       elog(NOTICE, "timestamptz_send() HAVE_INT64_TIMESTAMP after pq_sendint64");
#else
       swap.f = (float8)timestamp;
       elog(NOTICE, "timestamptz_send() int64: %lld", swap.i);
       /* High order half first, since we're doing MSB-first */
#ifdef INT64_IS_BUSTED
       /* don't try a right shift of 32 on a 32-bit word */
       n32 = (swap.i < 0) ? -1 : 0;
       elog(NOTICE, "timestamptz_send() INT64_IS_BUSTED high 32: %d", n32);
#else
       n32 = (uint32) (swap.i >> 32);
       elog(NOTICE, "timestamptz_send() high 32: %d", n32);
#endif
       n32 = htonl(n32);
       elog(NOTICE, "timestamptz_send() htonl high 32: %d", n32);
       appendBinaryStringInfo(&buf, (char *) &n32, 4);

       /* Now the low order half */
       n32 = (uint32) swap.i;
       elog(NOTICE, "timestamptz_send() low 32: %d", n32);
       n32 = htonl(n32);
       elog(NOTICE, "timestamptz_send() htonl low 32: %d", n32);
       appendBinaryStringInfo(&buf, (char *) &n32, 4);

       elog(NOTICE, "timestamptz_send() pq_sendfloat8");
#endif
       byteap = (bytea *) buf.data;
       elog(NOTICE, "timestamptz_send() buf->data = %p", byteap);
       Assert(buf.len >= VARHDRSZ);
       VARATT_SIZEP(byteap) = buf.len;
       PG_RETURN_BYTEA_P(byteap);
}

The crashing line according to GDB is now the elog() call after:
       swap.f = (float8)timestamp;

This is a simple explicit type cast which shouldn't cause problems,
however it is the one that somehow corrupts something on the stack
and causes the segfault upon entering the function at the next
statement.
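
To check the union-plus-htonl sequence outside the server, something like
the following standalone program could be built with the suspect compiler.
This is only a sketch, not PostgreSQL source: the function name
encode_float8_msb_first and the FLOAT8_DEMO_MAIN switch are made up for
illustration, it assumes an IEEE 754 double of 8 bytes, and it uses
uint64_t instead of int64 so that the right shift is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>          /* htonl */

/*
 * Hypothetical helper, for illustration only: write the bit pattern of a
 * double into out[] as eight bytes, high-order 32-bit half first.
 */
static void
encode_float8_msb_first(double value, unsigned char out[8])
{
       union
       {
               double   f;
               uint64_t i;
       }               swap;
       uint32_t n32;

       /* The sketch only makes sense for an 8-byte double. */
       assert(sizeof(double) == 8);

       /* Reinterpret the bits of the double; no numeric conversion. */
       swap.f = value;

       /* High-order half first, converted to network byte order. */
       n32 = htonl((uint32_t) (swap.i >> 32));
       memcpy(out, &n32, 4);

       /* Now the low-order half. */
       n32 = htonl((uint32_t) swap.i);
       memcpy(out + 4, &n32, 4);
}

#ifdef FLOAT8_DEMO_MAIN
/* Optional demo: build with -DFLOAT8_DEMO_MAIN to print the bytes of 1.0. */
int
main(void)
{
       unsigned char bytes[8];
       int           k;

       encode_float8_msb_first(1.0, bytes);
       for (k = 0; k < 8; k++)
               printf("%02x", bytes[k]);
       printf("\n");
       return 0;
}
#endif
```

If this small program runs cleanly when built with the same compiler and
flags, the float-to-bytes sequence by itself is probably not what the
compiler gets wrong, and the problem is more likely in the surrounding
function or its environment.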

As a workaround, we recompiled PostgreSQL 8.2.3 with --enable-integer-datetimes
and the client can connect to the server now, after initdb.
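
For context on why this switch avoids the float8 path: as far as I know,
a float build stores a timestamp as a double holding seconds since
2000-01-01, while an integer-datetimes build stores a 64-bit integer
holding microseconds since 2000-01-01, so the #ifdef HAVE_INT64_TIMESTAMP
branch is taken instead. A rough standalone sketch of the two
representations follows; the names pg_timestamp_float, pg_timestamp_int64,
UNIX_TO_PG_EPOCH_SECS and TS_DEMO_MAIN are invented for illustration and
are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Seconds from 1970-01-01 to 2000-01-01 (UTC): 10957 days * 86400. */
#define UNIX_TO_PG_EPOCH_SECS INT64_C(946684800)

/* Float build (illustrative): seconds since 2000-01-01 as a double. */
static double
pg_timestamp_float(int64_t unix_secs)
{
       return (double) (unix_secs - UNIX_TO_PG_EPOCH_SECS);
}

/* Integer build (illustrative): microseconds since 2000-01-01 as int64. */
static int64_t
pg_timestamp_int64(int64_t unix_secs)
{
       return (unix_secs - UNIX_TO_PG_EPOCH_SECS) * INT64_C(1000000);
}

#ifdef TS_DEMO_MAIN
/* Optional demo: build with -DTS_DEMO_MAIN to print both forms. */
int
main(void)
{
       int64_t unix_secs = UNIX_TO_PG_EPOCH_SECS + 1;

       assert(pg_timestamp_float(UNIX_TO_PG_EPOCH_SECS) == 0.0);
       printf("float8: %f s, int64: %" PRId64 " us\n",
              pg_timestamp_float(unix_secs),
              pg_timestamp_int64(unix_secs));
       return 0;
}
#endif
```

Both forms are 8 bytes on the wire, but only the float form goes through
the float8 send code that crashes here.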

I tried to exercise calling timestamptz_send() but creating a table
with float8 field, INSERTing and SELECTing works, too.
Both textual and binary COPY FROM and COPY TO work, too.
Either these exercises didn't call pq_sendfloat8() or it
doesn't cause problems elsewhere, only in timestamptz_send().


-- 
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/





Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zdenek Kotala
Date:
Zoltan Boszormenyi wrote:
> Hi,
> 
> we have found that psql in PostgreSQL 8.2.3
> has problems connecting to the server
> running on Solaris 10/Sun SPARC.
> 
> $ uname -a
> SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440
> 
> It seems that somehow the system provided
> GCC 3.4.3 miscompiles timestamptz_send()
> and it segfaults. The default function looks like this:
> 

Can you send me how you compiled Postgres (configure switches, LDFLAGS
...), and is it possible to get a core file?

Did you try compiling with different optimization flags, or did you try
the Sun Studio compiler?
Zdenek


Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Tom Lane
Date:
Zoltan Boszormenyi <zb@cybertec.at> writes:
> we have found that psql in PostgreSQL 8.2.3
> has problems connecting to the server
> running on Solaris 10/Sun SPARC.
> ...
> It seems that somehow the system provided
> GCC 3.4.3 miscompiles timestamptz_send()
> and it segfaults.

I find it fairly hard to believe that timestamptz_send would be invoked
at all while using psql, much less during initial connection.  psql
doesn't do any binary-output requests.
        regards, tom lane


Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zoltan Boszormenyi
Date:
Zdenek Kotala wrote:
> Zoltan Boszormenyi wrote:
>> Hi,
>>
>> we have found that psql in PostgreSQL 8.2.3
>> has problems connecting to the server
>> running on Solaris 10/Sun SPARC.
>>
>> $ uname -a
>> SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440
>>
>> It seems that somehow the system provided
>> GCC 3.4.3 miscompiles timestamptz_send()
>> and it segfaults. The default function looks like this:
>>
>
> Can you send me how you compiled Postgres (configure switches, LDFLAGS 
> ...) and is possible get core file?

This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
--with-includes=/usr/local/include --with-libraries=/usr/local/lib/

I added --enable-debug --enable-depend --enable-cassert
to get sensible gdb report after that.

The problem was that the server had problems
after psql connected with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.

$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf
then I get the correct permission denied error.
If the user can connect then some statement inside psql
causes segfault in the server.

Compiled with debug info, I got this from gdb on the core file:
$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461             PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
(gdb)

As I described in my experiments, compiling with --enable-integer-datetimes
fixed the issue.


>
> Did you try compile with different optimalization flags or did you try 
> sun studio compiler?

No, and no. Sun Studio isn't installed, only gcc.

>
>     Zdenek
>

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/



Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zoltan Boszormenyi
Date:
Tom Lane wrote:
> Zoltan Boszormenyi <zb@cybertec.at> writes:
>   
>> we have found that psql in PostgreSQL 8.2.3
>> has problems connecting to the server
>> running on Solaris 10/Sun SPARC.
>> ...
>> It seems that somehow the system provided
>> GCC 3.4.3 miscompiles timestamptz_send()
>> and it segfaults.
>>     
>
> I find it fairly hard to believe that timestamptz_send would be invoked
> at all while using psql, much less during initial connection.  psql
> doesn't do any binary-output requests.
>
>             regards, tom lane
>   

Then please explain this miracle.
Anyway, your comment makes my suspicion about
the correctness of GCC-3.4.3 on Solaris 10/sparc
more founded now. :-)

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/



Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zoltan Boszormenyi
Date:
Zoltan Boszormenyi wrote:
> Zdenek Kotala wrote:
>> Zoltan Boszormenyi wrote:
>>> Hi,
>>>
>>> we have found that psql in PostgreSQL 8.2.3
>>> has problems connecting to the server
>>> running on Solaris 10/Sun SPARC.
>>>
>>> $ uname -a
>>> SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440
>>>
>>> It seems that somehow the system provided
>>> GCC 3.4.3 miscompiles timestamptz_send()
>>> and it segfaults. The default function looks like this:
>>>
>>
>> Can you send me how you compiled Postgres (configure switches, 
>> LDFLAGS ...) and is possible get core file?
>
> This was the configure line:
>
> ./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
> --with-includes=/usr/local/include --with-libraries=/usr/local/lib/
>
> I added --enable-debug --enable-depend --enable-cassert
> to get sensible gdb report after that.
>
> The problem was that the server had problems
> after psql connected with these commands:
>
> $ psql -l -h dev-machine -p 5477 -U user
> psql: server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
> $ psql -h dev-machine -p 5477 -U user template1
> psql: server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
>
> If the user doesn't have permissions in e.g. pg_hba.conf
> then I get the correct permission denied error.
> If the user can connect then some statement inside psql
> causes segfault in the server.
>
> Compiled with debug info, I got this from gdb on the core file:
> $ gdb /.../pgsql/bin/postgres /.../data/core
> ...
> Program terminated with signal 11, Segmentation fault.
> #0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
> 461             PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
> (gdb)
>
> I described my experiments, compiling with --enable-integer-datetimes
> fixed the issue.

We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
with the new GCC without --enable-integer-datetimes and it fixed
the problem we experienced. It seems that my suspicion was right:
GCC-3.4.3 on Solaris 10/Sparc is buggy.

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/



Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zoltan Boszormenyi
Date:
Zoltan Boszormenyi wrote:
> Zoltan Boszormenyi wrote:
>> Zdenek Kotala wrote:
>>> Zoltan Boszormenyi wrote:
>>>> Hi,
>>>>
>>>> we have found that psql in PostgreSQL 8.2.3
>>>> has problems connecting to the server
>>>> running on Solaris 10/Sun SPARC.
>>>>
>>>> $ uname -a
>>>> SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc 
>>>> SUNW,Sun-Fire-V440
>>>>
>>>> It seems that somehow the system provided
>>>> GCC 3.4.3 miscompiles timestamptz_send()
>>>> and it segfaults. The default function looks like this:
>>>>
>>>
>>> Can you send me how you compiled Postgres (configure switches, 
>>> LDFLAGS ...) and is possible get core file?
>>
>> This was the configure line:
>>
>> ./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
>> --with-includes=/usr/local/include --with-libraries=/usr/local/lib/
>>
>> I added --enable-debug --enable-depend --enable-cassert
>> to get sensible gdb report after that.
>>
>> The problem was that the server had problems
>> after psql connected with these commands:
>>
>> $ psql -l -h dev-machine -p 5477 -U user
>> psql: server closed the connection unexpectedly
>>        This probably means the server terminated abnormally
>>        before or while processing the request.
>> $ psql -h dev-machine -p 5477 -U user template1
>> psql: server closed the connection unexpectedly
>>        This probably means the server terminated abnormally
>>        before or while processing the request.
>>
>> If the user doesn't have permissions in e.g. pg_hba.conf
>> then I get the correct permission denied error.
>> If the user can connect then some statement inside psql
>> causes segfault in the server.
>>
>> Compiled with debug info, I got this from gdb on the core file:
>> $ gdb /.../pgsql/bin/postgres /.../data/core
>> ...
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
>> 461             PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
>> (gdb)
>>
>> I described my experiments, compiling with --enable-integer-datetimes
>> fixed the issue.
>
> We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
> with the new GCC without --enable-integer-datetimes and it fixed
> the problem we experienced. It seems that my suspicion was right:
> GCC-3.4.3 on Solaris 10/Sparc is buggy.
>

Oh, and the proof that I use the newly compiled version:

$ psql -h reddb-dev-pgr -p 5477 test
Welcome to psql 8.2.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# select version();
                                  version
----------------------------------------------------------------------------
 PostgreSQL 8.2.3 on sparc-sun-solaris2.10, compiled by GCC gcc (GCC) 4.1.2
(1 row)

test=# show integer_datetimes;
 integer_datetimes
-------------------
 off
(1 row)

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/



Re: Crash bug in 8.2.3 on Solaris 10/Sparc

From
Zdenek Kotala
Date:
Zoltan Boszormenyi wrote:

> 
> We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
> with the new GCC without --enable-integer-datetimes and it fixed
> the problem we experienced. It seems that my suspicion was right:
> GCC-3.4.3 on Solaris 10/Sparc is buggy.
> 

I tried the original S10 gcc (3.4.3) on two different machines with different
kernel updates and both work fine. In light of our off-list communication
and Tom's remark, it looks more like a problem in linking/loading, maybe
a library mismatch. I am not able to say more without a core file.
    Zdenek