Thread: Crash bug in 8.2.3 on Solaris 10/Sparc
Hi,

we have found that psql in PostgreSQL 8.2.3 has problems connecting to the server running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system-provided GCC 3.4.3 miscompiles timestamptz_send() and it segfaults. The default function looks like this:

    Datum
    timestamptz_send(PG_FUNCTION_ARGS)
    {
        TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
        StringInfoData buf;

        pq_begintypsend(&buf);
    #ifdef HAVE_INT64_TIMESTAMP
        pq_sendint64(&buf, timestamp);
    #else
        pq_sendfloat8(&buf, timestamp);
    #endif
        PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
    }

GDB indicates a crash at the last line. No matter how I unrolled the function calls, the crash was always reported right after this line:

    pq_sendfloat8(&buf, timestamp);

It must be stack corruption somehow. I also unrolled pq_sendfloat8(), so the function looks like this:

    Datum
    timestamptz_send(PG_FUNCTION_ARGS)
    {
        TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
        StringInfoData buf;
        bytea      *byteap;
        union
        {
            float8 f;
            int64  i;
        } swap;
        uint32 n32;

        pq_begintypsend(&buf);
    #ifdef HAVE_INT64_TIMESTAMP
        pq_sendint64(&buf, timestamp);
        elog(NOTICE, "timestamptz_send() HAVE_INT64_TIMESTAMP after pq_sendint64");
    #else
        swap.f = (float8)timestamp;
        elog(NOTICE, "timestamptz_send() int64: %lld", swap.i);
        /* High order half first, since we're doing MSB-first */
    #ifdef INT64_IS_BUSTED
        /* don't try a right shift of 32 on a 32-bit word */
        n32 = (swap.i < 0) ? -1 : 0;
        elog(NOTICE, "timestamptz_send() INT64_IS_BUSTED high 32: %d", n32);
    #else
        n32 = (uint32) (swap.i >> 32);
        elog(NOTICE, "timestamptz_send() high 32: %d", n32);
    #endif
        n32 = htonl(n32);
        elog(NOTICE, "timestamptz_send() htonl high 32: %d", n32);
        appendBinaryStringInfo(&buf, (char *) &n32, 4);

        /* Now the low order half */
        n32 = (uint32) swap.i;
        elog(NOTICE, "timestamptz_send() low 32: %d", n32);
        n32 = htonl(n32);
        elog(NOTICE, "timestamptz_send() htonl low 32: %d", n32);
        appendBinaryStringInfo(&buf, (char *) &n32, 4);
        elog(NOTICE, "timestamptz_send() pq_sendfloat8");
    #endif

        byteap = (bytea *) buf.data;
        elog(NOTICE, "timestamptz_send() buf->data = %p", byteap);
        Assert(buf.len >= VARHDRSZ);
        VARATT_SIZEP(byteap) = buf.len;
        PG_RETURN_BYTEA_P(byteap);
    }

The crashing line according to GDB is now the elog() call after:

    swap.f = (float8)timestamp;

This is a simple explicit type cast which shouldn't cause problems. However, it is the one that somehow corrupts something on the stack and causes the segfault upon entering the function at the next statement.

As a workaround, we recompiled PostgreSQL 8.2.3 with --enable-integer-datetimes, and the client can connect to the server now, after initdb.

I tried to exercise timestamptz_send(), but creating a table with a float8 field, INSERTing and SELECTing works, too. Both textual and binary COPY FROM and COPY TO work, too. Either these exercises didn't call pq_sendfloat8(), or it doesn't cause problems elsewhere, only in timestamptz_send().

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/
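For readers who want to check the float8-to-bytes step outside the server, here is a minimal standalone sketch of the same idea as the unrolled code above: reinterpret the 8-byte double as a 64-bit integer, then emit the high 32-bit half followed by the low half, most significant byte first. The helper names encode_float8_msb_first() and decode_float8_msb_first() are hypothetical and are not PostgreSQL functions. The sketch uses memcpy and shifts instead of the union and htonl() from the post, and it assumes a 64-bit IEEE 754 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): write an 8-byte double
 * into out[] most significant byte first, high 32-bit half before low half.
 */
static void
encode_float8_msb_first(double value, unsigned char out[8])
{
    uint64_t bits;
    uint32_t high;
    uint32_t low;

    /* Assumption: double is a 64-bit value on this platform. */
    assert(sizeof(double) == sizeof(uint64_t));

    /* memcpy reinterprets the bytes without relying on union punning. */
    memcpy(&bits, &value, sizeof(bits));

    high = (uint32_t) (bits >> 32);
    low = (uint32_t) bits;

    /* High order half first, each half in network (big-endian) order. */
    out[0] = (unsigned char) (high >> 24);
    out[1] = (unsigned char) (high >> 16);
    out[2] = (unsigned char) (high >> 8);
    out[3] = (unsigned char) high;
    out[4] = (unsigned char) (low >> 24);
    out[5] = (unsigned char) (low >> 16);
    out[6] = (unsigned char) (low >> 8);
    out[7] = (unsigned char) low;
}

/* Hypothetical inverse: rebuild the double from 8 MSB-first bytes. */
static double
decode_float8_msb_first(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];

    memcpy(&value, &bits, sizeof(value));
    return value;
}
```

On an IEEE 754 platform, encoding 1.0 should give the bytes 3F F0 00 00 00 00 00 00, and decoding them should return 1.0. Building such a test with the suspect compiler and with a second compiler is one way to check the conversion in isolation, though it says nothing about linking or loading problems in the server itself.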
Zoltan Boszormenyi wrote:
> Hi,
>
> we have found that psql in PostgreSQL 8.2.3
> has problems connecting to the server
> running on Solaris 10/Sun SPARC.
>
> $ uname -a
> SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440
>
> It seems that somehow the system provided
> GCC 3.4.3 miscompiles timestamptz_send()
> and it segfaults. The default function looks like this:
>

Can you send me how you compiled Postgres (configure switches, LDFLAGS, ...), and is it possible to get a core file?

Did you try compiling with different optimization flags, or did you try the Sun Studio compiler?

Zdenek
Zoltan Boszormenyi <zb@cybertec.at> writes:
> we have found that psql in PostgreSQL 8.2.3
> has problems connecting to the server
> running on Solaris 10/Sun SPARC.
> ...
> It seems that somehow the system provided
> GCC 3.4.3 miscompiles timestamptz_send()
> and it segfaults.

I find it fairly hard to believe that timestamptz_send would be invoked at all while using psql, much less during initial connection. psql doesn't do any binary-output requests.

regards, tom lane
Zdenek Kotala wrote:
> Zoltan Boszormenyi wrote:
>> Hi,
>>
>> we have found that psql in PostgreSQL 8.2.3
>> has problems connecting to the server
>> running on Solaris 10/Sun SPARC.
>>
>> $ uname -a
>> SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440
>>
>> It seems that somehow the system provided
>> GCC 3.4.3 miscompiles timestamptz_send()
>> and it segfaults. The default function looks like this:
>>
>
> Can you send me how you compiled Postgres (configure switches, LDFLAGS, ...),
> and is it possible to get a core file?

This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 --with-includes=/usr/local/include --with-libraries=/usr/local/lib/

I added --enable-debug --enable-depend --enable-cassert to get a sensible gdb report after that.

The problem was that the server had problems after psql connected with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf, then I get the correct permission denied error. If the user can connect, then some statement inside psql causes a segfault in the server.

Compiled with debug info, I got this from gdb on the core file:

$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461         PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
(gdb)

As I described in my experiments, compiling with --enable-integer-datetimes fixed the issue.

> Did you try compiling with different optimization flags, or did you try
> the Sun Studio compiler?

No, and no. Sun Studio isn't installed, only gcc.

> Zdenek

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/
Tom Lane wrote:
> Zoltan Boszormenyi <zb@cybertec.at> writes:
>
>> we have found that psql in PostgreSQL 8.2.3
>> has problems connecting to the server
>> running on Solaris 10/Sun SPARC.
>> ...
>> It seems that somehow the system provided
>> GCC 3.4.3 miscompiles timestamptz_send()
>> and it segfaults.
>>
>
> I find it fairly hard to believe that timestamptz_send would be invoked
> at all while using psql, much less during initial connection. psql
> doesn't do any binary-output requests.
>
> regards, tom lane
>

Then please explain this miracle. Anyway, your comment makes my suspicion about the correctness of GCC-3.4.3 on Solaris 10/sparc better founded now. :-)

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/
Zoltan Boszormenyi wrote:
> [...]
> I described my experiments, compiling with --enable-integer-datetimes
> fixed the issue.

We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL with the new GCC without --enable-integer-datetimes, and it fixed the problem we experienced. It seems that my suspicion was right: GCC-3.4.3 on Solaris 10/Sparc is buggy.

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/
Zoltan Boszormenyi wrote:
> [...]
> We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
> with the new GCC without --enable-integer-datetimes and it fixed
> the problem we experienced. It seems that my suspicion was right:
> GCC-3.4.3 on Solaris 10/Sparc is buggy.
>

Oh, and the proof that I use the newly compiled version:

$ psql -h reddb-dev-pgr -p 5477 test
Welcome to psql 8.2.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# select version();
                                  version
----------------------------------------------------------------------------
 PostgreSQL 8.2.3 on sparc-sun-solaris2.10, compiled by GCC gcc (GCC) 4.1.2
(1 row)

test=# show integer_datetimes;
 integer_datetimes
-------------------
 off
(1 row)

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/
Zoltan Boszormenyi wrote:
>
> We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
> with the new GCC without --enable-integer-datetimes and it fixed
> the problem we experienced. It seems that my suspicion was right:
> GCC-3.4.3 on Solaris 10/Sparc is buggy.
>

I tried the original S10 gcc (3.4.3) on two different machines with different kernel updates, and both work fine. In light of our off-list communication and Tom's remark, it looks more like a problem in linking/loading, maybe a library mismatch. I can't say more without a core file.

Zdenek