Thread: postgres segfaulting on pg_restore

postgres segfaulting on pg_restore

From: Chris Curvey
Thank goodness I'm still in development!  I'm trying to use pg_restore with a dump created with pg_dump.  But it keeps failing with segmentation or general protection faults.  My restore command is:

$ sudo -u postgres pg_restore -d mydatabase -j8 mydatabase.dmp

I've now tried five times, and gotten five failures.  (I've tried setting --jobs to smaller values, and omitting it entirely.)  Here's the postgres error log from the last failure:

2011-04-04 23:07:00 EDT LOG:  server process (PID 7632) was terminated by signal 11: Segmentation fault
2011-04-04 23:07:00 EDT LOG:  terminating any other active server processes
2011-04-04 23:07:00 EDT WARNING:  terminating connection because of crash of another server process
2011-04-04 23:07:00 EDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-04-04 23:07:00 EDT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-04-04 23:07:00 EDT WARNING:  terminating connection because of crash of another server process
2011-04-04 23:07:00 EDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-04-04 23:07:00 EDT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-04-04 23:07:00 EDT CONTEXT:  COPY transactions, line 10629187
2011-04-04 23:07:00 EDT WARNING:  terminating connection because of crash of another server process
2011-04-04 23:07:00 EDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-04-04 23:07:00 EDT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-04-04 23:07:00 EDT WARNING:  terminating connection because of crash of another server process
2011-04-04 23:07:00 EDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-04-04 23:07:00 EDT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-04-04 23:07:00 EDT FATAL:  the database system is in recovery mode
2011-04-04 23:07:00 EDT LOG:  all server processes terminated; reinitializing
2011-04-04 23:07:00 EDT LOG:  database system was interrupted; last known up at 2011-04-04 23:06:25 EDT
2011-04-04 23:07:00 EDT LOG:  database system was not properly shut down; automatic recovery in progress
2011-04-04 23:07:00 EDT LOG:  consistent recovery state reached at 19/6E524410
2011-04-04 23:07:00 EDT LOG:  redo starts at 19/6E175B68
2011-04-04 23:07:00 EDT LOG:  unexpected pageaddr 18/425D6000 in log file 25, segment 110, offset 6119424
2011-04-04 23:07:00 EDT LOG:  redo done at 19/6E5D5F90
2011-04-04 23:07:00 EDT LOG:  last completed transaction was at log time 2011-04-04 23:06:16.684615-04
2011-04-04 23:07:01 EDT LOG:  database system is ready to accept connections
2011-04-04 23:07:01 EDT LOG:  autovacuum launcher started

And in /var/log/messages, I have the following (the last line is from this crash; the earlier lines are from previous crashes):

Apr  4 21:27:27 mu kernel: [ 1964.787667] do_general_protection: 24 callbacks suppressed
Apr  4 21:27:27 mu kernel: [ 1964.787671] postgres[3439] general protection ip:7f8372c8d2f7 sp:7fff61b72b38 error:0 in postgres[7f8372973000+4a8000]
Apr  4 21:32:21 mu kernel: [ 2258.266116] postgres[4307]: segfault at 30 ip 00007f8372c8d2fb sp 00007fff61b72bd8 error 4 in postgres[7f8372973000+4a8000]
Apr  4 21:47:27 mu kernel: [ 3164.740812] postgres[5040]: segfault at 40 ip 00007f8372c8d2fb sp 00007fff61b72b38 error 4 in postgres[7f8372973000+4a8000]
Apr  4 22:03:12 mu kernel: [ 4108.987420] postgres[5077]: segfault at 40 ip 00007f8372c8d2fb sp 00007fff61b72978 error 4 in postgres[7f8372973000+4a8000]
Apr  4 22:56:13 mu kernel: [ 7288.639099] postgres[5308] general protection ip:7f8372c8d2f7 sp:7fff61b72748 error:0 in postgres[7f8372973000+4a8000]
Apr  4 23:07:00 mu kernel: [ 7935.670820] postgres[7632]: segfault at 7365 ip 00007f8372c8c94e sp 00007fff61b72ad0 error 4 in postgres[7f8372973000+4a8000]

This is on Ubuntu 10.10 server, 64-bit on Intel.  I'm running Postgres 9.0.3, 64-bit from the ppa repository.

What can I do to help the developers find the root cause of this?  I'm happy to run a debug version, or find a corefile, or whatever I can, but I might need some guidance, as I'm pretty new to postgres.

-Chris


--
Ignoring that little voice in my head since 1966!

Re: postgres segfaulting on pg_restore

From: Craig Ringer
On 05/04/11 11:18, Chris Curvey wrote:

> This is on Ubuntu 10.10 server, 64-bit on Intel.  I'm running Postgres
> 9.0.3, 64-bit from the ppa repository.
>
> What can I do to help the developers find the root cause of this?  I'm
> happy to run a debug version, or find a corefile, or whatever I can, but
> I might need some guidance, as I'm pretty new to postgres.

A backtrace would be a good start. On Ubuntu you will need to install
debuginfo packages first. For details, see:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
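
In rough outline, the procedure on that page comes down to something like this (the PID is whichever backend is doing the work, found via ps or pg_stat_activity):

$ ps aux | grep postgres     # find the PID of the backend running the restore
$ sudo gdb -p <backend_pid>  # attach to the running backend
(gdb) continue               # let it run until it hits the segfault
(gdb) bt                     # then print the stack trace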

--
Craig Ringer

Re: postgres segfaulting on pg_restore

From: Chris Curvey


On Mon, Apr 4, 2011 at 11:59 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 05/04/11 11:18, Chris Curvey wrote:
>
>> This is on Ubuntu 10.10 server, 64-bit on Intel.  I'm running Postgres
>> 9.0.3, 64-bit from the ppa repository.
>>
>> What can I do to help the developers find the root cause of this?  I'm
>> happy to run a debug version, or find a corefile, or whatever I can, but
>> I might need some guidance, as I'm pretty new to postgres.
>
> A backtrace would be a good start. On Ubuntu you will need to install
> debuginfo packages first. For details, see:
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> --
> Craig Ringer

Hmmm.  I'm on Ubuntu, and running 9.0.3, so I'm getting my packages from Martin Pitt, per the instructions at http://www.dctrwatson.com/2010/09/installing-postgresql-9-0-on-ubuntu-10-04/.  I don't seem to see any debug packages there.

Can anyone point me to debug packages for 9.0.3, or point me to the correct flags to use to compile from source with the debug symbols?


--
Ignoring that little voice in my head since 1966!

Re: postgres segfaulting on pg_restore

From: Craig Ringer
On 7/04/2011 4:02 AM, Chris Curvey wrote:

> Hmmm.  I'm on Ubuntu, and running 9.0.3, so I'm getting my packages from
> Martin Pitt, per the instructions at
> http://www.dctrwatson.com/2010/09/installing-postgresql-9-0-on-ubuntu-10-04/.
>   I don't seem to see any debug packages there.

Nor I. https://launchpad.net/~pitti/+archive/postgresql/+packages
doesn't seem to list any.

Martin: Are your packages stripped, or do they include minimal debug
info in the main packages? Do you put your debuginfo debs anywhere
accessible?

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: postgres segfaulting on pg_restore

From: Chris Curvey

> Nor I. https://launchpad.net/~pitti/+archive/postgresql/+packages doesn't seem to list any.
>
> Martin: Are your packages stripped, or do they include minimal debug info in the main packages? Do you put your debuginfo debs anywhere accessible?


I decided to just try building it myself -- it has been a voyage of discovery.  Here are the instructions, minus all the false paths I went down.

1) download postgres 9.0.3 from postgresql.org
2) install bison, flex and libreadline5-dev (sudo apt-get install PACKAGE).  I already had the gcc package installed
3) ./configure --enable-debug  (I could not get --with-openssl to work; the server kept complaining about SSL when it started, so I gave up.)
4) make
5) sudo make install
6) stop the existing server (sudo /etc/init.d/postgresql stop)
7) start the debug server (sudo -u postgres /usr/local/pgsql/bin/pg_ctl start -D /etc/postgresql/9.0/main).  It took me a while to figure out that the -D argument points to where your postgresql.conf file lives, NOT to where the database files actually are.  Probably obvious to experienced users.
8) Start my restore

then follow the instructions at http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD to get gdb connected to the postgres backend.
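
For the record, the whole sequence boils down to roughly this (the tarball name and paths are from my setup, so adjust to taste):

$ sudo apt-get install bison flex libreadline5-dev
$ tar xjf postgresql-9.0.3.tar.bz2 && cd postgresql-9.0.3
$ ./configure --enable-debug
$ make
$ sudo make install
$ sudo /etc/init.d/postgresql stop
$ sudo -u postgres /usr/local/pgsql/bin/pg_ctl start -D /etc/postgresql/9.0/main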

And voila!  Here is the backtrace:

#0  0x00000000006ce317 in GetMemoryChunkSpace (pointer=0x347cc70) at mcxt.c:264
#1  0x00000000006d3d56 in writetup_index (state=0x26fc530, tapenum=<value optimized out>, stup=<value optimized out>) at tuplesort.c:2924
#2  0x00000000006d2af7 in dumptuples (state=0x26fc530, alltuples=0 '\000') at tuplesort.c:2068
#3  0x00000000006d392f in puttuple_common (state=0x26fc530, tuple=0x7fff1e21d3b0) at tuplesort.c:1097
#4  0x00000000006d3c4c in tuplesort_putindextuple (state=0x26fc530, tuple=<value optimized out>) at tuplesort.c:943
#5  0x0000000000472cac in btbuildCallback (index=<value optimized out>, htup=0x26f4460, values=<value optimized out>, isnull=<value optimized out>, tupleIsAlive=1 '\001', state=0x7fff1e21d870) at nbtree.c:194
#6  0x00000000004ab1ec in IndexBuildHeapScan (heapRelation=<value optimized out>, indexRelation=<value optimized out>, indexInfo=<value optimized out>, allow_sync=<value optimized out>, callback=<value optimized out>, callback_state=<value optimized out>) at index.c:1866
#7  0x0000000000472b35 in btbuild (fcinfo=<value optimized out>) at nbtree.c:123
#8  0x00000000006b8ba1 in OidFunctionCall3 (functionId=<value optimized out>, arg1=140128587519600, arg2=140128587659696, arg3=40470992) at fmgr.c:1610
#9  0x00000000004ab804 in index_build (heapRelation=0x7f723aae9670, indexRelation=0x7f723ab0b9b0, indexInfo=0x26989d0, isprimary=0 '\000') at index.c:1427
#10 0x00000000004ad43e in index_create (heapRelationId=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, indexInfo=0x26989d0, indexColNames=<value optimized out>, accessMethodObjectId=<value optimized out>, tableSpaceId=0, classObjectId=0x26f2e70, coloptions=0x26f2e90, reloptions=0, isprimary=0 '\000', isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=<value optimized out>, skip_build=0 '\000', concurrent=0 '\000') at index.c:959
#11 0x0000000000514ec2 in DefineIndex (heapRelation=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, accessMethodName=<value optimized out>, tableSpaceName=<value optimized out>, attributeList=0x2, predicate=0x0, options=0x0, exclusionOpNames=0x0, unique=0 '\000', primary=0 '\000', isconstraint=0 '\000', deferrable=<value optimized out>, initdeferred=<value optimized out>, is_alter_table=0 '\000', check_rights=1 '\001', skip_build=0 '\000', quiet=0 '\000', concurrent=<value optimized out>) at indexcmds.c:484
#12 0x0000000000603b69 in standard_ProcessUtility (parsetree=0x2648880, queryString=0x2647be0 "CREATE INDEX itransf ON transactions USING btree (loccode, startdtact, starttmact);", params=0x0, isTopLevel=1 '\001', dest=0x2648c20, completionTag=0x7fff1e21e3c0 "") at utility.c:876
#13 0x00000000006000a7 in PortalRunUtility (portal=0x25bf0d0, utilityStmt=0x2648880, isTopLevel=0 '\000', dest=0x2648c20, completionTag=0x7fff1e21e3c0 "") at pquery.c:1191
#14 0x00000000006010ec in PortalRunMulti (portal=0x25bf0d0, isTopLevel=1 '\001', dest=0x2648c20, altdest=0x2648c20, completionTag=0x7fff1e21e3c0 "") at pquery.c:1296
#15 0x0000000000601852 in PortalRun (portal=<value optimized out>, count=<value optimized out>, isTopLevel=112 'p', dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:822
#16 0x00000000005fde0b in exec_simple_query (query_string=<value optimized out>) at postgres.c:1058
#17 0x00000000005fee47 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3931
#18 0x00000000005cc3b9 in BackendRun () at postmaster.c:3555
#19 BackendStartup () at postmaster.c:3242
#20 ServerLoop () at postmaster.c:1431
#21 0x00000000005cea1c in PostmasterMain (argc=39596208, argv=0x259f8d0) at postmaster.c:1092
#22 0x0000000000575be0 in main (argc=3, argv=0x259f8c0) at main.c:188

so, do I leave this here, or do I send it to pgsql-bugs?


--
Ignoring that little voice in my head since 1966!

Re: postgres segfaulting on pg_restore

From: Craig Ringer
On 04/07/2011 06:37 PM, Chris Curvey wrote:

> 2) install bison, flex and libreadline5-dev (sudo apt-get install
> PACKAGE).  I already had the gcc package installed

The easy way to do this on Debian/ubuntu, if you're building something
that packages exist for, is (eg):

sudo apt-get build-dep postgresql

This downloads and installs all the dependencies required to compile
postgresql.
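
(One gotcha: build-dep only works if the source repositories are enabled, i.e. /etc/apt/sources.list needs a matching deb-src line, something like

deb-src http://archive.ubuntu.com/ubuntu maverick main

followed by a sudo apt-get update.)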

> And voila!  Here is the backtrace:

Thank you VERY much for taking the time to collect this information.

It appears to be crashing while building an index:

CREATE INDEX itransf ON transactions USING btree
(loccode, startdtact, starttmact);

I don't know PostgreSQL's innards well enough to know much more than
that, but others do and may well check this out.

Are you able to distribute your dataset - if not to the general public,
then to someone interested in identifying the fault?

Does the same dataset crash Pg when restored on another machine?

> #0  0x00000000006ce317 in GetMemoryChunkSpace (pointer=0x347cc70) at
> mcxt.c:264
> #1  0x00000000006d3d56 in writetup_index (state=0x26fc530,
> tapenum=<value optimized out>, stup=<value optimized out>) at
> tuplesort.c:2924
> #2  0x00000000006d2af7 in dumptuples (state=0x26fc530, alltuples=0
> '\000') at tuplesort.c:2068
> #3  0x00000000006d392f in puttuple_common (state=0x26fc530,
> tuple=0x7fff1e21d3b0) at tuplesort.c:1097
> #4  0x00000000006d3c4c in tuplesort_putindextuple (state=0x26fc530,
> tuple=<value optimized out>) at tuplesort.c:943
> #5  0x0000000000472cac in btbuildCallback (index=<value optimized out>,
> htup=0x26f4460, values=<value optimized out>, isnull=<value optimized
> out>, tupleIsAlive=1 '\001', state=0x7fff1e21d870) at nbtree.c:194
> #6  0x00000000004ab1ec in IndexBuildHeapScan (heapRelation=<value
> optimized out>, indexRelation=<value optimized out>, indexInfo=<value
> optimized out>, allow_sync=<value optimized out>, callback=<value
> optimized out>, callback_state=<value optimized out>) at index.c:1866
> #7  0x0000000000472b35 in btbuild (fcinfo=<value optimized out>) at
> nbtree.c:123
> #8  0x00000000006b8ba1 in OidFunctionCall3 (functionId=<value optimized
> out>, arg1=140128587519600, arg2=140128587659696, arg3=40470992) at
> fmgr.c:1610
> #9  0x00000000004ab804 in index_build (heapRelation=0x7f723aae9670,
> indexRelation=0x7f723ab0b9b0, indexInfo=0x26989d0, isprimary=0 '\000')
> at index.c:1427
> #10 0x00000000004ad43e in index_create (heapRelationId=<value optimized
> out>, indexRelationName=<value optimized out>, indexRelationId=<value
> optimized out>, indexInfo=0x26989d0, indexColNames=<value optimized
> out>, accessMethodObjectId=<value optimized out>, tableSpaceId=0,
> classObjectId=0x26f2e70, coloptions=0x26f2e90, reloptions=0, isprimary=0
> '\000', isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0
> '\000', allow_system_table_mods=<value optimized out>, skip_build=0
> '\000', concurrent=0 '\000') at index.c:959
> #11 0x0000000000514ec2 in DefineIndex (heapRelation=<value optimized
> out>, indexRelationName=<value optimized out>, indexRelationId=<value
> optimized out>, accessMethodName=<value optimized out>,
> tableSpaceName=<value optimized out>, attributeList=0x2, predicate=0x0,
> options=0x0, exclusionOpNames=0x0, unique=0 '\000', primary=0 '\000',
> isconstraint=0 '\000', deferrable=<value optimized out>,
> initdeferred=<value optimized out>, is_alter_table=0 '\000',
> check_rights=1 '\001', skip_build=0 '\000', quiet=0 '\000',
> concurrent=<value optimized out>) at indexcmds.c:484
> #12 0x0000000000603b69 in standard_ProcessUtility (parsetree=0x2648880,
> queryString=0x2647be0 "CREATE INDEX itransf ON transactions USING btree
> (loccode, startdtact, starttmact);", params=0x0, isTopLevel=1 '\001',
> dest=0x2648c20, completionTag=0x7fff1e21e3c0 "") at utility.c:876
> #13 0x00000000006000a7 in PortalRunUtility (portal=0x25bf0d0,
> utilityStmt=0x2648880, isTopLevel=0 '\000', dest=0x2648c20,
> completionTag=0x7fff1e21e3c0 "") at pquery.c:1191
> #14 0x00000000006010ec in PortalRunMulti (portal=0x25bf0d0, isTopLevel=1
> '\001', dest=0x2648c20, altdest=0x2648c20, completionTag=0x7fff1e21e3c0
> "") at pquery.c:1296
> #15 0x0000000000601852 in PortalRun (portal=<value optimized out>,
> count=<value optimized out>, isTopLevel=112 'p', dest=<value optimized
> out>, altdest=<value optimized out>, completionTag=<value optimized
> out>) at pquery.c:822
> #16 0x00000000005fde0b in exec_simple_query (query_string=<value
> optimized out>) at postgres.c:1058
> #17 0x00000000005fee47 in PostgresMain (argc=<value optimized out>,
> argv=<value optimized out>, username=<value optimized out>) at
> postgres.c:3931
> #18 0x00000000005cc3b9 in BackendRun () at postmaster.c:3555
> #19 BackendStartup () at postmaster.c:3242
> #20 ServerLoop () at postmaster.c:1431
> #21 0x00000000005cea1c in PostmasterMain (argc=39596208, argv=0x259f8d0)
> at postmaster.c:1092
> #22 0x0000000000575be0 in main (argc=3, argv=0x259f8c0) at main.c:188
>
> so, do I leave this here, or do I send it to pgsql-bugs?
>
>
> --
> Ignoring that little voice in my head since 1966!


Re: postgres segfaulting on pg_restore

From: Chris Curvey

> Are you able to distribute your dataset - if not to the general public, then to someone interested in identifying the fault?

Probably not, but let me ask...maybe I can come up with a creative solution.


> Does the same dataset crash Pg when restored on another machine?

That is a good question. I'm running a full memtest on the server in question at the moment, just to be sure that this is not a hardware issue.  (And I'm assuming that if I were out of memory -- which I think is unlikely -- I would get a different error.)

Let me see if I can load it somewhere else.   And since the only other machine I have to try it on is a Windows box, I will be able to answer my own question of "are pg_dump dumps OS-portable?"  



--
Ignoring that little voice in my head since 1966!

Re: postgres segfaulting on pg_restore

From: Craig Ringer
On 04/07/2011 08:49 PM, Chris Curvey wrote:

> Let me see if I can load it somewhere else.   And since the only other
> machine I have to try it on is a Windows box, I will be able to answer
> my own question of "are pg_dump dumps OS-portable?"

The answer there is "mostly". Contrib modules can cause issues and
locale names differ across OSes. I usually CREATE DATABASE then restore
into the empty database, because pg_restore may try to create a database
using a locale that doesn't exist on the target machine.
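
For example (run as a superuser; the encoding and locale here are only illustrations -- use values that actually exist on the target machine):

createdb --encoding=UTF8 --locale=en_US.UTF-8 mydatabase
pg_restore -d mydatabase mydatabase.dmp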

--
Craig Ringer

Re: postgres segfaulting on pg_restore

From: Tom Lane
Chris Curvey <chris@chriscurvey.com> writes:
> And voila!  Here is the backtrace:

> #0  0x00000000006ce317 in GetMemoryChunkSpace (pointer=0x347cc70) at
> mcxt.c:264
> #1  0x00000000006d3d56 in writetup_index (state=0x26fc530, tapenum=<value
> optimized out>, stup=<value optimized out>) at tuplesort.c:2924
> #2  0x00000000006d2af7 in dumptuples (state=0x26fc530, alltuples=0 '\000')
> at tuplesort.c:2068
> #3  0x00000000006d392f in puttuple_common (state=0x26fc530,
> tuple=0x7fff1e21d3b0) at tuplesort.c:1097
> #4  0x00000000006d3c4c in tuplesort_putindextuple (state=0x26fc530,
> tuple=<value optimized out>) at tuplesort.c:943
> #5  0x0000000000472cac in btbuildCallback (index=<value optimized out>,
> htup=0x26f4460, values=<value optimized out>, isnull=<value optimized out>,
> tupleIsAlive=1 '\001', state=0x7fff1e21d870) at nbtree.c:194

That is damn peculiar.  The tuple handed to writetup_index would have
been copied just moments before in tuplesort_putindextuple, so there is
no way that GetMemoryChunkSpace ought to fail.  If you do the run
several times over, do you get the exact same stack trace every time?

            regards, tom lane

Re: postgres segfaulting on pg_restore

From: Chris Curvey


On Thu, Apr 7, 2011 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Chris Curvey <chris@chriscurvey.com> writes:
>> And voila!  Here is the backtrace:
>
>> #0  0x00000000006ce317 in GetMemoryChunkSpace (pointer=0x347cc70) at
>> mcxt.c:264
>> #1  0x00000000006d3d56 in writetup_index (state=0x26fc530, tapenum=<value
>> optimized out>, stup=<value optimized out>) at tuplesort.c:2924
>> #2  0x00000000006d2af7 in dumptuples (state=0x26fc530, alltuples=0 '\000')
>> at tuplesort.c:2068
>> #3  0x00000000006d392f in puttuple_common (state=0x26fc530,
>> tuple=0x7fff1e21d3b0) at tuplesort.c:1097
>> #4  0x00000000006d3c4c in tuplesort_putindextuple (state=0x26fc530,
>> tuple=<value optimized out>) at tuplesort.c:943
>> #5  0x0000000000472cac in btbuildCallback (index=<value optimized out>,
>> htup=0x26f4460, values=<value optimized out>, isnull=<value optimized out>,
>> tupleIsAlive=1 '\001', state=0x7fff1e21d870) at nbtree.c:194
>
> That is damn peculiar.

You clearly haven't met me yet :)
 
> The tuple handed to writetup_index would have
> been copied just moments before in tuplesort_putindextuple, so there is
> no way that GetMemoryChunkSpace ought to fail.  If you do the run
> several times over, do you get the exact same stack trace every time?
>
>                        regards, tom lane

I don't get the same trace every time.   I ran this again to see if I could duplicate it, and everything worked.  Then I ran it again, and got this trace.  (Segfault in same location, but a different path to it.)

#0  0x00000000006ce31b in GetMemoryChunkSpace (pointer=0x246eae0) at mcxt.c:264
#1  0x00000000006d3d56 in writetup_index (state=0x22cd460, tapenum=<value optimized out>, stup=<value optimized out>) at tuplesort.c:2924
#2  0x00000000006d2af7 in dumptuples (state=0x22cd460, alltuples=1 '\001') at tuplesort.c:2068
#3  0x00000000006d304f in tuplesort_performsort (state=0x22cd460) at tuplesort.c:1164
#4  0x0000000000476023 in _bt_leafbuild (btspool=<value optimized out>, btspool2=0x246eae0) at nbtsort.c:206
#5  0x0000000000472b5a in btbuild (fcinfo=<value optimized out>) at nbtree.c:139
#6  0x00000000006b8ba1 in OidFunctionCall3 (functionId=<value optimized out>, arg1=140165585988880, arg2=140156762410992, arg3=36255616) at fmgr.c:1610
#7  0x00000000004ab804 in index_build (heapRelation=0x7f7ad7f66d10, indexRelation=0x7f78ca094ff0, indexInfo=0x2293780, isprimary=1 '\001') at index.c:1427
#8  0x00000000004ad43e in index_create (heapRelationId=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, indexInfo=0x2293780, indexColNames=<value optimized out>, accessMethodObjectId=<value optimized out>, tableSpaceId=0, classObjectId=0x22939d0, coloptions=0x22939e8, reloptions=0, isprimary=1 '\001', isconstraint=1 '\001', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=<value optimized out>, skip_build=0 '\000', concurrent=0 '\000') at index.c:959
#9  0x0000000000514ec2 in DefineIndex (heapRelation=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, accessMethodName=<value optimized out>, tableSpaceName=<value optimized out>, attributeList=0x0, predicate=0x0, options=0x0, exclusionOpNames=0x0, unique=1 '\001', primary=1 '\001', isconstraint=1 '\001', deferrable=<value optimized out>, initdeferred=<value optimized out>, is_alter_table=1 '\001', check_rights=1 '\001', skip_build=0 '\000', quiet=0 '\000', concurrent=<value optimized out>) at indexcmds.c:484
#10 0x0000000000522b7b in ATExecAddIndex (tab=<value optimized out>, rel=<value optimized out>, stmt=<value optimized out>, is_rebuild=0 '\000') at tablecmds.c:4576
#11 0x000000000052b422 in ATExecCmd (wqueue=<value optimized out>) at tablecmds.c:2744
#12 ATRewriteCatalogs (wqueue=<value optimized out>) at tablecmds.c:2670
#13 0x000000000052c0d7 in ATController (rel=0x7f7ad7f66d10, cmds=<value optimized out>, recurse=<value optimized out>) at tablecmds.c:2421
#14 0x0000000000604415 in standard_ProcessUtility (parsetree=<value optimized out>, queryString=0x2246b80 "ALTER TABLE ONLY erelated_base\n    ADD CONSTRAINT erelated_base_pkey PRIMARY KEY (id);", params=0x0, isTopLevel=1 '\001', dest=0x2247b40, completionTag=0x7fffc180d440 "") at utility.c:717
#15 0x00000000006000a7 in PortalRunUtility (portal=0x21bdf90, utilityStmt=0x2247800, isTopLevel=0 '\000', dest=0x2247b40, completionTag=0x7fffc180d440 "") at pquery.c:1191
#16 0x00000000006010ec in PortalRunMulti (portal=0x21bdf90, isTopLevel=1 '\001', dest=0x2247b40, altdest=0x2247b40, completionTag=0x7fffc180d440 "") at pquery.c:1296
#17 0x0000000000601852 in PortalRun (portal=<value optimized out>, count=<value optimized out>, isTopLevel=-32 '\340', dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:822
#18 0x00000000005fde0b in exec_simple_query (query_string=<value optimized out>) at postgres.c:1058
#19 0x00000000005fee47 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3931
#20 0x00000000005cc3b9 in BackendRun () at postmaster.c:3555
#21 BackendStartup () at postmaster.c:3242
#22 ServerLoop () at postmaster.c:1431
#23 0x00000000005cea1c in PostmasterMain (argc=35397488, argv=0x219d8f0) at postmaster.c:1092
#24 0x0000000000575be0 in main (argc=3, argv=0x219d8e0) at main.c:188



--
Ignoring that little voice in my head since 1966!

Re: postgres segfaulting on pg_restore

From: Chris Curvey


And, FWIW, here's another trace, which is NEARLY the same as the first one I posted; the only difference is a slightly different line number at frame #3.  I will be quiet now and leave the brain trust to ponder.  Let me know if there is anything else I can get for you guys.

#0  0x00000000006ce317 in GetMemoryChunkSpace (pointer=0x292a0e0) at mcxt.c:264
#1  0x00000000006d3d56 in writetup_index (state=0x2281670, tapenum=<value optimized out>, stup=<value optimized out>) at tuplesort.c:2924
#2  0x00000000006d2af7 in dumptuples (state=0x2281670, alltuples=0 '\000') at tuplesort.c:2068
#3  0x00000000006d38b7 in puttuple_common (state=0x2281670, tuple=<value optimized out>) at tuplesort.c:1046
#4  0x00000000006d3c4c in tuplesort_putindextuple (state=0x2281670, tuple=<value optimized out>) at tuplesort.c:943
#5  0x0000000000472cac in btbuildCallback (index=<value optimized out>, htup=0x22eade0, values=<value optimized out>, isnull=<value optimized out>, tupleIsAlive=1 '\001', state=0x7fffc180c8f0) at nbtree.c:194
#6  0x00000000004ab1ec in IndexBuildHeapScan (heapRelation=<value optimized out>, indexRelation=<value optimized out>, indexInfo=<value optimized out>, allow_sync=<value optimized out>, callback=<value optimized out>, callback_state=<value optimized out>) at index.c:1866
#7  0x0000000000472b35 in btbuild (fcinfo=<value optimized out>) at nbtree.c:123
#8  0x00000000006b8ba1 in OidFunctionCall3 (functionId=<value optimized out>, arg1=140165586614672, arg2=140165586792240, arg3=36148832) at fmgr.c:1610
#9  0x00000000004ab804 in index_build (heapRelation=0x7f7ad7fff990, indexRelation=0x7f7ad802af30, indexInfo=0x2279660, isprimary=0 '\000') at index.c:1427
#10 0x00000000004ad43e in index_create (heapRelationId=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, indexInfo=0x2279660, indexColNames=<value optimized out>, accessMethodObjectId=<value optimized out>, tableSpaceId=0, classObjectId=0x228ee20, coloptions=0x228ee40, reloptions=0, isprimary=0 '\000', isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=<value optimized out>, skip_build=0 '\000', concurrent=0 '\000') at index.c:959
#11 0x0000000000514ec2 in DefineIndex (heapRelation=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, accessMethodName=<value optimized out>, tableSpaceName=<value optimized out>, attributeList=0x2, predicate=0x0, options=0x0, exclusionOpNames=0x0, unique=0 '\000', primary=0 '\000', isconstraint=0 '\000', deferrable=<value optimized out>, initdeferred=<value optimized out>, is_alter_table=0 '\000', check_rights=1 '\001', skip_build=0 '\000', quiet=0 '\000', concurrent=<value optimized out>) at indexcmds.c:484
#12 0x0000000000603b69 in standard_ProcessUtility (parsetree=0x21a4218, queryString=0x21a3570 "CREATE INDEX ieqansb ON eqanswer_base USING btree (entityrole, fieldnum, memotext);", params=0x0, isTopLevel=1 '\001', dest=0x21a45b8, completionTag=0x7fffc180d440 "") at utility.c:876
#13 0x00000000006000a7 in PortalRunUtility (portal=0x225ea10, utilityStmt=0x21a4218, isTopLevel=0 '\000', dest=0x21a45b8, completionTag=0x7fffc180d440 "") at pquery.c:1191
#14 0x00000000006010ec in PortalRunMulti (portal=0x225ea10, isTopLevel=1 '\001', dest=0x21a45b8, altdest=0x21a45b8, completionTag=0x7fffc180d440 "") at pquery.c:1296
#15 0x0000000000601852 in PortalRun (portal=<value optimized out>, count=<value optimized out>, isTopLevel=-32 '\340', dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:822
#16 0x00000000005fde0b in exec_simple_query (query_string=<value optimized out>) at postgres.c:1058
#17 0x00000000005fee47 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3931
#18 0x00000000005cc3b9 in BackendRun () at postmaster.c:3555
#19 BackendStartup () at postmaster.c:3242
#20 ServerLoop () at postmaster.c:1431
#21 0x00000000005cea1c in PostmasterMain (argc=35397488, argv=0x219d8f0) at postmaster.c:1092
#22 0x0000000000575be0 in main (argc=3, argv=0x219d8e0) at main.c:188

--
Ignoring that little voice in my head since 1966!

Re: postgres segfaulting on pg_restore

From: Tom Lane
Chris Curvey <chris@chriscurvey.com> writes:
> And, FWIW, here's another trace, which is NEARLY the same as the first one I
> posted; the only difference is a slightly different line number at frame #3.
> I will be quiet now and leave the brain trust to ponder.  Let me know if
> there is anything else I can get for you guys.

> #0  0x00000000006ce317 in GetMemoryChunkSpace (pointer=0x292a0e0) at
> mcxt.c:264
> #1  0x00000000006d3d56 in writetup_index (state=0x2281670, tapenum=<value
> optimized out>, stup=<value optimized out>) at tuplesort.c:2924

It occurred to me that a simple explanation for a core dump there would
be if something had scribbled past the end of the preceding palloc
chunk.  That would tend to clobber the "context" link of the palloc
chunk after it, which would send GetMemoryChunkSpace off into
never-never land following a trashed pointer.

That doesn't get us very much closer to a solution, but it does suggest
that you might learn something if you try this with an assert-enabled
build (configure --enable-cassert).  Also, are there any contrib modules
or third-party add-on modules or home-brew C functions that you're using?
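
(In other words, roughly

./configure --enable-debug --enable-cassert
make && sudo make install

on top of whatever build recipe you used before.)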

            regards, tom lane

Re: postgres segfaulting on pg_restore

From: Chris Curvey

> It occurred to me that a simple explanation for a core dump there would
> be if something had scribbled past the end of the preceding palloc
> chunk.  That would tend to clobber the "context" link of the palloc
> chunk after it, which would send GetMemoryChunkSpace off into
> never-never land following a trashed pointer.
>
> That doesn't get us very much closer to a solution, but it does suggest
> that you might learn something if you try this with an assert-enabled
> build (configure --enable-cassert).  Also, are there any contrib modules
> or third-party add-on modules or home-brew C functions that you're using?


In reverse order: no third-party, no contrib, no home-brew C.  Here is a stack trace from a fresh build of 9.1 beta 1, built with --enable-cassert.

 Program received signal SIGSEGV, Segmentation fault.
0x0000000000744777 in AllocSetAlloc (context=0x13f6d08, size=16) at aset.c:639
639 Assert(chunk->size >= size);
#0  0x0000000000744777 in AllocSetAlloc (context=0x13f6d08, size=16) at aset.c:639
#1  0x000000000074d07b in readtup_index (state=0x1453568, stup=0x7fffa5135010, tapenum=0, len=<value optimized out>) at tuplesort.c:3258
#2  0x000000000074a57c in mergeprereadone (state=0x1453568, srcTape=0) at tuplesort.c:2157
#3  0x000000000074a9d3 in tuplesort_gettuple_common (state=0x1453568, forward=<value optimized out>, stup=0x7fffa51350a0, should_free=<value optimized out>) at tuplesort.c:1491
#4  0x000000000074ac96 in tuplesort_getindextuple (state=0x13f6d08, forward=16 '\020', should_free=0x10 <Address 0x10 out of bounds>) at tuplesort.c:1580
#5  0x000000000048d076 in _bt_load (btspool=<value optimized out>, btspool2=<value optimized out>) at nbtsort.c:786
#6  _bt_leafbuild (btspool=<value optimized out>, btspool2=<value optimized out>) at nbtsort.c:224
#7  0x000000000048788c in btbuild (fcinfo=<value optimized out>) at nbtree.c:142
#8  0x000000000072d55c in OidFunctionCall3Coll (functionId=<value optimized out>, collation=0, arg1=140144035752040, arg2=21827704, arg3=20972800) at fmgr.c:1645
#9  0x00000000004d095d in index_build (heapRelation=0x7f75d377e868, indexRelation=0x14d1078, indexInfo=0x1400500, isprimary=<value optimized out>, isreindex=0 '\000') at index.c:1737
#10 0x00000000004d31c5 in index_create (heapRelation=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, indexInfo=0x1400500, indexColNames=<value optimized out>, accessMethodObjectId=<value optimized out>, tableSpaceId=0, collationObjectId=0x1400618, classObjectId=0x1400638, coloptions=0x1400658, reloptions=0, isprimary=1 '\001', isconstraint=1 '\001', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000') at index.c:1067
#11 0x000000000054bd81 in DefineIndex (heapRelation=<value optimized out>, indexRelationName=<value optimized out>, indexRelationId=<value optimized out>, accessMethodName=<value optimized out>, tableSpaceName=<value optimized out>, attributeList=0x85cf3e, predicate=0x0, options=0x0, exclusionOpNames=0x0, unique=1 '\001', primary=1 '\001', isconstraint=1 '\001', deferrable=<value optimized out>, initdeferred=<value optimized out>, is_alter_table=1 '\001', check_rights=1 '\001', skip_build=0 '\000', quiet=0 '\000', concurrent=<value optimized out>) at indexcmds.c:397
#12 0x000000000055b115 in ATExecAddIndex (tab=0x13bb038, rel=<value optimized out>, stmt=0x14001d8, is_rebuild=0 '\000', lockmode=<value optimized out>) at tablecmds.c:5173
#13 0x00000000005653c9 in ATExecCmd (wqueue=<value optimized out>, tab=0x13bb038, rel=0x7f75d377e868, cmd=0x14001a0, lockmode=6) at tablecmds.c:3074
#14 0x0000000000565fc5 in ATRewriteCatalogs (rel=<value optimized out>, cmds=<value optimized out>, recurse=<value optimized out>, lockmode=6) at tablecmds.c:3001
#15 ATController (rel=<value optimized out>, cmds=<value optimized out>, recurse=<value optimized out>, lockmode=6) at tablecmds.c:2735
#16 0x000000000066bda5 in standard_ProcessUtility (parsetree=<value optimized out>, queryString=0x13ad538 "ALTER TABLE ONLY transactions\n    ADD CONSTRAINT transactions_pkey PRIMARY KEY (id);", params=0x0, isTopLevel=1 '\001', dest=0x13ae5a8, completionTag=0x7fffa5135e30 "") at utility.c:767
#17 0x00000000006674b7 in PortalRunUtility (portal=0x1329078, utilityStmt=0x13ae220, isTopLevel=16 '\020', dest=0x13ae5a8, completionTag=0x7fffa5135e30 "") at pquery.c:1184
#18 0x0000000000668740 in PortalRunMulti (portal=0x1329078, isTopLevel=1 '\001', dest=<value optimized out>, altdest=0x13ae5a8, completionTag=0x7fffa5135e30 "") at pquery.c:1315
#19 0x0000000000668f52 in PortalRun (portal=<value optimized out>, count=<value optimized out>, isTopLevel=16 '\020', dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:813
#20 0x000000000066518b in exec_simple_query (query_string=<value optimized out>) at postgres.c:1018
#21 0x000000000066615b in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3924
#22 0x000000000061d2f1 in BackendRun () at postmaster.c:3600
#23 BackendStartup () at postmaster.c:3285
#24 ServerLoop () at postmaster.c:1454
#25 0x000000000061fc9d in PostmasterMain (argc=3, argv=0x1305880) at postmaster.c:1115
#26 0x00000000005b9720 in main (argc=3, argv=0x1305870) at main.c:199

--
e-Mail is the equivalent of a postcard written in pencil.  This message may not have been sent by me, or intended for you.  It may have been read or even modified while in transit.  e-Mail disclaimers have the same force in law as a note passed in study hall.  If your corporate attorney says that you need a disclaimer in your signature, you need a new corporate attorney.

Re: postgres segfaulting on pg_restore

From: Tom Lane
Chris Curvey <chris@chriscurvey.com> writes:
> In reverse order: no third-party, no contrib, no home-brew C.  Here is a
> stack trace from a fresh build of 9.1 beta 1, built with --enable-cassert.

>  Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000744777 in AllocSetAlloc (context=0x13f6d08, size=16) at
> aset.c:639
> 639 Assert(chunk->size >= size);
> #0  0x0000000000744777 in AllocSetAlloc (context=0x13f6d08, size=16) at
> aset.c:639

Hmm, that's pretty interesting.  Possibly it's just another
manifestation of something scribbling past the end of its allocated
chunk, but my credence for that theory is way down if there's no add-on
code involved.  Does the postmaster log show any messages like
    WARNING: detected write past chunk end in ...
before the crash?
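
A quick grep should tell you, something like this (the path assumes the stock Ubuntu package layout; adjust for your source build's logging setup):

grep "detected write past chunk end" /var/log/postgresql/*.log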

            regards, tom lane

Re: postgres segfaulting on pg_restore

From: Chris Curvey

> Hmm, that's pretty interesting.  Possibly it's just another
> manifestation of something scribbling past the end of its allocated
> chunk, but my credence for that theory is way down if there's no add-on
> code involved.  Does the postmaster log show any messages like
>        WARNING: detected write past chunk end in ...
> before the crash?
>
>                        regards, tom lane

Nope.  Just a bunch of messages saying "checkpoints occurring too frequently", then "server process (PID 123456) was terminated by signal 11:  Segmentation fault"


Re: postgres segfaulting on pg_restore

From: Tom Lane
Chris Curvey <chris@chriscurvey.com> writes:
>> Hmm, that's pretty interesting.  Possibly it's just another
>> manifestation of something scribbling past the end of its allocated
>> chunk, but my credence for that theory is way down if there's no add-on
>> code involved.  Does the postmaster log show any messages like
>> WARNING: detected write past chunk end in ...
>> before the crash?

> Nope.  Just a bunch of messages saying "checkpoints occurring too
> frequently", then "server process (PID 123456) was terminated by signal 11:
>  Segmentation fault"

Hmph.  Well, I'm out of ideas here.  I think you're going to need to put
some effort into producing a self-contained test case so that others can
debug this ...

            regards, tom lane