Tips for investigating DBMS failure - Mailing list pgsql-general
From | Karl DeBisschop |
---|---|
Subject | Tips for investigating DBMS failure |
Date | |
Msg-id | 39EB31BB.A18C87C7@alert.infoplease.com Whole thread Raw |
Responses |
Re: Tips for investigating DBMS failure
|
List | pgsql-general |
We had a production DBMS die in a failrly dramitica way this morning. I suspect the cause may have been interactions between lock and pg_dump, but I'm a long way off from being able to say that with any certainty at all. In the end, I moved the data directroy off to the side and restored all of if the data into a newly initiallized data directory. The databases are backed up fairly frequently, but there is likely to be some data since the previous backup that I'd like to recover. I have tried starting a postmaster on a different port to get the data, but I just can't seem to make any headway. Postmater runs, and I can get copious debugging text, but psql always complains that the backend failed unexpectedly whenever I try to make any sort of connection. Can anyone give me general tips on how I might recover this data? (PostgreSQL 7.0-2, redhat 6.2smp on a dual processor netfinity). If it's of any use, I have attached the debug output - I don't see where the problem is occurring, but I don't pretend to fully understand the output, other than the exit is status 25. Someone with just a little bit of insight would probably do alot better with this than I. >>>>> debug postmaster: ServerLoop: handling reading 4 postmaster: ServerLoop: handling reading 4 postmaster: ServerLoop: handling writing 4 postmaster: BackendStartup: environ dump: ----------------------------------------- USERNAME= HISTSIZE=1000 HISTFILESIZE=1000 SSH_TTY=/dev/pts/2 MAIL=/var/spool/mail/kdebisschop TERM=xterm HOSTTYPE=i386 PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin: INPUTRC=/etc/inputrc SHELL=/bin/bash PS1=[\u@\h \W]\$ USER=postgres OSTYPE=Linux SHLVL=1 _=/usr/bin/sudo SUDO_COMMAND=/usr/bin/postmaster -d 99 -D /disk/1/pgsql/data1 -p 5433 POSTPORT=5433 POSTID=2147483644 PGDATA=/disk/1/pgsql/data1 IPC_KEY=5433100 PG_USER=postgres ----------------------------------------- postmaster: BackendStartup: pid 4532 user postgres db template1 socket 4 postmaster child[4532]: starting with (/usr/bin/postgres -d99 -v131072 -p template1 ) 001016.12:36:23.302 [4532] read_pg_options: verbose=2,query=2,syslog=0 FindExec: found "/usr/bin/postgres" using argv[0] 001016.12:36:23.303 [4532] started: host=localhost user=postgres database=template1 001016.12:36:23.303 [4532] InitPostgres 001016.12:36:23.333 [4532] reset_client_encoding().. 001016.12:36:23.333 [4532] reset_client_encoding() done. 001016.12:36:23.333 [4532] StartTransactionCommand 001016.12:36:23.333 [4532] query: select getdatabaseencoding() 001016.12:36:23.337 [4532] parser outputs: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :unionall false:distinctClause <> :sortClause <> :rtable <> :targetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod-1 :resname getdatabaseencoding :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid19 :opType func :oper { FUNC :funcid 1039 :functype 19 :funcisindex false :funcsize 0 :func_fcache @ 0x0 :func_tlist({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname \<noname> :reskey 0 :reskeyop 0 :ressortgroupref0 :resjunk false } :expr { VAR :varno -1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold-1 :varoattno 1}}) :func_planlist <>} :args <>}}) :qual <> :groupClause <> :havingQual <> :hasAggs false :hasSubLinksfalse :unionClause <> :intersectClause <> :limitOffset <> :limitCount <> :rowMark <>} 001016.12:36:23.337 [4532] after rewriting: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :unionall false :distinctClause <> :sortClause <> :rtable <> :targetlist ( { TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname getdatabaseencoding :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 19 :opType func :oper { FUNC :funcid 1039 :functype 19 :funcisindex false :funcsize 0 :func_fcache @ 0x0 :func_tlist ( { TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname \<noname> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno -1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold -1 :varoattno 1 } } ) :func_planlist <> } :args <> } } ) :qual <> :groupClause <> :havingQual <> :hasAggs false :hasSubLinks false :unionClause <> :intersectClause <> :limitOffset <> :limitCount <> :rowMark <> } 001016.12:36:23.339 [4532] plan: { RESULT :startup_cost 0.00 :total_cost 0.00 :rows 0 :width 0 :state <> :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno1 :restype 19 :restypmod -1 :resname getdatabaseencoding :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }:expr { EXPR :typeOid 19 :opType func :oper { FUNC :funcid 1039 :functype 19 :funcisindex false :funcsize 0 :func_fcache@ 0x0 :func_tlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname \<noname> :reskey0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno -1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup0 :varnoold -1 :varoattno 1}}) :func_planlist <>} :args <>}}) :qpqual <> :lefttree <> :righttree <> :extprm ():locprm () :initplan <> :nprm 0 :resconstantqual <>} 001016.12:36:23.339 [4532] ProcessQuery 001016.12:36:23.339 [4532] CommitTransactionCommand postmaster: reaping dead processes... postmaster: CleanupProc: pid 4532 exited with status 25 Server process (pid 4532) exited with status 25 at Mon Oct 16 12:36:23 2000 Terminating any active server processes... Server processes were terminated at Mon Oct 16 12:36:23 2000 Reinitializing shared memory and semaphores 001016.12:36:23.340 [4508] shmem_exit(0) binding ShmemCreate(key=52e771, size=1104896) 001016.12:36:23.343 [4533] DEBUG: Data Base System is starting up at Mon Oct 16 12:36:23 2000 001016.12:36:23.343 [4533] DEBUG: Data Base System was interrupted being in production at Mon Oct 16 12:35:44 2000 001016.12:36:23.365 [4533] DEBUG: Data Base System is in production state at Mon Oct 16 12:36:23 2000 001016.12:36:23.365 [4533] proc_exit(0) 001016.12:36:23.365 [4533] shmem_exit(0) 001016.12:36:23.365 [4533] exit(0) postmaster: reaping dead processes... <<<<<<<< debug Many thanks for any insight. -- Karl DeBisschop kdebisschop@alert.infoplease.com Learning Network Reference http://www.infoplease.com Netsaint Plugin Developer kdebisschop@users.sourceforge.net
pgsql-general by date: