Thread: Out of memory (Failed on request size 24)
PostgreSQL 8.0.3 running on AIX 5.3 (the same thing happens on 5.1). The DBMS had been running fine for some months, but now one of the databases isn't accessible. Any help would be greatly appreciated.

The DBMS starts up fine, but any operation on the files database (psql files, vacuumdb files, pg_dump files) yields the same result. The client responds with

> psql files
DEBUG: InitPostgres
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 626915/1/0, nestlvl: 1, children: <>
psql: FATAL: out of memory
DETAIL: Failed on request of size 24.

The server log file contains the following:

LOG: database system was shut down at 2006-11-14 04:38:54 EST
LOG: checkpoint record is at 0/19A82450
LOG: redo record is at 0/19A82450; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 626912; next OID: 54355
LOG: database system is ready
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: forked new backend, pid=168724 socket=7
DEBUG: postmaster child[168724]: starting with (
DEBUG: postgres
DEBUG: -v196608
DEBUG: -p
DEBUG: files
DEBUG: )
DEBUG: InitPostgres
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 626912/1/0, nestlvl: 1, children: <>
TopMemoryContext: 42416 total in 4 blocks; 12312 free (2 chunks); 30104 used
TopTransactionContext: 2145378304 total in 266 blocks; 928 free (14 chunks); 2145377376 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
CacheMemoryContext: 516096 total in 6 blocks; 178752 free (10 chunks); 337344 used
pg_operator_oid_index: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 744 free (0 chunks); 280 used
pg_class_oid_index: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 744 free (0 chunks); 280 used
MdSmgr: 8192 total in 1 blocks; 7808 free (0 chunks); 384 used
DynaHash: 8192 total in 1 blocks; 5936 free (0 chunks); 2256 used
Operator class cache: 8192 total in 1 blocks; 1968 free (0 chunks); 6224 used
smgr relation table: 24576 total in 2 blocks; 16256 free (5 chunks); 8320 used
Portal hash: 8192 total in 1 blocks; 4032 free (0 chunks); 4160 used
Relcache by OID: 8192 total in 1 blocks; 928 free (0 chunks); 7264 used
Relcache by name: 24576 total in 2 blocks; 14208 free (5 chunks); 10368 used
LockTable (locallock hash): 24576 total in 2 blocks; 16272 free (6 chunks); 8304 used
ErrorContext: 8192 total in 1 blocks; 8160 free (7 chunks); 32 used
FATAL: out of memory
DETAIL: Failed on request of size 24.
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: server process (PID 168724) exited with exit code 0
On Tue, Nov 14, 2006 at 05:53:08AM -0500, Rob Owen wrote:
> PostgreSQL 8.0.3 running on AIX 5.3 (same thing happens on 5.1 though).
> DBMS was running fine for some months but now one of the databases isn't accessible. Any help would be greatly appreciated.
>
> DBMS starts up fine, but any operation on the files database (psql files, vacuumdb files, pg_dump files) yields the same result. The client responds with
>
> > psql files

<snip>

Something screwed up:

> TopTransactionContext: 2145378304 total in 266 blocks; 928 free (14 chunks); 2145377376 used

That's a lot of memory. I thought there was a check on negative-sized allocations... Did "make check" pass OK?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
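(For reference, running the regression tests from the source tree is a one-liner; this is just a sketch, the source path is a placeholder, and on AIX GNU make is usually installed as gmake:

	cd /path/to/postgresql-8.0.3
	gmake check

It builds a temporary installation and runs the tests against it, so it can be done without touching the live data directory.)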
Thanks Martijn. I reduced a number of the buffer and connection settings, added some more tracing, and this is the result. The number (TopTransactionContext) is smaller, but still very large. Any reason why this number would suddenly go sky high? The same system was working fine just a month ago.

<2006-11-14 05:48:35 EST>LOG: 00000: database system was shut down at 2006-11-14 05:48:30 EST
<2006-11-14 05:48:35 EST>LOCATION: StartupXLOG, xlog.c:4049
<2006-11-14 05:48:35 EST>LOG: 00000: checkpoint record is at 0/19A825B8
<2006-11-14 05:48:35 EST>LOCATION: StartupXLOG, xlog.c:4132
<2006-11-14 05:48:35 EST>LOG: 00000: redo record is at 0/19A825B8; undo record is at 0/0; shutdown TRUE
<2006-11-14 05:48:35 EST>LOCATION: StartupXLOG, xlog.c:4160
<2006-11-14 05:48:35 EST>LOG: 00000: next transaction ID: 626916; next OID: 54355
<2006-11-14 05:48:35 EST>LOCATION: StartupXLOG, xlog.c:4163
<2006-11-14 05:48:35 EST>LOG: 00000: database system is ready
<2006-11-14 05:48:35 EST>LOCATION: StartupXLOG, xlog.c:4526
<2006-11-14 05:48:35 EST>DEBUG: 00000: proc_exit(0)
<2006-11-14 05:48:35 EST>LOCATION: proc_exit, ipc.c:95
<2006-11-14 05:48:35 EST>DEBUG: 00000: shmem_exit(0)
<2006-11-14 05:48:35 EST>LOCATION: shmem_exit, ipc.c:126
<2006-11-14 05:48:35 EST>DEBUG: 00000: exit(0)
<2006-11-14 05:48:35 EST>LOCATION: proc_exit, ipc.c:113
<2006-11-14 05:48:35 EST>DEBUG: 00000: reaping dead processes
<2006-11-14 05:48:35 EST>LOCATION: reaper, postmaster.c:1988
<2006-11-14 05:48:46 EST>DEBUG: 00000: forked new backend, pid=168246 socket=7
<2006-11-14 05:48:46 EST>LOCATION: BackendStartup, postmaster.c:2499
<2006-11-14 05:48:46 EST>DEBUG: 00000: postmaster child[168246]: starting with (
<2006-11-14 05:48:46 EST>LOCATION: BackendRun, postmaster.c:2829
<2006-11-14 05:48:46 EST>DEBUG: 00000: postgres
<2006-11-14 05:48:46 EST>LOCATION: BackendRun, postmaster.c:2832
<2006-11-14 05:48:46 EST>DEBUG: 00000: -v196608
<2006-11-14 05:48:46 EST>LOCATION: BackendRun, postmaster.c:2832
<2006-11-14 05:48:46 EST>DEBUG: 00000: -p
<2006-11-14 05:48:46 EST>LOCATION: BackendRun, postmaster.c:2832
<2006-11-14 05:48:46 EST>DEBUG: 00000: files
<2006-11-14 05:48:46 EST>LOCATION: BackendRun, postmaster.c:2832
<2006-11-14 05:48:46 EST>DEBUG: 00000: )
<2006-11-14 05:48:46 EST>LOCATION: BackendRun, postmaster.c:2834
<2006-11-14 05:48:46 EST>DEBUG: 00000: InitPostgres
<2006-11-14 05:48:46 EST>LOCATION: PostgresMain, postgres.c:2719
<2006-11-14 05:48:46 EST>DEBUG: 00000: StartTransaction
<2006-11-14 05:48:46 EST>LOCATION: ShowTransactionState, xact.c:3609
<2006-11-14 05:48:46 EST>DEBUG: 00000: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 626916/1/0, nestlvl: 1, children: <>
<2006-11-14 05:48:46 EST>LOCATION: ShowTransactionStateRec, xact.c:3634
TopMemoryContext: 32768 total in 3 blocks; 10760 free (3 chunks); 22008 used
TopTransactionContext: 1340071936 total in 170 blocks; 928 free (14 chunks); 1340071008 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
CacheMemoryContext: 516096 total in 6 blocks; 178752 free (10 chunks); 337344 used
pg_operator_oid_index: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 744 free (0 chunks); 280 used
pg_class_oid_index: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 736 free (0 chunks); 288 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 744 free (0 chunks); 280 used
MdSmgr: 8192 total in 1 blocks; 7808 free (0 chunks); 384 used
DynaHash: 8192 total in 1 blocks; 5936 free (0 chunks); 2256 used
Operator class cache: 8192 total in 1 blocks; 1968 free (0 chunks); 6224 used
smgr relation table: 24576 total in 2 blocks; 16256 free (5 chunks); 8320 used
Portal hash: 8192 total in 1 blocks; 4032 free (0 chunks); 4160 used
Relcache by OID: 8192 total in 1 blocks; 928 free (0 chunks); 7264 used
Relcache by name: 24576 total in 2 blocks; 14208 free (5 chunks); 10368 used
LockTable (locallock hash): 24576 total in 2 blocks; 16272 free (6 chunks); 8304 used
ErrorContext: 8192 total in 1 blocks; 8160 free (7 chunks); 32 used
<2006-11-14 05:50:03 EST>FATAL: 53200: out of memory
<2006-11-14 05:50:03 EST>DETAIL: Failed on request of size 24.
<2006-11-14 05:50:03 EST>LOCATION: AllocSetAlloc, aset.c:702
<2006-11-14 05:50:03 EST>DEBUG: 00000: proc_exit(0)
<2006-11-14 05:50:03 EST>LOCATION: proc_exit, ipc.c:95
<2006-11-14 05:50:03 EST>DEBUG: 00000: shmem_exit(0)
<2006-11-14 05:50:03 EST>LOCATION: shmem_exit, ipc.c:126
<2006-11-14 05:50:03 EST>DEBUG: 00000: exit(0)
<2006-11-14 05:50:03 EST>LOCATION: proc_exit, ipc.c:113
<2006-11-14 05:50:03 EST>DEBUG: 00000: reaping dead processes
<2006-11-14 05:50:03 EST>LOCATION: reaper, postmaster.c:1988
<2006-11-14 05:50:03 EST>DEBUG: 00000: server process (PID 168246) exited with exit code 0
<2006-11-14 05:50:03 EST>LOCATION: LogChildExit, postmaster.c:2349
<2006-11-14 05:53:49 EST>DEBUG: 00000: postmaster received signal 15
<2006-11-14 05:53:49 EST>LOCATION: pmdie, postmaster.c:1850
<2006-11-14 05:53:49 EST>LOG: 00000: received smart shutdown request
<2006-11-14 05:53:49 EST>LOCATION: pmdie, postmaster.c:1865
<2006-11-14 05:53:49 EST>LOG: 00000: shutting down
<2006-11-14 05:53:49 EST>LOCATION: ShutdownXLOG, xlog.c:4706
<2006-11-14 05:53:49 EST>DEBUG: 00000: reaping dead processes
<2006-11-14 05:53:49 EST>LOCATION: reaper, postmaster.c:1988
<2006-11-14 05:53:49 EST>LOG: 00000: database system is shut down
<2006-11-14 05:53:49 EST>LOCATION: ShutdownXLOG, xlog.c:4715
<2006-11-14 05:53:49 EST>DEBUG: 00000: proc_exit(0)
<2006-11-14 05:53:49 EST>LOCATION: proc_exit, ipc.c:95
<2006-11-14 05:53:49 EST>DEBUG: 00000: shmem_exit(0)
<2006-11-14 05:53:49 EST>LOCATION: shmem_exit, ipc.c:126
<2006-11-14 05:53:49 EST>DEBUG: 00000: exit(0)
<2006-11-14 05:53:49 EST>LOCATION: proc_exit, ipc.c:113
<2006-11-14 05:53:49 EST>DEBUG: 00000: reaping dead processes
<2006-11-14 05:53:49 EST>LOCATION: reaper, postmaster.c:1988
<2006-11-14 05:53:49 EST>DEBUG: 00000: proc_exit(0)
<2006-11-14 05:53:49 EST>LOCATION: proc_exit, ipc.c:95
<2006-11-14 05:53:49 EST>DEBUG: 00000: shmem_exit(0)
<2006-11-14 05:53:49 EST>LOCATION: shmem_exit, ipc.c:126
<2006-11-14 05:53:49 EST>DEBUG: 00000: exit(0)
<2006-11-14 05:53:49 EST>LOCATION: proc_exit, ipc.c:113
<2006-11-14 05:53:49 EST>LOG: 00000: logger shutting down
<2006-11-14 05:53:49 EST>LOCATION: SysLoggerMain, syslogger.c:361
<2006-11-14 05:53:49 EST>DEBUG: 00000: proc_exit(0)
<2006-11-14 05:53:49 EST>LOCATION: proc_exit, ipc.c:95
<2006-11-14 05:53:49 EST>DEBUG: 00000: shmem_exit(0)
<2006-11-14 05:53:49 EST>LOCATION: shmem_exit, ipc.c:126
<2006-11-14 05:53:49 EST>DEBUG: 00000: exit(0)
<2006-11-14 05:53:49 EST>LOCATION: proc_exit, ipc.c:113
"Rob Owen" <Rob.Owen@sas.com> writes: > PostgreSQL 8.0.3 running on AIX 5.3 (same thing happens on 5.1 though). > DBMS was running fine for some months but now one of the databases isn't accessible. Any help would be greatly appreciated. Just one database? Sounds like it might be corrupt data in that database's system catalogs. Can you get a stack trace from the point of the error to help us narrow it down? The way I usually debug startup-time failures is: export PGOPTIONS="-W 30" psql ... Now I have 30 seconds to identify the PID of the backend process in another window and do (as the postgres user) gdb /path/to/postgres PID Once you've got gdb control of the backend, do gdb> break errfinish gdb> cont ... wait for the timeout to finish elapsing, if needed ... Once gdb reports that the breakpoint has been reached, say gdb> bt ... useful info here... gdb> cont regards, tom lane
Attached to backend postmaster and got the following. Hope this helps.

Attaching to program: /nfs/silence/bigdisk/eurrow/pgsql/bin/postmaster, process 170422
[Switching to Thread 1]
0x000000000000377c in ?? ()
(gdb) break errfinish
Breakpoint 1 at 0x1000019dc
(gdb) cont
Continuing.
[Switching to Thread 1]

Breakpoint 1, 0x00000001000019dc in errfinish ()
(gdb) bt
#0  0x00000001000019dc in errfinish ()
#1  0x00000001002920d0 in reaper ()
#2  <signal handler called>
#3  0x0fffffffffffd810 in ?? ()
Cannot access memory at address 0x203fe94000000000
(gdb) cont
Continuing.

Breakpoint 1, 0x00000001000019dc in errfinish ()
(gdb) bt
#0  0x00000001000019dc in errfinish ()
#1  0x0000000100292680 in LogChildExit ()
#2  0x00000001002971a8 in CleanupBackend ()
#3  0x00000001002923d0 in reaper ()
#4  <signal handler called>
#5  0x0fffffffffffd810 in ?? ()
Cannot access memory at address 0x203fe94000000000
(gdb) cont
Continuing.
"Rob Owen" <Rob.Owen@sas.com> writes: > Attached to backend postmaster and got the following. Hope this helps. Nope, you got the postmaster itself there, you need to look at the new child process. (It should look like "postgres: startup" in ps.) regards, tom lane
Breakpoint 1, 0x00000001000019dc in errfinish () from postmaster
(gdb) bt
#0  0x00000001000019dc in errfinish () from postmaster
#1  0x000000010000a680 in AllocSetAlloc () from postmaster
#2  0x0000000100002a1c in MemoryContextAlloc () from postmaster
#3  0x0000000100108c28 in _bt_search () from postmaster
#4  0x0000000100106484 in _bt_first () from postmaster
#5  0x00000001001045b4 in btgettuple () from postmaster
#6  0x0000000100029fb0 in FunctionCall2 () from postmaster
#7  0x00000001000295f8 in index_getnext () from postmaster
#8  0x000000010002942c in systable_getnext () from postmaster
#9  0x000000010000f9a0 in ScanPgRelation () from postmaster
#10 0x0000000100011088 in RelationBuildDesc () from postmaster
#11 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#12 0x000000010000e620 in relation_openr () from postmaster
#13 0x000000010000e44c in heap_openr () from postmaster
#14 0x0000000100041044 in RelationBuildTriggers () from postmaster
#15 0x00000001000111a4 in RelationBuildDesc () from postmaster
#16 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#17 0x000000010000e620 in relation_openr () from postmaster
#18 0x000000010000e44c in heap_openr () from postmaster
#19 0x000000010000e1c8 in CatalogCacheInitializeCache () from postmaster
#20 0x000000010000dab8 in SearchCatCache () from postmaster
#21 0x000000010000da3c in SearchSysCache () from postmaster
#22 0x000000010028b570 in InitializeSessionUserId () from postmaster
#23 0x0000000100288ae8 in InitPostgres () from postmaster
#24 0x000000010029e2a8 in PostgresMain () from postmaster
#25 0x00000001002990e0 in BackendRun () from postmaster
#26 0x0000000100298758 in BackendStartup () from postmaster
#27 0x0000000100297db0 in ServerLoop () from postmaster
#28 0x0000000100009b90 in PostmasterMain () from postmaster
#29 0x0000000100000680 in main () from postmaster
#30 0x000000010000028c in __start () from postmaster
(gdb) cont
Continuing.
Breakpoint 1, 0x00000001000019dc in errfinish () from postmaster
(gdb) bt
#0  0x00000001000019dc in errfinish () from postmaster
#1  0x0000000100002c58 in elog_finish () from postmaster
#2  0x0000000100007aa8 in proc_exit () from postmaster
#3  0x0000000100001c5c in errfinish () from postmaster
#4  0x000000010000a680 in AllocSetAlloc () from postmaster
#5  0x0000000100002a1c in MemoryContextAlloc () from postmaster
#6  0x0000000100108c28 in _bt_search () from postmaster
#7  0x0000000100106484 in _bt_first () from postmaster
#8  0x00000001001045b4 in btgettuple () from postmaster
#9  0x0000000100029fb0 in FunctionCall2 () from postmaster
#10 0x00000001000295f8 in index_getnext () from postmaster
#11 0x000000010002942c in systable_getnext () from postmaster
#12 0x000000010000f9a0 in ScanPgRelation () from postmaster
#13 0x0000000100011088 in RelationBuildDesc () from postmaster
#14 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#15 0x000000010000e620 in relation_openr () from postmaster
#16 0x000000010000e44c in heap_openr () from postmaster
#17 0x0000000100041044 in RelationBuildTriggers () from postmaster
#18 0x00000001000111a4 in RelationBuildDesc () from postmaster
#19 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#20 0x000000010000e620 in relation_openr () from postmaster
#21 0x000000010000e44c in heap_openr () from postmaster
#22 0x000000010000e1c8 in CatalogCacheInitializeCache () from postmaster
#23 0x000000010000dab8 in SearchCatCache () from postmaster
#24 0x000000010000da3c in SearchSysCache () from postmaster
#25 0x000000010028b570 in InitializeSessionUserId () from postmaster
#26 0x0000000100288ae8 in InitPostgres () from postmaster
#27 0x000000010029e2a8 in PostgresMain () from postmaster
#28 0x00000001002990e0 in BackendRun () from postmaster
#29 0x0000000100298758 in BackendStartup () from postmaster
#30 0x0000000100297db0 in ServerLoop () from postmaster
#31 0x0000000100009b90 in PostmasterMain () from postmaster
#32 0x0000000100000680 in main () from postmaster
#33 0x000000010000028c in __start () from postmaster
(gdb) cont
Continuing.
Breakpoint 1, 0x00000001000019dc in errfinish () from postmaster
(gdb) bt
#0  0x00000001000019dc in errfinish () from postmaster
#1  0x0000000100002c58 in elog_finish () from postmaster
#2  0x0000000100007bcc in shmem_exit () from postmaster
#3  0x0000000100007ab4 in proc_exit () from postmaster
#4  0x0000000100001c5c in errfinish () from postmaster
#5  0x000000010000a680 in AllocSetAlloc () from postmaster
#6  0x0000000100002a1c in MemoryContextAlloc () from postmaster
#7  0x0000000100108c28 in _bt_search () from postmaster
#8  0x0000000100106484 in _bt_first () from postmaster
#9  0x00000001001045b4 in btgettuple () from postmaster
#10 0x0000000100029fb0 in FunctionCall2 () from postmaster
#11 0x00000001000295f8 in index_getnext () from postmaster
#12 0x000000010002942c in systable_getnext () from postmaster
#13 0x000000010000f9a0 in ScanPgRelation () from postmaster
#14 0x0000000100011088 in RelationBuildDesc () from postmaster
#15 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#16 0x000000010000e620 in relation_openr () from postmaster
#17 0x000000010000e44c in heap_openr () from postmaster
#18 0x0000000100041044 in RelationBuildTriggers () from postmaster
#19 0x00000001000111a4 in RelationBuildDesc () from postmaster
#20 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#21 0x000000010000e620 in relation_openr () from postmaster
#22 0x000000010000e44c in heap_openr () from postmaster
#23 0x000000010000e1c8 in CatalogCacheInitializeCache () from postmaster
#24 0x000000010000dab8 in SearchCatCache () from postmaster
#25 0x000000010000da3c in SearchSysCache () from postmaster
#26 0x000000010028b570 in InitializeSessionUserId () from postmaster
#27 0x0000000100288ae8 in InitPostgres () from postmaster
#28 0x000000010029e2a8 in PostgresMain () from postmaster
#29 0x00000001002990e0 in BackendRun () from postmaster
#30 0x0000000100298758 in BackendStartup () from postmaster
#31 0x0000000100297db0 in ServerLoop () from postmaster
#32 0x0000000100009b90 in PostmasterMain () from postmaster
#33 0x0000000100000680 in main () from postmaster
#34 0x000000010000028c in __start () from postmaster
(gdb) cont
Continuing.
Breakpoint 1, 0x00000001000019dc in errfinish () from postmaster
(gdb) bt
#0  0x00000001000019dc in errfinish () from postmaster
#1  0x0000000100002c58 in elog_finish () from postmaster
#2  0x0000000100007b3c in proc_exit () from postmaster
#3  0x0000000100001c5c in errfinish () from postmaster
#4  0x000000010000a680 in AllocSetAlloc () from postmaster
#5  0x0000000100002a1c in MemoryContextAlloc () from postmaster
#6  0x0000000100108c28 in _bt_search () from postmaster
#7  0x0000000100106484 in _bt_first () from postmaster
#8  0x00000001001045b4 in btgettuple () from postmaster
#9  0x0000000100029fb0 in FunctionCall2 () from postmaster
#10 0x00000001000295f8 in index_getnext () from postmaster
#11 0x000000010002942c in systable_getnext () from postmaster
#12 0x000000010000f9a0 in ScanPgRelation () from postmaster
#13 0x0000000100011088 in RelationBuildDesc () from postmaster
#14 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#15 0x000000010000e620 in relation_openr () from postmaster
#16 0x000000010000e44c in heap_openr () from postmaster
#17 0x0000000100041044 in RelationBuildTriggers () from postmaster
#18 0x00000001000111a4 in RelationBuildDesc () from postmaster
#19 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster
#20 0x000000010000e620 in relation_openr () from postmaster
#21 0x000000010000e44c in heap_openr () from postmaster
#22 0x000000010000e1c8 in CatalogCacheInitializeCache () from postmaster
#23 0x000000010000dab8 in SearchCatCache () from postmaster
#24 0x000000010000da3c in SearchSysCache () from postmaster
#25 0x000000010028b570 in InitializeSessionUserId () from postmaster
#26 0x0000000100288ae8 in InitPostgres () from postmaster
#27 0x000000010029e2a8 in PostgresMain () from postmaster
#28 0x00000001002990e0 in BackendRun () from postmaster
#29 0x0000000100298758 in BackendStartup () from postmaster
#30 0x0000000100297db0 in ServerLoop () from postmaster
#31 0x0000000100009b90 in PostmasterMain () from postmaster
#32 0x0000000100000680 in main () from postmaster
#33 0x000000010000028c in __start () from postmaster
(gdb) cont
Continuing.

Program exited normally.
(gdb)
"Rob Owen" <Rob.Owen@sas.com> writes: > Breakpoint 1, 0x00000001000019dc in errfinish () from postmaster > (gdb) bt > #0 0x00000001000019dc in errfinish () from postmaster > #1 0x000000010000a680 in AllocSetAlloc () from postmaster > #2 0x0000000100002a1c in MemoryContextAlloc () from postmaster > #3 0x0000000100108c28 in _bt_search () from postmaster > #4 0x0000000100106484 in _bt_first () from postmaster > #5 0x00000001001045b4 in btgettuple () from postmaster > #6 0x0000000100029fb0 in FunctionCall2 () from postmaster > #7 0x00000001000295f8 in index_getnext () from postmaster > #8 0x000000010002942c in systable_getnext () from postmaster > #9 0x000000010000f9a0 in ScanPgRelation () from postmaster > #10 0x0000000100011088 in RelationBuildDesc () from postmaster > #11 0x000000010000e6fc in RelationSysNameGetRelation () from postmaster I think you are in luck: this looks like the corrupted data is in one of the indexes on pg_class, so you should be able to recover by reindexing. See the man page for REINDEX for the gory details of doing this (you need the "ignore system indexes" option, and maybe some other pushups depending on your Postgres version). regards, tom lane
Thanks Tom. It's all working again now.