Thread: Postmaster cannot start
Hi,
my postgresql version 7.4.5 cannot start until this morning, when i check the log file, it give me this error
this is part of my log file
DEBUG: vacuuming "pg_catalog.pg_class"
DEBUG: "pg_class": found 9823 removable, 1017 nonremovable row versions in 205 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 148 to 196 bytes long.
There were 20 unused item pointers.
Total free space (including removable row versions) is 1477460 bytes.
183 pages are or will become empty, including 183 at the end of the table.
9 pages containing 19260 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
DEBUG: index "pg_class_oid_index" now contains 1017 row versions in 35 pages
DETAIL: 9823 index row versions were removed.
28 index pages have been deleted, 28 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.02 sec.
DEBUG: index "pg_class_relname_nsp_index" now contains 726 row versions in 197 pages
DETAIL: 6124 index row versions were removed.
126 index pages have been deleted, 126 are currently reusable.
CPU 0.00s/0.03u sec elapsed 0.18 sec.
WARNING: index "pg_class_relname_nsp_index" contains 726 row versions, but table contains 1017 row versions
HINT: Rebuild the index with REINDEX.
DEBUG: "pg_class": moved 57 row versions, truncated 205 to 20 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.34 sec.
DEBUG: index "pg_class_oid_index" now contains 1017 row versions in 35 pages
DETAIL: 57 index row versions were removed.
28 index pages have been deleted, 28 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
ERROR: could not open segment 1 of relation "pg_class_relname_nsp_index" (target block 1183847760): No such file or directory
DEBUG: AbortCurrentTransaction
PANIC: cannot abort transaction 14135438, it was already committed
DEBUG: child process (PID 4834) was terminated by signal 6
LOG: server process (PID 4834) was terminated by signal 6
LOG: terminating any other active server processes
DEBUG: sending SIGQUIT to process 3337
DEBUG: sending SIGQUIT to process 3336
DEBUG: sending SIGQUIT to process 3335
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
DEBUG: "pg_class": found 9823 removable, 1017 nonremovable row versions in 205 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 148 to 196 bytes long.
There were 20 unused item pointers.
Total free space (including removable row versions) is 1477460 bytes.
183 pages are or will become empty, including 183 at the end of the table.
9 pages containing 19260 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
DEBUG: index "pg_class_oid_index" now contains 1017 row versions in 35 pages
DETAIL: 9823 index row versions were removed.
28 index pages have been deleted, 28 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.02 sec.
DEBUG: index "pg_class_relname_nsp_index" now contains 726 row versions in 197 pages
DETAIL: 6124 index row versions were removed.
126 index pages have been deleted, 126 are currently reusable.
CPU 0.00s/0.03u sec elapsed 0.18 sec.
WARNING: index "pg_class_relname_nsp_index" contains 726 row versions, but table contains 1017 row versions
HINT: Rebuild the index with REINDEX.
DEBUG: "pg_class": moved 57 row versions, truncated 205 to 20 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.34 sec.
DEBUG: index "pg_class_oid_index" now contains 1017 row versions in 35 pages
DETAIL: 57 index row versions were removed.
28 index pages have been deleted, 28 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
ERROR: could not open segment 1 of relation "pg_class_relname_nsp_index" (target block 1183847760): No such file or directory
DEBUG: AbortCurrentTransaction
PANIC: cannot abort transaction 14135438, it was already committed
DEBUG: child process (PID 4834) was terminated by signal 6
LOG: server process (PID 4834) was terminated by signal 6
LOG: terminating any other active server processes
DEBUG: sending SIGQUIT to process 3337
DEBUG: sending SIGQUIT to process 3336
DEBUG: sending SIGQUIT to process 3335
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Can someone please explain to me what happen to my database, cause i not so clear,
i have a daily maintenance routine that always trigger at 4am morning.
1) will run end of day job. (every day)
2) will run vacuum full database. (every day)
3) will run analyze database. (every day)
4) will run reindex database. (every day)
saw from the log file, it's possible that server crash during vacuum process...
Question :
1) what happen to my database server? what the error meaning?
2) it is my daily maintenance routine causing that?
3) it is ok to run vacuum full every day ?
4) how can i solve this problem?
Thanks
Regards
Beh
""Chun Yit(Chronos)"" <ivanbeh@chronos.com.my> wrote > saw from the log file, it's possible that server crash during > vacuum process... > > Question : > 1) what happen to my database server? what the error meaning? > It looks like index "pg_class_relname_nsp_index" (which is an index on pg_class) is corrupted. But not sure why it reports the following error message (which looks like a post-commit cleanup caused error): DEBUG: AbortCurrentTransaction PANIC: cannot abort transaction 14135438, it was already committed > 2) it is my daily maintenance routine causing that? No. > 3) it is ok to run vacuum full every day ? Yes. There is no apparent bad side effect of vacuum full except that it would block other accesses on the target table. But since you do it per day(guess you do it on a nightly schedule), so this should not be a problem. > 4) how can i solve this problem? > The base table pg_class should be ok(pg_class_oid_ind indicates both have the same cardinality). Try to reindex pg_class as the superuser. Regards, Qingqing
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes: > But not sure why it reports the following error > message (which looks like a post-commit cleanup caused error): > DEBUG: AbortCurrentTransaction > PANIC: cannot abort transaction 14135438, it was already committed I think this is an artifact of the fact that VACUUM FULL commits its own transaction before it starts the final index cleanup pass. We ought to think of a better way to handle that sometime. I don't recall having seen a PANIC like this reported before, but on reflection it seems like this would be guaranteed to happen for any ERROR condition occurring during that last pass. An error there would be pretty improbable, but surely not impossible. As for the OP's problem, it seems pretty suspicious that we got through one cycle of vacuuming pg_class_relname_nsp_index and then the second one failed with what seems to be a bad block link. If that bad link was there before, why didn't it fail the first time through? I'm wondering about flaky hardware ... regards, tom lane
>> 4) how can i solve this problem? >> > The base table pg_class should be ok(pg_class_oid_ind indicates both have > the same cardinality). Try to reindex pg_class as the superuser. > >> >but not i not be able to reindex the table because i cannot start the >postmaster. >postmaster give me error every time i try to start it >this is the error msg DEBUG: found "/usr/local/pgsql/bin/postgres" using argv[0] LOG: could not create IPv6 socket: Address family not supported by protocol DEBUG: max_safe_fds = 985, usable_fds = 1019, already_open = 5 DEBUG: found "/usr/local/pgsql/bin/postmaster" using argv[0] LOG: could not resolve "localhost": Temporary failure in name resolution LOG: database system was interrupted while in recovery at 2006-05-15 04:14:39 MYT HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. LOG: checkpoint record is at A/46315F10 LOG: redo record is at A/46315F10; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 14135366; next OID: 51809028 LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at A/46315F50 PANIC: btree_delete_page_redo: uninitialized right sibling LOG: startup process (PID 5043) was terminated by signal 6 LOG: aborting startup due to startup process failure >Regards >Beh
> "Qingqing Zhou" <zhouqq@cs.toronto.edu> writes: >> But not sure why it reports the following error >> message (which looks like a post-commit cleanup caused error): > >> DEBUG: AbortCurrentTransaction >> PANIC: cannot abort transaction 14135438, it was already committed > > I think this is an artifact of the fact that VACUUM FULL commits its own > transaction before it starts the final index cleanup pass. We ought to > think of a better way to handle that sometime. I don't recall having > seen a PANIC like this reported before, but on reflection it seems like > this would be guaranteed to happen for any ERROR condition occurring > during that last pass. An error there would be pretty improbable, but > surely not impossible. > > >>>I have check my past postgresql log file, i did not see this error come >>>out (i did vacuum full every night). > > > As for the OP's problem, it seems pretty suspicious that we got through > one cycle of vacuuming pg_class_relname_nsp_index and then the second > one failed with what seems to be a bad block link. If that bad link was > there before, why didn't it fail the first time through? I'm wondering > about flaky hardware ... > >>>Any tool to check am i having a flaky hardware? >>> >>>Regards >>>Beh
"Chun Yit\(Chronos\)" <ivanbeh@chronos.com.my> writes: > PANIC: btree_delete_page_redo: uninitialized right sibling > LOG: startup process (PID 5043) was terminated by signal 6 > LOG: aborting startup due to startup process failure That's pretty ugly :-(. I think your only hope to get out of it is to use pg_resetxlog. After that, you will undoubtedly need to reindex pg_class in the problem database. After that, if you're very lucky, you'll be able to pg_dump. If you can get a clean dump, then initdb and reload --- it would not be wise to keep using the database. I'd suggest first taking a filesystem backup of the $PGDATA directory as it stands, so you can get back to where you are if this fails to work. Assuming you can get back up and running, you should look into (a) why you are running an old, known-buggy PG release, and (b) whether you have flaky hardware, as per comments already made upthread. regards, tom lane
""Chun Yit(Chronos)"" <ivanbeh@chronos.com.my> wrote > postmaster give me error every time i try to start it > LOG: redo starts at A/46315F50 > PANIC: btree_delete_page_redo: uninitialized right sibling > So the last resort I can think of is to use pg_resetxlog to pass the startup failure -- but no gaurantee that your database is still consistent after that :-(. So before you do that, make a copy of your database for backup. Maybe a deeper look into the problem is not needed, since 7.4.5 is too old, and try to use some newer version. Regards, Qingqing
> What i done is i use the last backup file and restore it on new $PGData > Directory, now it > ok. just curious how this thing can happen? maybe i asking a stupid > question since i using > quite old Postgresql Version but hoping this problem won't come back > again. > Thanks for your help to clear some doubt on my mind. > Regards > Beh ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Chun Yit(Chronos)" <ivanbeh@chronos.com.my> Cc: <pgsql-general@postgresql.org> Sent: Tuesday, May 16, 2006 1:03 PM Subject: Re: [GENERAL] Postmaster cannot start > "Chun Yit\(Chronos\)" <ivanbeh@chronos.com.my> writes: >> PANIC: btree_delete_page_redo: uninitialized right sibling >> LOG: startup process (PID 5043) was terminated by signal 6 >> LOG: aborting startup due to startup process failure > > That's pretty ugly :-(. I think your only hope to get out of it is to > use pg_resetxlog. After that, you will undoubtedly need to reindex > pg_class in the problem database. After that, if you're very lucky, > you'll be able to pg_dump. If you can get a clean dump, then initdb > and reload --- it would not be wise to keep using the database. > > I'd suggest first taking a filesystem backup of the $PGDATA directory as > it stands, so you can get back to where you are if this fails to work. > > Assuming you can get back up and running, you should look into (a) why > you are running an old, known-buggy PG release, and (b) whether you have > flaky hardware, as per comments already made upthread. > > regards, tom lane >