Thread: Identifying cause of "database system shutdown was interrupted" at failed startup
From: "Crispin Miller"
Date:
Hi,

We recently encountered a serious database crash that resulted in a significant loss of data. We took down the database server, and when we restarted the backend we got the error 'database system shutdown was interrupted', followed by 'invalid checkpoint' errors and missing xlog files (I've appended the log to the end of this post).

I've been trawling the list archives for a few days. This issue has cropped up a number of times, but I've found it hard to identify a single post, or set of posts, that might help explain the cause of such a crash. Hopefully I'll be able to bring together the results of that trawl in this post, but I'd really appreciate any help or suggestions people have. We currently have a slightly uneasy feeling because we've not quite got to the bottom of the issues, and it would be nice to set our minds at rest! :-)

So far I've identified two possible causes of the crash. I've listed them below, and wonder whether people have any comments on them:

1) We were running postgres version 7.3.6-1 (which is the version in RedHat AS3: redhat EL AS3 kernel-smp-2.4.21-9.0.1EL). The following post suggests that this is a known issue in 7.3.3, but that 7.3.4 is safe. I assume, therefore, that 7.3.6-1 is also safe.

   http://archives.postgresql.org/pgsql-general/2003-09/msg01086.php

2) We are running the database in conjunction with JBoss, connecting to the database server from a different machine via JDBC. The database was taken down *without* stopping JBoss first.

Any thoughts would be much appreciated!
Below are the relevant bits of the shutdown and startup logs.

Best wishes,
Crispin

---------------------- shutdown log (/var/log/messages):
May 28 15:43:35 shutdown: shutting down for system halt
May 28 15:43:35 init: Switching to runlevel: 0
May 28 15:43:36 server rhnsd[1694]: Exiting
May 28 15:43:36 server rhnsd: rhnsd shutdown succeeded
May 28 15:43:36 server atd: atd shutdown succeeded
May 28 15:43:36 server cups: cupsd shutdown succeeded
May 28 15:43:36 server xfs[1643]: terminating
May 28 15:43:36 server xfs: xfs shutdown succeeded
May 28 15:43:36 server mysqld: Stopping MySQL: succeeded
May 28 15:43:36 server gpm: gpm shutdown succeeded
May 28 15:43:37 server rhdb: Stopping PostgreSQL - Red Hat Edition service:
May 28 15:43:37 server su(pam_unix)[12400]: session opened for user postgres by (uid=0)
May 28 15:43:40 server su(pam_unix)[12400]: session closed for user postgres
May 28 15:43:40 server rhdb: ^[[60G[
May 28 15:43:40 server rhdb:
May 28 15:43:40 server rc: Stopping rhdb: succeeded
...
May 28 15:43:44 server kernel: Kernel logging (proc) stopped.
May 28 15:43:44 server kernel: Kernel log daemon terminating.
May 28 15:43:45 server syslog: klogd shutdown succeeded
May 28 15:43:45 server exiting on signal 15
May 28 16:13:35 server syslogd 1.4.1: restart.
----- starting messages
Jun 1 10:43:55 server postgres[5537]: [30] LOG: database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun 1 10:43:55 server postgres[5537]: [31] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:43:55 server postgres[5537]: [32] LOG: invalid primary checkpoint record
Jun 1 10:43:55 server postgres[5537]: [33] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:43:55 server postgres[5537]: [34] LOG: invalid secondary checkpoint record
Jun 1 10:43:55 server postgres[5537]: [35] PANIC: unable to locate a valid checkpoint record
Jun 1 10:43:55 server postgres[5534]: [31] LOG: startup process (pid 5537) was terminated by signal 6
Jun 1 10:43:55 server postgres[5534]: [32] LOG: aborting startup due to startup process failure
Jun 1 10:43:56 server rhdb: Starting PostgreSQL - Red Hat Edition service: failed
Jun 1 10:44:00 server su(pam_unix)[5554]: session opened for user postgres by (uid=0)
Jun 1 10:44:00 server su(pam_unix)[5554]: session closed for user postgres
Jun 1 10:44:00 server postgres[5595]: [30] LOG: database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun 1 10:44:00 server postgres[5595]: [31] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:44:00 server postgres[5595]: [32] LOG: invalid primary checkpoint record
Jun 1 10:44:00 server postgres[5595]: [33] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:44:00 server postgres[5595]: [34] LOG: invalid secondary checkpoint record
Jun 1 10:44:00 server postgres[5595]: [35] PANIC: unable to locate a valid checkpoint record
Jun 1 10:44:00 server postgres[5592]: [31] LOG: startup process (pid 5595) was terminated by signal 6
Jun 1 10:44:00 server postgres[5592]: [32] LOG: aborting startup due to startup process failure
Jun 1 10:44:01 server rhdb: Starting PostgreSQL - Red Hat Edition service: failed
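
In case it helps anyone reading a similar log, here is a rough Python sketch that scans startup messages like the ones above and pulls out the xlog segments that could not be opened, plus any PANIC messages. It is only an illustration: the function name `summarize_startup_log` and both regular expressions are my own assumptions, modelled on the exact wording of the lines in this post, and other PostgreSQL versions may word these messages differently. It does not diagnose or repair anything.

```python
import re

# Assumed pattern for the "[n] LEVEL: message" part of each syslog line,
# based only on the lines quoted in this post.
SEVERITY = re.compile(r"\[\d+\] (?P<level>LOG|PANIC|FATAL|ERROR):\s+(?P<msg>.*)$")

# Assumed pattern for the "open of <path> (log file N, segment M) failed" message.
OPEN_FAILED = re.compile(
    r"open of (?P<path>\S+) \(log file (?P<log>\d+), segment (?P<seg>\d+)\) "
    r"failed: (?P<reason>.+)$"
)


def summarize_startup_log(lines):
    """Return (missing_segments, panics) found in an iterable of log lines.

    missing_segments is a sorted list of (path, log_number, segment_number);
    panics is a list of PANIC message texts in the order they appear.
    """
    missing = set()
    panics = []
    for line in lines:
        match = SEVERITY.search(line.rstrip())
        if not match:
            continue  # not a PostgreSQL message (e.g. su or rhdb lines)
        message = match.group("msg")
        if match.group("level") == "PANIC":
            panics.append(message)
        failed = OPEN_FAILED.search(message)
        if failed:
            missing.add(
                (failed.group("path"), int(failed.group("log")), int(failed.group("seg")))
            )
    return sorted(missing), panics


if __name__ == "__main__":
    # Two lines copied from the startup messages in this post.
    sample = [
        "Jun 1 10:43:55 server postgres[5537]: [31] LOG: open of "
        "/var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) "
        "failed: No such file or directory",
        "Jun 1 10:43:55 server postgres[5537]: [35] PANIC: unable to locate "
        "a valid checkpoint record",
    ]
    missing, panics = summarize_startup_log(sample)
    for path, log_no, seg_no in missing:
        print(f"missing xlog segment: {path} (log file {log_no}, segment {seg_no})")
    for text in panics:
        print(f"PANIC: {text}")
```

Duplicate reports of the same segment (the log above repeats each failure on every start attempt) are collapsed by keeping the segments in a set, so the summary lists each missing file once.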