Thread: PostgreSQL 9.1 "database system was interrupted; last known up at"
I had to power cycle my system because it became unresponsive. Now PosgtreSQL will not start. I would like advice abouthow to proceed; I think pg_resetxlog is my next step. I have made a copy of the current database files. <log file="postgresql-9.1-main.log"> 2015-05-25 10:44:21 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT 2015-05-25 10:44:21 PDT LOG: incomplete startup packet 2015-05-25 10:44:21 PDT FATAL: could not open file "/etc/ssl/certs/ssl-cert-snakeoil.pem": Permission denied 2015-05-25 10:44:21 PDT LOG: startup process (PID 5180) exited with exit code 1 2015-05-25 10:44:21 PDT LOG: aborting startup due to startup process failure </log> I am running PostgreSQL 9.1 on Debian wheezy aka 7 aka oldstable. Installed via the Debian package. I think I accepted the defaults, and have not changed the configuration since. Linux 3.2.0, stock Debian kernel, amd64. Connect via emacs sql-postgresql or psql. An init script controls startup. When the system became unresponsive I was able to ssh in; the X process had gone crazy and could not be killed. Most keyfile systems had been remounted read-only, and many commands (includiing shutdown and telinit) produced errors, oftenI/O errors, when run. The last kern.log entries showed a process being killed. There was quite a lot of inode deletionand log replaying on restart. The log message above was from just after the restart. I have no backups*, but could recreate the database in the worst case. I haven't done anything with the database in at leasta week, I think, and so if I could get back the state as of 5/22 that would be fine. Filesystem is ext3 on dm-crypt on LVM. The permission error on the snakeoil cert is weird, since it is readable by all. I'm guessing it's a side effect of theearlier problems. Thanks for any guidance. Ross Boylan (*) It would be a bit of mess even if I did, since I use bacula with postgres as the database. I think I was also dumpingthe database when I had backup going.
"Boylan, Ross" <Ross.Boylan@ucsf.edu> writes: > I had to power cycle my system because it became unresponsive. Now PosgtreSQL will not start. I would like advice abouthow to proceed; I think pg_resetxlog is my next step. I have made a copy of the current database files. > <log file="postgresql-9.1-main.log"> > 2015-05-25 10:44:21 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT > 2015-05-25 10:44:21 PDT LOG: incomplete startup packet > 2015-05-25 10:44:21 PDT FATAL: could not open file "/etc/ssl/certs/ssl-cert-snakeoil.pem": Permission denied You did not say, but I'm betting you recently updated to last week's release, 9.1.16? If so, this is an unforeseen side-effect of a patch to forcibly fsync everything in the PG data directory after a crash, see http://www.postgresql.org/message-id/flat/20150523172627.GA24277@msg.df7cb.de We're debating how to fix it for real, but in the meantime the easiest workaround is to change the permissions on that file so it's writable by Postgres. (Yeah, I know, ick. Make a note to undo that after we fix the fsync'ing logic.) regards, tom lane
Thanks for your reply. 1. Yes, I updated to 9.1.16. 2. Changing the permission of the snakeoil file makes produced a similar error about the private key. Changing the permission of the private key produced an error that the key no longer had valid permissions. 3. So is the problem reading the certificate my only real problem? Details: 1. Since security backports do not necessarily include all upstream changes, here's the changelog.Debian entry for the lastupdate: postgresql-9.1 (9.1.16-0+deb7u1) wheezy-security; urgency=medium * New upstream version. + Avoid possible crash when client disconnects just before the authentication timeout expires (Benkocs Norbert Attila) If the timeout interrupt fired partway through the session shutdown sequence, SSL-related state would be freed twice, typically causing a crash and hence denial of service to other sessions. Experimentation shows that an unauthenticated remote attacker could trigger the bug somewhat consistently, hence treat as security issue. (CVE-2015-3165) + Improve detection of system-call failures (Noah Misch) Our replacement implementation of snprintf() failed to check for errors reported by the underlying system library calls; the main case that might be missed is out-of-memory situations. In the worst case this might lead to information exposure, due to our code assuming that a buffer had been overwritten when it hadn't been. Also, there were a few places in which security-relevant calls of other system library functions did not check for failure. It remains possible that some calls of the *printf() family of functions are vulnerable to information disclosure if an out-of-memory error occurs at just the wrong time. We judge the risk to not be large, but will continue analysis in this area. (CVE-2015-3166) + In contrib/pgcrypto, uniformly report decryption failures as Wrong key or corrupt data (Noah Misch) Previously, some cases of decryption with an incorrect key could report other error message texts. It has been shown that such variance in error reports can aid attackers in recovering keys from other systems. While it's unknown whether pgcrypto's specific behaviors are likewise exploitable, it seems better to avoid the risk by using a one-size-fits-all message. (CVE-2015-3167) * Repository moved to git, update Vcs headers. -- Christoph Berg <christoph.berg@credativ.de> Thu, 21 May 2015 16:35:28 +0200 2. Note Debian has a wrapper to support running multiple clusters of postgres at multiple versions of the database. So the command below has the version (9.1) and cluster name (main) as arguments. root@tempserver:/etc# date; pg_ctlcluster 9.1 main start Mon May 25 21:25:29 PDT 2015 The PostgreSQL server failed to start. Please check the log output: 2015-05-25 21:25:29 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT 2015-05-25 21:25:30 PDT LOG: incomplete startup packet 2015-05-25 21:25:30 PDT FATAL: could not open file "/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied 2015-05-25 21:25:30 PDT LOG: startup process (PID 28524) exited with exit code 1 2015-05-25 21:25:30 PDT LOG: aborting startup due to startup process failure root@tempserver:/etc# ls -l /etc/ssl/certs/ssl-cert-snakeoil.pem # previously changed group and group write -rw-rw-r-- 1 root postgres 1005 Apr 22 2013 /etc/ssl/certs/ssl-cert-snakeoil.pem root@tempserver:/etc# ls -ld /etc/ssl/certs drwxr-xr-x 3 root root 22528 Oct 18 2014 /etc/ssl/certs # Realize that the "could not open file" is different from original error. Try to adjust. root@tempserver:/etc# ls -l /etc/ssl/private/ssl-cert-snakeoil.key -rw-r----- 1 root ssl-cert 1704 Apr 22 2013 /etc/ssl/private/ssl-cert-snakeoil.key root@tempserver:/etc# grep ssl-cert /etc/group ssl-cert:x:118:postgres root@tempserver:/etc# date; chmod g+w /etc/ssl/private/ssl-cert-snakeoil.key Mon May 25 21:37:45 PDT 2015 root@tempserver:/etc# ls -l /etc/ssl/private/ssl-cert-snakeoil.key -rw-rw---- 1 root ssl-cert 1704 Apr 22 2013 /etc/ssl/private/ssl-cert-snakeoil.key root@tempserver:/etc# date; pg_ctlcluster 9.1 main start Mon May 25 21:37:58 PDT 2015 The PostgreSQL server failed to start. Please check the log output: 2015-05-25 21:37:58 PDT FATAL: private key file "server.key" has group or world access 2015-05-25 21:37:58 PDT DETAIL: File must be owned by the database user or root, must have no write permission for "group",and must have no permissions for "oth\ er". Ross ________________________________________ From: Tom Lane [tgl@sss.pgh.pa.us] Sent: Monday, May 25, 2015 7:15 PM To: Boylan, Ross Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] PostgreSQL 9.1 "database system was interrupted; last known up at" "Boylan, Ross" <Ross.Boylan@ucsf.edu> writes: > I had to power cycle my system because it became unresponsive. Now PosgtreSQL will not start. I would like advice abouthow to proceed; I think pg_resetxlog is my next step. I have made a copy of the current database files. > <log file="postgresql-9.1-main.log"> > 2015-05-25 10:44:21 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT > 2015-05-25 10:44:21 PDT LOG: incomplete startup packet > 2015-05-25 10:44:21 PDT FATAL: could not open file "/etc/ssl/certs/ssl-cert-snakeoil.pem": Permission denied You did not say, but I'm betting you recently updated to last week's release, 9.1.16? If so, this is an unforeseen side-effect of a patch to forcibly fsync everything in the PG data directory after a crash, see http://www.postgresql.org/message-id/flat/20150523172627.GA24277@msg.df7cb.de We're debating how to fix it for real, but in the meantime the easiest workaround is to change the permissions on that file so it's writable by Postgres. (Yeah, I know, ick. Make a note to undo that after we fix the fsync'ing logic.) regards, tom lane
Ross Boylan wrote: > I had to power cycle my system because it became unresponsive. Now PosgtreSQL will not start. I > would like advice about how to proceed; I think pg_resetxlog is my next step. I have made a copy of > the current database files. > > <log file="postgresql-9.1-main.log"> > 2015-05-25 10:44:21 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT > 2015-05-25 10:44:21 PDT LOG: incomplete startup packet > 2015-05-25 10:44:21 PDT FATAL: could not open file "/etc/ssl/certs/ssl-cert-snakeoil.pem": Permission denied > 2015-05-25 10:44:21 PDT LOG: startup process (PID 5180) exited with exit code 1 > 2015-05-25 10:44:21 PDT LOG: aborting startup due to startup process failure > </log> > > I am running PostgreSQL 9.1 on Debian wheezy aka 7 aka oldstable. > Installed via the Debian package. I think I accepted the defaults, and have not changed the > configuration since. > Linux 3.2.0, stock Debian kernel, amd64. > Connect via emacs sql-postgresql or psql. An init script controls startup. > > When the system became unresponsive I was able to ssh in; the X process had gone crazy and could not > be killed. Most key file systems had been remounted read-only, and many commands (includiing shutdown > and telinit) produced errors, often I/O errors, when run. The last kern.log entries showed a process > being killed. There was quite a lot of inode deletion and log replaying on restart. The log message > above was from just after the restart. > > I have no backups*, but could recreate the database in the worst case. I haven't done anything with > the database in at least a week, I think, and so if I could get back the state as of 5/22 that would > be fine. > > Filesystem is ext3 on dm-crypt on LVM. > > The permission error on the snakeoil cert is weird, since it is readable by all. I'm guessing it's a > side effect of the earlier problems. You didn't say which exact version of PostgreSQL you are running, but I bet you it is 9.1.16 and you are hitting the "Fsync Permissions Bug" introduced with that release: https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug There are plans to release a fix for that shortly. A better workaround than the one specified in the current version of the Wiki page might be to replace the symbolic link with a copy and change ownership of the files to "postgres". A restore wouldn't help you in this case, but since you probably felt the pain, please implement a backup strategy. It may come handy some day if recreating the database is not a nice option. Yours, Laurenz Albe
Mostly following the instructions at https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug#I.27ve_hit_this_bug_and_I_can.27t_restart_Postgres._What_do_I_do.3F I was able to restart the postgres. <log> 2015-05-30 13:45:38 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT 2015-05-30 13:45:38 PDT LOG: incomplete startup packet 2015-05-30 13:45:39 PDT LOG: database system was not properly shut down; automatic recovery in progress 2015-05-30 13:45:39 PDT LOG: record with zero length at 0/41856D18 2015-05-30 13:45:39 PDT LOG: redo is not required 2015-05-30 13:45:39 PDT LOG: database system is ready to accept connections 2015-05-30 13:45:39 PDT LOG: autovacuum launcher started </log> I actually tried first renaming server.{crt,key} and copying files as suggested on the wiki. I did a chown just on the newserver.{crt,key}. Postgres gave the same permission error on my renamed links, so I deleted them. Then /etc/init.d/postgresqlstart succeeded. Thanks to Albe, Tom, and those contributing to the wiki for your help. Ross Boylan ________________________________________ From: Albe Laurenz [laurenz.albe@wien.gv.at] Sent: Tuesday, May 26, 2015 2:12 AM To: Boylan, Ross; pgsql-admin@postgresql.org Subject: RE: PostgreSQL 9.1 "database system was interrupted; last known up at" Ross Boylan wrote: > I had to power cycle my system because it became unresponsive. Now PosgtreSQL will not start. I > would like advice about how to proceed; I think pg_resetxlog is my next step. I have made a copy of > the current database files. > > <log file="postgresql-9.1-main.log"> > 2015-05-25 10:44:21 PDT LOG: database system was interrupted; last known up at 2015-05-22 09:22:25 PDT > 2015-05-25 10:44:21 PDT LOG: incomplete startup packet > 2015-05-25 10:44:21 PDT FATAL: could not open file "/etc/ssl/certs/ssl-cert-snakeoil.pem": Permission denied > 2015-05-25 10:44:21 PDT LOG: startup process (PID 5180) exited with exit code 1 > 2015-05-25 10:44:21 PDT LOG: aborting startup due to startup process failure > </log> > > I am running PostgreSQL 9.1 on Debian wheezy aka 7 aka oldstable. > Installed via the Debian package. I think I accepted the defaults, and have not changed the > configuration since. > Linux 3.2.0, stock Debian kernel, amd64. > Connect via emacs sql-postgresql or psql. An init script controls startup. > > When the system became unresponsive I was able to ssh in; the X process had gone crazy and could not > be killed. Most key file systems had been remounted read-only, and many commands (includiing shutdown > and telinit) produced errors, often I/O errors, when run. The last kern.log entries showed a process > being killed. There was quite a lot of inode deletion and log replaying on restart. The log message > above was from just after the restart. > > I have no backups*, but could recreate the database in the worst case. I haven't done anything with > the database in at least a week, I think, and so if I could get back the state as of 5/22 that would > be fine. > > Filesystem is ext3 on dm-crypt on LVM. > > The permission error on the snakeoil cert is weird, since it is readable by all. I'm guessing it's a > side effect of the earlier problems. You didn't say which exact version of PostgreSQL you are running, but I bet you it is 9.1.16 and you are hitting the "Fsync Permissions Bug" introduced with that release: https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug There are plans to release a fix for that shortly. A better workaround than the one specified in the current version of the Wiki page might be to replace the symbolic link with a copy and change ownership of the files to "postgres". A restore wouldn't help you in this case, but since you probably felt the pain, please implement a backup strategy. It may come handy some day if recreating the database is not a nice option. Yours, Laurenz Albe