Thread: 9.4 pg_control corruption
I've encountered a corrupt pg_control file on my 9.4 development cluster. I've mostly been using the cluster for changeset extraction / slony testing.

This is a 9.4 build (currently commit 6ad903d70a440e + a walsender change discussed in another thread), but the initdb would have been done with an earlier 9.4 snapshot.

/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting. The results below are untrustworthy.

pg_control version number: 937
Catalog version number: 201405111
Database system identifier: 6014096177254975326
Database cluster state: in production
pg_control last modified: Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location: 5/44DC5FC8
Prior checkpoint location: 5/44C58B88
Latest checkpoint's REDO location: 5/44DC5FC8
Latest checkpoint's REDO WAL file: 000000010000000500000044
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/1558590
Latest checkpoint's NextOID: 505898
Latest checkpoint's NextMultiXactId: 3285
Latest checkpoint's NextMultiOffset: 6569
Latest checkpoint's oldestXID: 1281
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: logical
Current wal_log_hints setting: off
Current max_connections setting: 200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 65793
Date/time type storage: floating-point numbers
Float4 argument passing: by reference
Float8 argument passing: by reference
Data page checksum version: 2602751502
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$

Before this, postgres crashed and seemed to have problems recovering. I might have hit CTRL-C, but I didn't do anything drastic like issue a kill -9.

test1 [unknown] 2014-07-08 18:15:18.986 EDTFATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDTWARNING: terminating connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDTDETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDTHINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-07-08 18:15:20.483 EDTLOG: all server processes terminated; reinitializing
2014-07-08 18:15:20.720 EDTLOG: database system was interrupted; last known up at 2014-07-08 18:15:15 EDT
2014-07-08 18:15:20.865 EDTLOG: database system was not properly shut down; automatic recovery in progress
2014-07-08 18:15:20.954 EDTLOG: redo starts at 5/41023848
2014-07-08 18:15:23.153 EDTLOG: unexpected pageaddr 4/D8DC6000 in log segment 000000010000000500000044, offset 14442496
2014-07-08 18:15:23.153 EDTLOG: redo done at 5/44DC5F60
2014-07-08 18:15:23.153 EDTLOG: last completed transaction was at log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDTFATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDTFATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDTFATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDTFATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDTFATAL: the database system is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDTFATAL: the database system is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDTFATAL: the database system is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDTFATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDTFATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.618 EDTFATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.619 EDTFATAL: the database system is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.621 EDTFATAL: the database system is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.622 EDTFATAL: the database system is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.623 EDTFATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.624 EDTFATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.633 EDTFATAL: the database system is in recovery mode
^C
2014-07-08 18:15:52.316 EDTLOG: received fast shutdown request

The core file in gdb shows:

Core was generated by `postgres: autovacuum w'.
Program terminated with signal 6, Aborted.
#0 0x00007f18be8af295 in ?? ()
(gdb) where
#0 0x00007f18be8af295 in ?? ()
#1 0x00007f18be8b2438 in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()

I can't rule out that my laptop's hardware is misbehaving, but I haven't noticed any other problems doing non-9.4 work.

Has anyone seen anything similar with 9.4? Is there anything specific I should investigate? (I don't care about recovering the cluster.)
Steve Singer <steve@ssinger.info> writes:
> I've encountered a corrupt pg_control file on my 9.4 development
> cluster. I've mostly been using the cluster for changeset extraction /
> slony testing.

> This is a 9.4 (currently commit 6ad903d70a440e + a walsender change
> discussed in another thread) but would have had the initdb done with an
> earlier 9.4 snapshot.

Somehow or other you missed the update to pg_control version number 942.
There's no obvious reason to think that this pg_control file is corrupt
on its own terms, but the pg_controldata version you're using expects
the 942 layout. The fact that the server wasn't complaining about this
suggests that you've not recompiled the server, or at least not xlog.c.
Possibly the odd failure to restart indicates that you have a partially
updated server executable?

regards, tom lane
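To make the "version 937 versus 942 layout" point concrete, here is a minimal Python sketch that reads only the leading fields of a pg_control file. It assumes the file begins with a 64-bit system identifier followed by the 32-bit pg_control version and the 32-bit catalog version, stored in the machine's native byte order; that matches my reading of ControlFileData in src/include/catalog/pg_control.h, but it should be checked against the source tree in use. The function names are hypothetical and are not part of any PostgreSQL tool.

```python
import struct

# ASSUMPTION: the first 16 bytes of global/pg_control are
#   uint64 system_identifier, uint32 pg_control_version, uint32 catalog_version_no
# in native byte order. Verify against pg_control.h for your build.
HEADER = struct.Struct("=QII")


def read_control_header(path):
    """Return (system_identifier, pg_control_version, catalog_version_no)."""
    with open(path, "rb") as f:
        raw = f.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError("file is too short to be a pg_control file")
    return HEADER.unpack(raw)


def describe_version(found, expected):
    """Explain what a version difference means (hypothetical helper)."""
    if found == expected:
        return "pg_control version %d matches the binaries" % found
    return ("pg_control was written with version %d but the binaries expect "
            "%d; the layouts differ, so later fields and the CRC cannot be "
            "trusted" % (found, expected))
```

Under these assumptions, calling read_control_header on the data directory's global/pg_control and passing the second value to describe_version(found, 942) would report the mismatch without interpreting any of the later fields. This is consistent with the output above, where the version numbers and system identifier look sane while late fields such as the data page checksum version do not.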
On 07/08/2014 10:14 PM, Tom Lane wrote:
> Steve Singer <steve@ssinger.info> writes:
>> I've encountered a corrupt pg_control file on my 9.4 development
>> cluster. I've mostly been using the cluster for changeset extraction /
>> slony testing.
>> This is a 9.4 (currently commit 6ad903d70a440e + a walsender change
>> discussed in another thread) but would have had the initdb done with an
>> earlier 9.4 snapshot.
> Somehow or other you missed the update to pg_control version number 942.
> There's no obvious reason to think that this pg_control file is corrupt
> on its own terms, but the pg_controldata version you're using expects
> the 942 layout. The fact that the server wasn't complaining about this
> suggests that you've not recompiled the server, or at least not xlog.c.
> Possibly the odd failure to restart indicates that you have a partially
> updated server executable?

The server is complaining about that; it started to after the crash (which is why I ran pg_controldata).

ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
2014-07-08 22:28:57.796 EDTFATAL: database files are incompatible with server
2014-07-08 22:28:57.796 EDTDETAIL: The database cluster was initialized with PG_CONTROL_VERSION 937, but the server was compiled with PG_CONTROL_VERSION 942.
2014-07-08 22:28:57.796 EDTHINT: It looks like you need to initdb.
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$

The server seemed fine (and it was 9.4, because I was using 9.4 features)
The server crashed
The server performed crash recovery
The server wouldn't start, and pg_controldata shows the attached output

I wasn't recompiling or reinstalling around this time either.
Steve Singer <steve@ssinger.info> writes:
> On 07/08/2014 10:14 PM, Tom Lane wrote:
>> There's no obvious reason to think that this pg_control file is corrupt
>> on its own terms, but the pg_controldata version you're using expects
>> the 942 layout. The fact that the server wasn't complaining about this
>> suggests that you've not recompiled the server, or at least not xlog.c.

> The server is complaining about that, it started to after the crash

Then you updated your sources, recompiled and reinstalled, but failed to
restart the server when you did that. Else it would have complained on
the spot.

If you had any valuable data in the installation, we could talk about how
to get it out; but since you didn't I'd suggest just re-initdb and move
on. I don't see anything unexpected here.

regards, tom lane
Hi, dear Steve and pgsql-hackers,

I've encountered a similar phenomenon with 9.4.

1. Environment

1.1 OS version

postgres@lhl-Latitude-E5420:~$ cat /etc/issue
Ubuntu 13.10 \n \l

postgres@lhl-Latitude-E5420:~$ uname -av
Linux lhl-Latitude-E5420 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

1.2 PostgreSQL version

postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata --version
pg_controldata (PostgreSQL) 9.4beta2

postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_config
BINDIR = /opt/pg94/bin
DOCDIR = /opt/pg94/share/doc/postgresql
HTMLDIR = /opt/pg94/share/doc/postgresql
INCLUDEDIR = /opt/pg94/include
PKGINCLUDEDIR = /opt/pg94/include/postgresql
INCLUDEDIR-SERVER = /opt/pg94/include/postgresql/server
LIBDIR = /opt/pg94/lib
PKGLIBDIR = /opt/pg94/lib/postgresql
LOCALEDIR = /opt/pg94/share/locale
MANDIR = /opt/pg94/share/man
SHAREDIR = /opt/pg94/share/postgresql
SYSCONFDIR = /opt/pg94/etc/postgresql
PGXS = /opt/pg94/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg94' '--with-perl' '--with-libxml' '--with-libxslt' '--with-ossp-uuid'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed -Wl,-rpath,'/opt/pg94/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.4beta2

2. Phenomenon

I have a PostgreSQL datadir named /export/pg94beta1_data/ which was initialized with PostgreSQL 9.4beta1:

postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata /export/pg94beta1_data/
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting. The results below are untrustworthy.

pg_control version number: 937
Catalog version number: 201405111
Database system identifier: 6014427290583411360
Database cluster state: in production
pg_control last modified: Sunday, 2014-07-27 16:36:50
Latest checkpoint location: 0/17462890
Prior checkpoint location: 0/17462828
Latest checkpoint's REDO location: 0/17462890
Latest checkpoint's REDO WAL file: 000000010000000000000017
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID: 0/1387
Latest checkpoint's NextOID: 22220
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 715
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Sunday, 2014-07-27 16:36:50
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: minimal
Current wal_log_hints setting: off
Current max_connections setting: 100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 65793
Date/time type storage: floating-point numbers
Float4 argument passing: by reference
Float8 argument passing: by reference
Data page checksum version: 307500851

But the server complained as follows when I started it with PostgreSQL 9.4beta2:

postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_ctl -D /export/pg94beta1_data/ start
server starting
postgres@lhl-Latitude-E5420:~$
[ 2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 1 0]FATAL: database files are incompatible with server
[ 2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 2 0]DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 937, but the server was compiled with PG_CONTROL_VERSION 942.
[ 2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 3 0]HINT: It looks like you need to initdb.

I had always assumed that a PG_CONTROL_VERSION mismatch should not occur when upgrading between minor versions of PostgreSQL. Is there some important difference in PostgreSQL 9.4?

Thanks,
Best Regards!
李海龙 <hailong.li@qunar.com> writes:
> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
> initialized with PostgreSQL 9.4beta1,
> [ and 9.4beta2 won't start with it ]

This is expected; you need to initdb. Or use pg_upgrade to upgrade
the cluster. We had to change pg_control format post-beta1.

regards, tom lane
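The two ways forward mentioned above (a fresh initdb with a dump and reload, or pg_upgrade) can be sketched as command sequences. The Python below only builds argument lists and executes nothing. The function names and every path passed in are hypothetical, and the plan assumes both the old and the new binaries are still installed and that the two servers are never running at the same time. The flags used (pg_ctl -D/-w, pg_dumpall -f, initdb -D, psql -f, pg_upgrade -b/-B/-d/-D) are standard options, but check them against the documentation for the exact versions involved.

```python
import os


def dump_restore_plan(old_bindir, new_bindir, old_datadir, new_datadir, dump_file):
    """Return the dump-and-reload steps, in order, as argv lists."""
    old = lambda tool: os.path.join(old_bindir, tool)
    new = lambda tool: os.path.join(new_bindir, tool)
    return [
        # 1. Start the old cluster with the binaries that match its pg_control.
        [old("pg_ctl"), "-D", old_datadir, "-w", "start"],
        # 2. Dump every database and the global objects into one SQL script.
        [old("pg_dumpall"), "-f", dump_file],
        # 3. Stop the old server so the new one can use the same port.
        [old("pg_ctl"), "-D", old_datadir, "stop"],
        # 4. Create a fresh cluster with the new binaries.
        [new("initdb"), "-D", new_datadir],
        # 5. Start the new server and wait until it accepts connections.
        [new("pg_ctl"), "-D", new_datadir, "-w", "start"],
        # 6. Replay the dump while connected to the postgres database.
        [new("psql"), "-f", dump_file, "postgres"],
    ]


def pg_upgrade_command(old_bindir, new_bindir, old_datadir, new_datadir):
    """Return the pg_upgrade invocation; new_datadir must already be initdb'd."""
    return [
        os.path.join(new_bindir, "pg_upgrade"),
        "-b", old_bindir, "-B", new_bindir,
        "-d", old_datadir, "-D", new_datadir,
    ]
```

Each returned list could be handed to subprocess.run one step at a time, stopping at the first failure. Keeping the plan as plain data makes it easy to print and review before anything touches a data directory.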
Understood!

Before I wrote my last email, I had already initialized a new db with PostgreSQL 9.4beta2 and restored the pg_dumpall data of /export/pg94beta1_data/ into it.

Thanks,
Best Regards!
On 07/27/2014 09:35 AM, Tom Lane wrote:
> 李海龙 <hailong.li@qunar.com> writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
>
> This is expected; you need to initdb. Or use pg_upgrade to upgrade
> the cluster. We had to change pg_control format post-beta1.

Thank you for testing that though!

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com