Thread: 9.4 pg_control corruption

9.4 pg_control corruption

From
Steve Singer
Date:
I've encountered a corrupt pg_control  file on my 9.4 development 
cluster.  I've mostly been using the cluster for changeset extraction / 
slony testing.

This is a 9.4 build (currently commit 6ad903d70a440e + a walsender change 
discussed in another thread), but the initdb would have been done with an 
earlier 9.4 snapshot.



/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            937
Catalog version number:               201405111
Database system identifier:           6014096177254975326
Database cluster state:               in production
pg_control last modified:             Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location:           5/44DC5FC8
Prior checkpoint location:            5/44C58B88
Latest checkpoint's REDO location:    5/44DC5FC8
Latest checkpoint's REDO WAL file:    000000010000000500000044
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1558590
Latest checkpoint's NextOID:          505898
Latest checkpoint's NextMultiXactId:  3285
Latest checkpoint's NextMultiOffset:  6569
Latest checkpoint's oldestXID:        1281
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            logical
Current wal_log_hints setting:        off
Current max_connections setting:      200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         65793
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
Data page checksum version:           2602751502
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$


Before this, postgres crashed and seemed to have problems recovering. I 
might have hit CTRL-C, but I didn't do anything drastic like issue a kill -9.


test1 [unknown] 2014-07-08 18:15:18.986 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDTWARNING:  terminating 
connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDTDETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and 
exit, because another server process exited abnormally and possibly 
corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDTHINT:  In a moment you should 
be able to reconnect to the database and repeat your command.
  2014-07-08 18:15:20.483 EDTLOG:  all server processes terminated; 
reinitializing
  2014-07-08 18:15:20.720 EDTLOG:  database system was interrupted; 
last known up at 2014-07-08 18:15:15 EDT
  2014-07-08 18:15:20.865 EDTLOG:  database system was not properly 
shut down; automatic recovery in progress
  2014-07-08 18:15:20.954 EDTLOG:  redo starts at 5/41023848
  2014-07-08 18:15:23.153 EDTLOG:  unexpected pageaddr 4/D8DC6000 in 
log segment 000000010000000500000044, offset 14442496
  2014-07-08 18:15:23.153 EDTLOG:  redo done at 5/44DC5F60
  2014-07-08 18:15:23.153 EDTLOG:  last completed transaction was at 
log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDTFATAL:  the database system 
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDTFATAL:  the database system 
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDTFATAL:  the database system 
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.618 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.619 EDTFATAL:  the database system 
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.621 EDTFATAL:  the database system 
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.622 EDTFATAL:  the database system 
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.623 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.624 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.633 EDTFATAL:  the database system 
is in recovery mode
^C  2014-07-08 18:15:52.316 EDTLOG:  received fast shutdown request


The core file in gdb shows
Core was generated by `postgres: autovacuum w'.
Program terminated with signal 6, Aborted.
#0  0x00007f18be8af295 in ?? ()
(gdb) where
#0  0x00007f18be8af295 in ?? ()
#1  0x00007f18be8b2438 in ?? ()
#2  0x0000000000000020 in ?? ()
#3  0x0000000000000000 in ?? ()


I can't rule out that my laptop's hardware is misbehaving, but I 
haven't noticed any other problems doing non-9.4 stuff.

Has anyone seen anything similar with 9.4? Is there anything specific I 
should investigate? (I don't care about recovering the cluster.)





Re: 9.4 pg_control corruption

From
Tom Lane
Date:
Steve Singer <steve@ssinger.info> writes:
> I've encountered a corrupt pg_control  file on my 9.4 development 
> cluster.  I've mostly been using the cluster for changeset extraction / 
> slony testing.

> This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change 
> discussed in another thread) but would have had the initdb done with an 
> earlier 9.4 snapshot.

Somehow or other you missed the update to pg_control version number 942.
There's no obvious reason to think that this pg_control file is corrupt
on its own terms, but the pg_controldata version you're using expects
the 942 layout.  The fact that the server wasn't complaining about this
suggests that you've not recompiled the server, or at least not xlog.c.
Possibly the odd failure to restart indicates that you have a partially
updated server executable?
        regards, tom lane
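
The layout check described above can be illustrated with a small sketch. It assumes that pg_control begins with a 64-bit system identifier followed by the 32-bit pg_control version and the 32-bit catalog version, and that the file was written on a little-endian machine such as x86_64. The function names `parse_control_header` and `layout_matches` are hypothetical and are not part of PostgreSQL; the sample values are the ones shown in the pg_controldata output earlier in this thread.

```python
import struct

# Assumption: the first 16 bytes of pg_control hold
#   uint64 system identifier, uint32 pg_control version, uint32 catalog version
# stored little-endian (true for x86_64; other platforms may differ).
HEADER = struct.Struct("<QII")


def parse_control_header(raw: bytes):
    """Return (system_identifier, pg_control_version, catalog_version)."""
    if len(raw) < HEADER.size:
        raise ValueError("need at least %d bytes of pg_control" % HEADER.size)
    return HEADER.unpack_from(raw)


def layout_matches(raw: bytes, expected_version: int) -> bool:
    """True when the on-disk pg_control version equals the one a binary expects."""
    return parse_control_header(raw)[1] == expected_version


if __name__ == "__main__":
    # Synthetic header built from the values reported in this thread.
    sample = HEADER.pack(6014096177254975326, 937, 201405111)
    print(parse_control_header(sample))
    # A binary built for version 942 would reject this file.
    print(layout_matches(sample, 942))
```

To try this against a real cluster, one could read the first 16 bytes of the `global/pg_control` file in the data directory and pass them to `parse_control_header`. A version that differs from the one the installed binaries were built for would explain both the CRC warning from pg_controldata and the refusal to start.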



Re: 9.4 pg_control corruption

From
Steve Singer
Date:
On 07/08/2014 10:14 PM, Tom Lane wrote:
> Steve Singer <steve@ssinger.info> writes:
>> I've encountered a corrupt pg_control  file on my 9.4 development
>> cluster.  I've mostly been using the cluster for changeset extraction /
>> slony testing.
>> This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change
>> discussed in another thread) but would have had the initdb done with an
>> earlier 9.4 snapshot.
> Somehow or other you missed the update to pg_control version number 942.
> There's no obvious reason to think that this pg_control file is corrupt
> on its own terms, but the pg_controldata version you're using expects
> the 942 layout.  The fact that the server wasn't complaining about this
> suggests that you've not recompiled the server, or at least not xlog.c.
> Possibly the odd failure to restart indicates that you have a partially
> updated server executable?


The server is complaining about that; it started to after the crash 
(which is why I ran pg_controldata).

ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
  2014-07-08 22:28:57.796 EDTFATAL:  database files are incompatible 
with server
  2014-07-08 22:28:57.796 EDTDETAIL:  The database cluster was 
initialized with PG_CONTROL_VERSION 937, but the server was compiled 
with PG_CONTROL_VERSION 942.
  2014-07-08 22:28:57.796 EDTHINT:  It looks like you need to initdb.
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$


The server seemed fine (and it was 9.4 because I was using 9.4 features)
The server crashed
The server performed crash recovery
The server wouldn't start and pg_controldata shows the attached 
output

I wasn't recompiling or reinstalling around this time either.



>             regards, tom lane
>
>




Re: 9.4 pg_control corruption

From
Tom Lane
Date:
Steve Singer <steve@ssinger.info> writes:
> On 07/08/2014 10:14 PM, Tom Lane wrote:
>> There's no obvious reason to think that this pg_control file is corrupt
>> on its own terms, but the pg_controldata version you're using expects
>> the 942 layout.  The fact that the server wasn't complaining about this
>> suggests that you've not recompiled the server, or at least not xlog.c.

> The server  is complaining about that, it started to after the crash 

Then you updated your sources, recompiled and reinstalled, but failed to
restart the server when you did that.  Else it would have complained on
the spot.

If you had any valuable data in the installation, we could talk about how
to get it out; but since you didn't I'd suggest just re-initdb and move
on.  I don't see anything unexpected here.
        regards, tom lane



Re: 9.4 pg_control corruption

From
李海龙
Date:
Hi, dear Steve and pgsql-hackers,

I've encountered a similar phenomenon with 9.4.



1.  environment

1.1 OS version

postgres@lhl-Latitude-E5420:~$ cat /etc/issue
Ubuntu 13.10 \n \l

postgres@lhl-Latitude-E5420:~$ uname -av
Linux lhl-Latitude-E5420 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 
16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


1.2 PostgreSQL version


postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata --version
pg_controldata (PostgreSQL) 9.4beta2
postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_config
BINDIR = /opt/pg94/bin
DOCDIR = /opt/pg94/share/doc/postgresql
HTMLDIR = /opt/pg94/share/doc/postgresql
INCLUDEDIR = /opt/pg94/include
PKGINCLUDEDIR = /opt/pg94/include/postgresql
INCLUDEDIR-SERVER = /opt/pg94/include/postgresql/server
LIBDIR = /opt/pg94/lib
PKGLIBDIR = /opt/pg94/lib/postgresql
LOCALEDIR = /opt/pg94/share/locale
MANDIR = /opt/pg94/share/man
SHAREDIR = /opt/pg94/share/postgresql
SYSCONFDIR = /opt/pg94/etc/postgresql
PGXS = /opt/pg94/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg94' '--with-perl' '--with-libxml' 
'--with-libxslt' '--with-ossp-uuid'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed 
-Wl,-rpath,'/opt/pg94/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lz -lreadline -lrt -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.4beta2


2.  phenomenon

I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
initialized with PostgreSQL 9.4beta1,



postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata 
/export/pg94beta1_data/
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            937
Catalog version number:               201405111
Database system identifier:           6014427290583411360
Database cluster state:               in production
pg_control last modified:             Sun 27 Jul 2014 16:36:50
Latest checkpoint location:           0/17462890
Prior checkpoint location:            0/17462828
Latest checkpoint's REDO location:    0/17462890
Latest checkpoint's REDO WAL file:    000000010000000000000017
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0/1387
Latest checkpoint's NextOID:          22220
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        715
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Sun 27 Jul 2014 16:36:50
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            minimal
Current wal_log_hints setting:        off
Current max_connections setting:      100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         65793
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
Data page checksum version:           307500851



but the server complained about the following when I started it with 
PostgreSQL 9.4beta2,

postgres@lhl-Latitude-E5420:~$  /opt/pg94/bin/pg_ctl -D 
/export/pg94beta1_data/ start
server starting
postgres@lhl-Latitude-E5420:~$ [    2014-07-27 19:23:57.922 CST 27983 
53d4e14d.6d4f 1 0]FATAL:  database files are incompatible with server
[    2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 2 0]DETAIL: The 
database cluster was initialized with PG_CONTROL_VERSION 937, but the 
server was compiled with PG_CONTROL_VERSION 942.
[    2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 3 0]HINT:  It looks 
like you need to initdb.




I had always assumed that a PG_CONTROL_VERSION mismatch should not 
occur when upgrading between minor versions of PostgreSQL.

Are there some important differences in PostgreSQL 9.4?




Thanks

Best Regards!


On 2014-07-09 10:36, Steve Singer wrote:
> On 07/08/2014 10:14 PM, Tom Lane wrote:
>> Steve Singer <steve@ssinger.info> writes:
>>> I've encountered a corrupt pg_control file on my 9.4 development
>>> cluster.  I've mostly been using the cluster for changeset extraction /
>>> slony testing.
>>> This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change
>>> discussed in another thread) but would have had the initdb done with an
>>> earlier 9.4 snapshot.
>> Somehow or other you missed the update to pg_control version number 942.
>> There's no obvious reason to think that this pg_control file is corrupt
>> on its own terms, but the pg_controldata version you're using expects
>> the 942 layout.  The fact that the server wasn't complaining about this
>> suggests that you've not recompiled the server, or at least not xlog.c.
>> Possibly the odd failure to restart indicates that you have a partially
>> updated server executable?
>
>
> The server  is complaining about that, it started to after the crash 
> (which is why I ran pg_controldata)
>
> ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
>   2014-07-08 22:28:57.796 EDTFATAL:  database files are incompatible 
> with server
>   2014-07-08 22:28:57.796 EDTDETAIL:  The database cluster was 
> initialized with PG_CONTROL_VERSION 937, but the server was compiled 
> with PG_CONTROL_VERSION 942.
>   2014-07-08 22:28:57.796 EDTHINT:  It looks like you need to initdb.
> ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$
>
>
> The server seemed fine (and it was 9.4 because I was using 9.4 features)
> The server crashed
> The server performed crash recovery
> The server server wouldn't start and pg_controldata shows the attached 
> output
>
> I wasn't recompiling or reinstalling around this time either.
>
>
>
>>             regards, tom lane
>>
>>
>
>
>

Re: 9.4 pg_control corruption

From
Tom Lane
Date:
李海龙 <hailong.li@qunar.com> writes:
> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
> initialized with PostgreSQL 9.4beta1,
> [ and 9.4beta2 won't start with it ]

This is expected; you need to initdb.  Or use pg_upgrade to upgrade
the cluster.  We had to change pg_control format post-beta1.
        regards, tom lane
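
For anyone scripting around this startup failure, here is a rough sketch of pulling the two version numbers out of the server's DETAIL message. The message wording is copied from the log output quoted in this thread. The helper name `parse_mismatch` is hypothetical, and the regular expression is an assumption about how stable that wording is across releases.

```python
import re

# Pattern based on the DETAIL line quoted in this thread; the exact wording
# is assumed to be stable, which may not hold for every release.
DETAIL_RE = re.compile(
    r"initialized with PG_CONTROL_VERSION (\d+), "
    r"but the server was compiled with PG_CONTROL_VERSION (\d+)"
)


def parse_mismatch(log_text: str):
    """Return (cluster_version, server_version), or None if no mismatch is logged."""
    # Collapse all whitespace so that a message wrapped over several
    # lines still matches the single-line pattern.
    flat = " ".join(log_text.split())
    match = DETAIL_RE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = (
        "FATAL:  database files are incompatible with server\n"
        "DETAIL: The database cluster was initialized with "
        "PG_CONTROL_VERSION 937, but the\n"
        "server was compiled with PG_CONTROL_VERSION 942.\n"
        "HINT:  It looks like you need to initdb.\n"
    )
    print(parse_mismatch(sample))
```

When the two numbers differ, the data directory was written by binaries with a different pg_control format, which is the case where the HINT's advice to initdb applies.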



Re: 9.4 pg_control corruption

From
李海龙
Date:

Understood!

Before I wrote my last email, I had already initialized a new database 
with PostgreSQL 9.4beta2 and restored the pg_dumpall output of 
/export/pg94beta1_data/.


Thanks

Best Regards!

at 2014-07-28 00:35 +08, Tom Lane wrote:
> 李海龙 <hailong.li@qunar.com> writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
> This is expected; you need to initdb.  Or use pg_upgrade to upgrade
> the cluster.  We had to change pg_control format post-beta1.
>
>             regards, tom lane
>
>

Re: 9.4 pg_control corruption

From
Josh Berkus
Date:
On 07/27/2014 09:35 AM, Tom Lane wrote:
> 李海龙 <hailong.li@qunar.com> writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
> 
> This is expected; you need to initdb.  Or use pg_upgrade to upgrade
> the cluster.  We had to change pg_control format post-beta1.

Thank you for testing that though!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com