9.4 pg_control corruption - Mailing list pgsql-hackers

From: Steve Singer
Subject: 9.4 pg_control corruption
Date:
Msg-id: BLU436-SMTP2539162CBA275312AE14DFADC0F0@phx.gbl
List: pgsql-hackers
I've encountered a corrupt pg_control file on my 9.4 development 
cluster. I've mostly been using the cluster for changeset extraction / 
Slony testing.

This is a 9.4 cluster (currently at commit 6ad903d70a440e plus a walsender 
change discussed in another thread), but the initdb would have been done 
with an earlier 9.4 snapshot.



/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            937
Catalog version number:               201405111
Database system identifier:           6014096177254975326
Database cluster state:               in production
pg_control last modified:             Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location:           5/44DC5FC8
Prior checkpoint location:            5/44C58B88
Latest checkpoint's REDO location:    5/44DC5FC8
Latest checkpoint's REDO WAL file:    000000010000000500000044
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1558590
Latest checkpoint's NextOID:          505898
Latest checkpoint's NextMultiXactId:  3285
Latest checkpoint's NextMultiOffset:  6569
Latest checkpoint's oldestXID:        1281
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            logical
Current wal_log_hints setting:        off
Current max_connections setting:      200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         65793
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
Data page checksum version:           2602751502
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$

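For reference, the file pg_controldata is reading here is global/pg_control inside the data directory, and it should always be exactly 8192 bytes; the stored CRC is computed over the entire fixed-size control struct (everything before the crc field itself), so a single flipped bit anywhere in it is enough to produce the warning above. A rough way to eyeball the raw file by hand (paths assume the same layout as the command above):

/usr/local/pgsql94wal/bin$ ls -l ../data/global/pg_control
/usr/local/pgsql94wal/bin$ od -A d -t x1 ../data/global/pg_control | head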

Before this, postgres crashed and seemed to have problems recovering. I 
might have hit Ctrl-C, but I didn't do anything drastic like issue a kill -9.


test1 [unknown] 2014-07-08 18:15:18.986 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDTWARNING:  terminating 
connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDTDETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and 
exit, because another server process exited abnormally and possibly 
corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDTHINT:  In a moment you should be able to reconnect to the database and repeat your command.
2014-07-08 18:15:20.483 EDTLOG:  all server processes terminated; reinitializing
2014-07-08 18:15:20.720 EDTLOG:  database system was interrupted; last known up at 2014-07-08 18:15:15 EDT
2014-07-08 18:15:20.865 EDTLOG:  database system was not properly shut down; automatic recovery in progress
2014-07-08 18:15:20.954 EDTLOG:  redo starts at 5/41023848
2014-07-08 18:15:23.153 EDTLOG:  unexpected pageaddr 4/D8DC6000 in log segment 000000010000000500000044, offset 14442496
2014-07-08 18:15:23.153 EDTLOG:  redo done at 5/44DC5F60
2014-07-08 18:15:23.153 EDTLOG:  last completed transaction was at log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDTFATAL:  the database system 
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDTFATAL:  the database system 
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDTFATAL:  the database system 
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.618 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.619 EDTFATAL:  the database system 
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.621 EDTFATAL:  the database system 
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.622 EDTFATAL:  the database system 
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.623 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.624 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.633 EDTFATAL:  the database system 
is in recovery mode
^C  2014-07-08 18:15:52.316 EDTLOG:  received fast shutdown request

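As far as I understand it, the "unexpected pageaddr" line is just how recovery normally detects the end of valid WAL in a recycled segment, so the interesting part is what was replayed between redo start and redo done. If the WAL contents are wanted, the records in that range could be dumped with pg_xlogdump (from contrib); the LSNs below are the ones from the log above, and the path assumes the segments are still in pg_xlog:

/usr/local/pgsql94wal/bin$ ./pg_xlogdump -p ../data/pg_xlog -s 5/41023848 -e 5/44DC5F60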

The core file in gdb shows:
Core was generated by `postgres: autovacuum w'.
Program terminated with signal 6, Aborted.
#0  0x00007f18be8af295 in ?? ()
(gdb) where
#0  0x00007f18be8af295 in ?? ()
#1  0x00007f18be8b2438 in ?? ()
#2  0x0000000000000020 in ?? ()
#3  0x0000000000000000 in ?? ()

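The "??" frames above probably just mean gdb wasn't given the matching postgres binary (or that the stack itself is smashed). A sketch of getting a symbolic backtrace, assuming the core came from the install above and the build has debug symbols (the core file path is a placeholder):

$ gdb /usr/local/pgsql94wal/bin/postgres /path/to/core
(gdb) bt full
(gdb) info registers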

I can't rule out that the hardware on my laptop is misbehaving, but I 
haven't noticed any other problems doing non-9.4 stuff.

Has anyone seen anything similar with 9.4? Is there anything specific I 
should investigate (I don't care about recovering the cluster)?




