Standby stopped working after PANIC: WAL contains references to invalid pages - Mailing list pgsql-general

From Dan Kogan
Subject Standby stopped working after PANIC: WAL contains references to invalid pages
Date
Msg-id 60B572D9298D944580F7D51195DD3080468902FFCA@VMBX125.ihostexchange.net
Whole thread Raw
Responses Re: Standby stopped working after PANIC: WAL contains references to invalid pages  (Lonni J Friedman <netllama@gmail.com>)
List pgsql-general

Hello,

 

Today our standby instance stopped working with this error in the log:

 

2013-06-22 16:27:32 UTC [8367]: [247-1] [] WARNING:  page 158130 of relation pg_tblspc/16447/PG_9.2_201204301/16448/39154429 is uninitialized

2013-06-22 16:27:32 UTC [8367]: [248-1] [] CONTEXT:  xlog redo vacuum: rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129

2013-06-22 16:27:32 UTC [8367]: [249-1] [] PANIC:  WAL contains references to invalid pages

2013-06-22 16:27:32 UTC [8367]: [250-1] [] CONTEXT:  xlog redo vacuum: rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129

2013-06-22 16:27:32 UTC [8366]: [3-1] [] LOG:  startup process (PID 8367) was terminated by signal 6: Aborted

2013-06-22 16:27:32 UTC [8366]: [4-1] [] LOG:  terminating any other active server processes

 

After re-start the same exact error occurred.

 

We thought that maybe we hit this bug - http://postgresql.1045698.n5.nabble.com/Completely-broken-replica-after-PANIC-WAL-contains-references-to-invalid-pages-td5750072.html.

However, there is nothing in our log about sub-transactions, so it didn't seem the same to us.

 

Any advice on how to further debug this so we can avoid this in the future is appreciated.

 

Environment:

 

AWS, High I/O instance (hi1.4xlarge), 60GB RAM

 

Software and settings:

 

PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit

 

archive_command          rsync -a %p slave:/var/lib/postgresql/replication_load/%f

archive_mode   on

autovacuum_freeze_max_age 1000000000

autovacuum_max_workers        6

checkpoint_completion_target 0.9

checkpoint_segments   128

checkpoint_timeout       30min

default_text_search_config       pg_catalog.english

hot_standby      on

lc_messages      en_US.UTF-8

lc_monetary      en_US.UTF-8

lc_numeric          en_US.UTF-8

lc_time en_US.UTF-8

listen_addresses              *

log_checkpoints               on

log_destination                stderr

log_line_prefix %t [%p]: [%l-1] [%h]

log_min_duration_statement    -1

log_min_error_statement           error

log_min_messages         error

log_timezone    UTC

maintenance_work_mem           1GB

max_connections            1200

max_standby_streaming_delay                90s

max_wal_senders           5

port       5432

random_page_cost        2

seq_page_cost 1

shared_buffers                4GB

ssl           off

ssl_cert_file       /etc/ssl/certs/ssl-cert-snakeoil.pem

ssl_key_file        /etc/ssl/private/ssl-cert-snakeoil.key

synchronous_commit    off

TimeZone            UTC

wal_keep_segments     128

wal_level             hot_standby

work_mem        8MB

 

root@ip-10-148-131-236:~# /usr/local/pgsql/bin/pg_controldata /usr/local/pgsql/data

pg_control version number:            922

Catalog version number:               201204301

Database system identifier:           5838668587531239413

Database cluster state:               in archive recovery

pg_control last modified:             Sat 22 Jun 2013 06:13:07 PM UTC

Latest checkpoint location:           2250/18CA0790

Prior checkpoint location:            2250/18CA0790

Latest checkpoint's REDO location:    224F/E127B078

Latest checkpoint's TimeLineID:       2

Latest checkpoint's full_page_writes: on

Latest checkpoint's NextXID:          1/2018629527

Latest checkpoint's NextOID:          43086248

Latest checkpoint's NextMultiXactId:  7088726

Latest checkpoint's NextMultiOffset:  20617234

Latest checkpoint's oldestXID:        1690316999

Latest checkpoint's oldestXID's DB:   16448

Latest checkpoint's oldestActiveXID:  2018629527

Time of latest checkpoint:            Sat 22 Jun 2013 03:24:05 PM UTC

Minimum recovery ending location:     2251/5EA631F0

Backup start location:                0/0

Backup end location:                  0/0

End-of-backup record required:        no

Current wal_level setting:            hot_standby

Current max_connections setting:      1200

Current max_prepared_xacts setting:   0

Current max_locks_per_xact setting:   64

Maximum data alignment:               8

Database block size:                  8192

Blocks per segment of large relation: 131072

WAL block size:                       8192

Bytes per WAL segment:                16777216

Maximum length of identifiers:        64

Maximum columns in an index:          32

Maximum size of a TOAST chunk:        1996

Date/time type storage:               64-bit integers

Float4 argument passing:              by value

Float8 argument passing:              by value

root@ip-10-148-131-236:~#

 

Thanks again.

 

Dan

pgsql-general by date:

Previous
From: Michael Angeletti
Date:
Subject: WAL archiving not starting at the beginning
Next
From: Lonni J Friedman
Date:
Subject: Re: Standby stopped working after PANIC: WAL contains references to invalid pages