Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs

From Sergey Konoplev
Subject Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date
Msg-id CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com
Whole thread Raw
Responses Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages  (Sergey Konoplev <gray.ru@gmail.com>)
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
Hi,

Linux 2.6.32
PostgreSQL 9.2.6

It was suffering from this problem on 9.2.4, mostly last couple of
weeks when I had to rebuild the replica almost every 3 days, and I
hoped it would be fixed in 9.2.6, but it is not.

Yesterday I upgraded both master and replica servers to 9.2.6,
performed SET vacuum_freeze_table_age =3D 0; VACUUM; and then rebuild
the replica from scratch.

Below are replica logs, pg_controldata output, configuration and the
backtrace of the core dump. I can keep the replica untouched (without
rebuilding it) for several more hours, so please let me know if you
need me to show you something else.

Replica logs:

2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] WARNING:
page 14833 of relation base/16436/3321003988 is uninitialized
2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19935 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 19938) was terminated by signal 6: Aborted
2013-12-19 20:51:22 MSK 19935 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

pg_controldata output:

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5858109675396804534
Database cluster state:               in archive recovery
pg_control last modified:             Thu 19 Dec 2013 08:37:31 PM MSK
Latest checkpoint location:           3271/79F16848
Prior checkpoint location:            3270/77E32598
Latest checkpoint's REDO location:    3270/929671E0
Latest checkpoint's TimeLineID:       2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          4/1572485244
Latest checkpoint's NextOID:          3349674903
Latest checkpoint's NextMultiXactId:  6730476
Latest checkpoint's NextMultiOffset:  30866630
Latest checkpoint's oldestXID:        1340099111
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  1572038146
Time of latest checkpoint:            Thu 19 Dec 2013 07:00:09 PM MSK
Minimum recovery ending location:     3272/666EE368
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

Non default configuration:

 application_name =3D psql
 archive_command =3D (disabled)
 autovacuum_analyze_scale_factor =3D 0.05
 autovacuum_max_workers =3D 5
 autovacuum_naptime =3D 5
 autovacuum_vacuum_cost_delay =3D 5
 autovacuum_vacuum_scale_factor =3D 0.05
 bgwriter_delay =3D 10
 bgwriter_lru_multiplier =3D 10
 checkpoint_completion_target =3D 0.9
 checkpoint_segments =3D 1024
 checkpoint_timeout =3D 3600
 checkpoint_warning =3D 300
 client_encoding =3D UTF8
 DateStyle =3D ISO, DMY
 default_text_search_config =3D pg_catalog.russian
 effective_cache_size =3D 17432576
 effective_io_concurrency =3D 32
 hot_standby =3D on
 hot_standby_feedback =3D on
 lc_collate =3D ru_RU.UTF-8
 lc_ctype =3D ru_RU.UTF-8
 lc_messages =3D en_US.UTF-8
 lc_monetary =3D ru_RU.UTF-8
 lc_numeric =3D ru_RU.UTF-8
 lc_time =3D ru_RU.UTF-8
 listen_addresses =3D *
 log_checkpoints =3D on
 log_directory =3D /var/log/pgsql
 log_file_mode =3D 0600
 log_filename =3D postgresql-%Y-%m-%d.log
 log_line_prefix =3D %t %p %u@%d from %h [vxid:%v txid:%x] [%i]
 log_lock_waits =3D on
 log_rotation_size =3D 0
 log_statement =3D ddl
 log_timezone =3D W-SU
 log_truncate_on_rotation =3D on
 logging_collector =3D on
 maintenance_work_mem =3D 1048576
 max_connections =3D 550
 max_stack_depth =3D 2048
 max_standby_streaming_delay =3D 300000
 max_wal_senders =3D 3
 pg_stat_statements.track =3D all
 port =3D 6432
 server_encoding =3D UTF8
 shared_buffers =3D 4587520
 shared_preload_libraries =3D pg_stat_statements
 synchronous_commit =3D off
 TimeZone =3D W-SU
 track_activity_query_size =3D 4096
 transaction_isolation =3D read committed
 unix_socket_permissions =3D 0777
 wal_buffers =3D 2048
 wal_keep_segments =3D 2048
 wal_level =3D hot_standby
 work_mem =3D 262144

Backtrace of the coredump:

Core was generated by `postgres: startup process   recovering
000000020000327200000066 '.
Program terminated with signal 6, Aborted.
#0  0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
postgresql92-server-9.2.6-1PGDG.el6.x86_64
(gdb) set print pretty on
(gdb) bt f
#0  0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f6a5a883085 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000000071c580 in errfinish (dummy=3D<value optimized out>) at elog=
.c:534
        edata =3D 0xb88fc0
        elevel =3D 22
        oldcontext =3D 0x134a740
        econtext =3D 0x0
        __func__ =3D "errfinish"
#3  0x000000000071d09c in elog_finish (elevel=3D<value optimized out>,
fmt=3D<value optimized out>) at elog.c:1213
        edata =3D 0xb88fc0
        oldcontext =3D 0x134a740
        __func__ =3D "elog_finish"
#4  0x00000000004a6d6d in log_invalid_page (node=3D...,
forkno=3DMAIN_FORKNUM, blkno=3D14833, present=3D1 '\001') at xlogutils.c:91
        key =3D {
          node =3D {
            spcNode =3D 1663,
            dbNode =3D 16436,
            relNode =3D 3321003988
          },
          forkno =3D 25867822,
          blkno =3D 0
        }
        hentry =3D <value optimized out>
        found =3D <value optimized out>
        __func__ =3D "log_invalid_page"
#5  0x00000000004a6ea0 in XLogReadBufferExtended (rnode=3D...,
forknum=3DMAIN_FORKNUM, blkno=3D14833, mode=3DRBM_NORMAL) at xlogutils.c:35=
8
        page =3D <value optimized out>
        lastblock =3D 38539
        buffer =3D <value optimized out>
        smgr =3D <value optimized out>
#6  0x00000000004897a2 in btree_xlog_vacuum (lsn=3D<value optimized
out>, record=3D0x7f6a4fb20010) at nbtxlog.c:510
        blkno =3D 14833
        xlrec =3D 0x7f6a4fb20030
        buffer =3D <value optimized out>
        page =3D <value optimized out>
        opaque =3D <value optimized out>
#7  btree_redo (lsn=3D<value optimized out>, record=3D0x7f6a4fb20010) at
nbtxlog.c:1062
        info =3D <value optimized out>
       __func__ =3D "btree_redo"
#8  0x00000000004a3468 in StartupXLOG () at xlog.c:6931
        recoveryContinue =3D 1 '\001'
        recoveryApply =3D <value optimized out>
        errcontext =3D {
          previous =3D 0x0,
          callback =3D 0x49d5a0 <rm_redo_error_callback>,
          arg =3D 0x7f6a4fb20010
        }
        xtime =3D <value optimized out>
        rmid =3D <value optimized out>
        xlogctl =3D 0x7f6a6de08480
        Insert =3D <value optimized out>
        checkPoint =3D {
          redo =3D {
            xlogid =3D 12896,
            xrecoff =3D 4261478784
          },
          ThisTimeLineID =3D 2,
          fullPageWrites =3D 1 '\001',
          nextXidEpoch =3D 4,
          nextXid =3D 1452297001,
          nextOid =3D 3344849815,
          nextMulti =3D 6695025,
          nextMultiOffset =3D 30699263,
          oldestXid =3D 1256929431,
          oldestXidDB =3D 12865,
          time =3D 1387392623,
          oldestActiveXid =3D 1452282892
        }
        wasShutdown =3D <value optimized out>
        reachedStopPoint =3D 0 '\000'
        haveBackupLabel =3D <value optimized out>
        RecPtr =3D <value optimized out>
        checkPointLoc =3D {
          xlogid =3D 12896,
          xrecoff =3D 4271100680
        }
        EndOfLog =3D <value optimized out>
        endLogId =3D <value optimized out>
        endLogSeg =3D <value optimized out>
        record =3D 0x7f6a4fb20010
        freespace =3D <value optimized out>
        oldestActiveXID =3D <value optimized out>
        backupEndRequired =3D <value optimized out>
        backupFromStandby =3D <value optimized out>
        dbstate_at_startup =3D <value optimized out>
        __func__ =3D "StartupXLOG"
#9  0x00000000006153b8 in StartupProcessMain () at startup.c:220
        __func__ =3D "StartupProcessMain"
#10 0x00000000004c3837 in AuxiliaryProcessMain (argc=3D2,
argv=3D0x7fffa02b0100) at bootstrap.c:418
        progname =3D 0x854a1a "postgres"
        flag =3D <value optimized out>
        userDoption =3D 0x0
        __func__ =3D "AuxiliaryProcessMain"
#11 0x00000000006105f3 in StartChildProcess (type=3DStartupProcess) at
postmaster.c:4522
        pid =3D <value optimized out>
        av =3D {0x854a1a "postgres", 0x7fffa02b0150 "-x2", 0x0,
          0x7fffa02f0814
"H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H=
\003C\b1\322\061\311H\213\063H=3D\377=C9=9A;v\033\017\037\204",
          0x7fffa02b0170 "\300\226\064\001", 0x0, 0x7fffa02b0150 "-x2",
          0x7fffa02f0814
"H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H=
\003C\b1\322\061\311H\213\063H=3D\377=C9=9A;v\033\017\037\204",
0xb41080 ">-\262R",
          0x0}
        ac =3D 2
        typebuf =3D
"-x2\000\377\177\000\000\274\b/\240\377\177\000\000\005\000\000\000\000\000=
\000\000\005\000\000\000\000\000\000"
        __func__ =3D "StartChildProcess"
#12 0x0000000000614c3c in PostmasterMain (argc=3D<value optimized out>,
argv=3D<value optimized out>) at postmaster.c:1123
        opt =3D <value optimized out>
        status =3D <value optimized out>
        userDoption =3D <value optimized out>
        listen_addr_saved =3D 1 '\001'
        i =3D <value optimized out>
        __func__ =3D "PostmasterMain"
#13 0x00000000005b37e0 in main (argc=3D5, argv=3D0x13496a0) at main.c:199
No locals.


--=20
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Update with subselect sometimes returns wrong result
Next
From: Andres Freund
Date:
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages