Thread: Completely broken replica after PANIC: WAL contains references to invalid pages

Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

29 March 2013, 08:30:37

Hi all,

A couple of days ago I found the replica stopped after the PANIC message:

PANIC:  WAL contains references to invalid pages

When I tried to restart it I got this FATAL:

FATAL:  could not access status of transaction 280557568

Below is the description of the server and information from PostgreSQL
and system logs. After googling the problem I have found nothing like
this.

Any thoughts of what it could be and how to prevent it in the future?

Hardware:

IBM System x3650 M4, 148GB RAM, NAS

Software:

PostgreSQL 9.2.3, yum.postgresql.org
CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

Configuration:

listen_addresses = '*'
max_connections = 550
shared_buffers = 35GB
work_mem = 256MB
maintenance_work_mem = 1GB
bgwriter_delay = 10ms
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 32
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 1024
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 5min
max_wal_senders = 3
wal_keep_segments = 2048
hot_standby = on
max_standby_streaming_delay = 5min
hot_standby_feedback = on
effective_cache_size = 133GB
log_directory = '/var/log/pgsql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_checkpoints = on
log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'W-SU'
track_activity_query_size = 4096
autovacuum_max_workers = 5
autovacuum_naptime = 5s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 5ms
datestyle = 'iso, dmy'
timezone = 'W-SU'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'

System:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 53287555072

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 13009657

# Maximum number of file-handles
fs.file-max = 65535

# pdflush tuning to prevent lag spikes
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 499

# Prevent the scheduler breakdown
kernel.sched_migration_cost = 5000000

# Turned off to provide more CPU to PostgreSQL
kernel.sched_autogroup_enabled = 0

# Setup hugepages
vm.hugetlb_shm_group = 26
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 512

# The Huge Page Size is 2048kB, so for 35GB shared buffers the number is 17920
vm.nr_hugepages = 17920

# Turn off the NUMA local pages reclaim as it leads to wrong caching
strategy for databases
vm.zone_reclaim_mode = 0

Environment:

HUGETLB_SHM=yes
LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
export HUGETLB_SHM LD_PRELOAD

When it is stopped:

2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
average=0.000 s
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
restart point at 2538/6E154AC0
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
completed transaction was at log time 2013-03-26 11:50:31.613948+04
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint starting: xlog
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
page 451 of relation base/16436/2686702648 is uninitialized
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 3773) was terminated by signal 6: Aborted
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

From /var/log/messages:

Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
order:8, mode:0xd0
Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
2.6.32-279.22.1.el6.x86_64 #1
Mar 26 10:50:52 tms2 kernel: : Call Trace:
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8112343f>] ?
__alloc_pages_nodemask+0x77f/0x940
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115d3e2>] ? kmem_getpages+0x62/0x170
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dffa>] ? fallback_alloc+0x1ba/0x270
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115da4f>] ? cache_grow+0x2cf/0x320
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dd79>] ?
____cache_alloc_node+0x99/0x160
Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115eb49>] ? __kmalloc+0x189/0x220
Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8141a47c>] ? lock_sock_nested+0xac/0xc0
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8146edaa>] ? tcp_recvmsg+0x4ca/0xe80
Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029a67e>] ?
xfs_vm_write_end+0x2e/0x60 [xfs]
Mar 26 10:50:52 tms2 kernel: : [<ffffffffa0292f39>] ?
xfs_trans_unlocked_item+0x39/0x60 [xfs]
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8148f90a>] ? inet_recvmsg+0x5a/0x90
Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418b93>] ? sock_recvmsg+0x133/0x160
Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029f542>] ?
xfs_rw_iunlock+0x32/0x40 [xfs]
Mar 26 10:50:52 tms2 kernel: : [<ffffffff81090be0>] ?
autoremove_wake_function+0x0/0x40
Mar 26 10:50:52 tms2 kernel: : [<ffffffff81173fff>] ? __dentry_open+0x23f/0x360
Mar 26 10:50:52 tms2 kernel: : [<ffffffff811764ca>] ? do_sync_write+0xfa/0x140
Mar 26 10:50:52 tms2 kernel: : [<ffffffff811875d0>] ? do_filp_open+0x780/0xd60
Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418d0e>] ? sys_recvfrom+0xee/0x180
Mar 26 10:50:52 tms2 kernel: : [<ffffffff81183105>] ? putname+0x35/0x50
Mar 26 10:50:52 tms2 kernel: : [<ffffffff81176842>] ? vfs_write+0x132/0x1a0
Mar 26 10:50:52 tms2 kernel: : [<ffffffff810d3f47>] ?
audit_syscall_entry+0x1d7/0x200
Mar 26 10:50:52 tms2 kernel: : [<ffffffff8100b072>] ?
system_call_fastpath+0x16/0x1b

After restart:

2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  database
system was interrupted while in recovery at log time 2013-03-26
10:08:37 MSK
2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] HINT:  If this
has occurred more than once some data might be corrupted and you might
need to choose an earlier recovery target.
2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  entering
standby mode
2013-03-27 10:43:01 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  redo
starts at 2538/6E154AC0
2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/10B7" doesn't exist, reading as zeroes
2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 280482186: subxacts: 280483328 280483333
280483343 280483354 280483356 280483368 280483377 280483382 280483392
280483404 280483416 280483429 280483440 280483451 280483460 280483472
280483487 280483500 280483516 280483530 280483541 280483555 280483565
280483574 280483585 280483595 280483604 280483607 280483617 280483626
280483636 280483646 280483656 280483665 280483677 280483688 280483699
280483709 280483719 280483730 280483739 280483749 280483759 280483761
280483771 280483782 280483799 280483800 280483811 280483821 280483824
280483836 280483847 280483859 280483871 280483874 280483883 280483897
280483906 280483915 280483925 280483937 280483948 280483958
2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/10B7" doesn't exist, reading as zeroes
2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 280482270: subxacts: 280485056 280485070
280485083 280485086 280485098 280485113 280485132 280485144 280485156
280485167 280485178 280485188 280485201 280485217 280485234 280485249
280485267 280485293 280485309 280485327 280485333 280485345 280485353
280485373 280485388 280485405 280485420 280485434 280485457 280485476
280485482 280485507 280485516 280485531 280485537 280485550 280485565
280485568 280485585 280485587 280485601 280485613 280485634 280485639
280485656 280485669 280485684 280485690 280485693 280485712 280485730
280485754 280485757 280485779 280485801 280485808 280485811 280485830
280485856 280485880 280485900 280485920 280485941 280485946

[ skipped several more messages of this kind]

2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/10B8" doesn't exist, reading as zeroes
2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 280549936: subxacts: 280555514 280555515
280555516 280555517 280555518 280555519 280555520 280555521 280555522
280555523 280555524 280555525 280555526 280555527 280555528 280555529
280555530 280555531 280555532 280555533 280555534 280555535 280555536
280555537 280555538 280555539 280555540 280555541 280555542 280555543
280555544 280555545 280555546 280555547 280555548 280555549 280555550
280555551 280555552 280555553 280555554 280555555 280555556 280555557
280555558 280555559 280555560 280555561 280555562 280555563 280555564
280555565 280555566 280555567 280555568 280555569 280555570 280555571
280555572 280555573 280555574 280555575 280555576 280555577
2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] FATAL:
could not access status of transaction 280557568
2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] DETAIL:
Could not read from file "pg_subtrans/10B8" at offset 253952: Success.
2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 280555981: subxacts: 280557520 280557521
280557522 280557523 280557524 280557525 280557526 280557527 280557528
280557529 280557530 280557531 280557532 280557533 280557534 280557535
280557536 280557537 280557538 280557539 280557540 280557541 280557542
280557543 280557544 280557545 280557546 280557547 280557548 280557549
280557550 280557551 280557552 280557553 280557554 280557555 280557556
280557557 280557558 280557559 280557560 280557561 280557562 280557563
280557564 280557565 280557566 280557567 280557568 280557569 280557570
280557571 280557572 280557573 280557574 280557575 280557576 280557577
280557578 280557579 280557580 280557581 280557582 280557583
2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 1535) exited with exit code 1
2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

Thank you in advance.

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

"anarazel@anarazel.de"

Date:

29 March 2013, 23:53:10

Hi

Sergey Konoplev <gray.ru@gmail.com> schrieb:

>Hi all,
>
>A couple of days ago I found the replica stopped after the PANIC
>message:
>
>PANIC:  WAL contains references to invalid pages
>
>When I tried to restart it I got this FATAL:
>
>FATAL:  could not access status of transaction 280557568
>
>Below is the description of the server and information from PostgreSQL
>and system logs. After googling the problem I have found nothing like
>this.
>
>Any thoughts of what it could be and how to prevent it in the future?

I think I See whats going on. Do you still have the datadir available? If=
 so, could you send the pg_controldata output?


Andres

>Hardware:
>
>IBM System x3650 M4, 148GB RAM, NAS
>
>Software:
>
>PostgreSQL 9.2.3, yum.postgresql.org
>CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>
>Configuration:
>
>listen_addresses =3D '*'
>max_connections =3D 550
>shared_buffers =3D 35GB
>work_mem =3D 256MB
>maintenance_work_mem =3D 1GB
>bgwriter_delay =3D 10ms
>bgwriter_lru_multiplier =3D 10.0
>effective_io_concurrency =3D 32
>wal_level =3D hot_standby
>synchronous_commit =3D off
>checkpoint_segments =3D 1024
>checkpoint_timeout =3D 1h
>checkpoint_completion_target =3D 0.9
>checkpoint_warning =3D 5min
>max_wal_senders =3D 3
>wal_keep_segments =3D 2048
>hot_standby =3D on
>max_standby_streaming_delay =3D 5min
>hot_standby_feedback =3D on
>effective_cache_size =3D 133GB
>log_directory =3D '/var/log/pgsql'
>log_filename =3D 'postgresql-%Y-%m-%d.log'
>log_checkpoints =3D on
>log_line_prefix =3D '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
>log_lock_waits =3D on
>log_statement =3D 'ddl'
>log_timezone =3D 'W-SU'
>track_activity_query_size =3D 4096
>autovacuum_max_workers =3D 5
>autovacuum_naptime =3D 5s
>autovacuum_vacuum_scale_factor =3D 0.05
>autovacuum_analyze_scale_factor =3D 0.05
>autovacuum_vacuum_cost_delay =3D 5ms
>datestyle =3D 'iso, dmy'
>timezone =3D 'W-SU'
>lc_messages =3D 'en_US.UTF-8'
>lc_monetary =3D 'ru_RU.UTF-8'
>lc_numeric =3D 'ru_RU.UTF-8'
>lc_time =3D 'ru_RU.UTF-8'
>default_text_search_config =3D 'pg_catalog.russian'
>
>System:
>
># Controls the maximum shared segment size, in bytes
>kernel.shmmax =3D 53287555072
>
># Controls the maximum number of shared memory segments, in pages
>kernel.shmall =3D 13009657
>
># Maximum number of file-handles
>fs.file-max =3D 65535
>
># pdflush tuning to prevent lag spikes
>vm.dirty_ratio =3D 10
>vm.dirty_background_ratio =3D 1
>vm.dirty_expire_centisecs =3D 499
>
># Prevent the scheduler breakdown
>kernel.sched_migration_cost =3D 5000000
>
># Turned off to provide more CPU to PostgreSQL
>kernel.sched_autogroup_enabled =3D 0
>
># Setup hugepages
>vm.hugetlb_shm_group =3D 26
>vm.hugepages_treat_as_movable =3D 0
>vm.nr_overcommit_hugepages =3D 512
>
># The Huge Page Size is 2048kB, so for 35GB shared buffers the number
>is 17920
>vm.nr_hugepages =3D 17920
>
># Turn off the NUMA local pages reclaim as it leads to wrong caching
>strategy for databases
>vm.zone_reclaim_mode =3D 0
>
>Environment:
>
>HUGETLB_SHM=3Dyes
>LD_PRELOAD=3D'/usr/lib64/libhugetlbfs.so'
>export HUGETLB_SHM LD_PRELOAD
>
>When it is stopped:
>
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
>log file(s) added, 0 removed, 555 recycled; write=3D3237.402 s,
>sync=3D0.071 s, total=3D3237.507 s; sync files=3D2673, longest=3D0.008 s=
,
>average=3D0.000 s
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
>restart point at 2538/6E154AC0
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
>completed transaction was at log time 2013-03-26 11:50:31.613948+04
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>restartpoint starting: xlog
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
>page 451 of relation base/16436/2686702648 is uninitialized
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>lastBlockVacuumed 0
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
>contains references to invalid pages
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>lastBlockVacuumed 0
>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
>process (PID 3773) was terminated by signal 6: Aborted
>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
>terminating any other active server processes
>
>From /var/log/messages:
>
>Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
>order:8, mode:0xd0
>Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
>2.6.32-279.22.1.el6.x86_64 #1
>Mar 26 10:50:52 tms2 kernel: : Call Trace:
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8112343f>] ?
>__alloc_pages_nodemask+0x77f/0x940
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115d3e2>] ?
>kmem_getpages+0x62/0x170
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dffa>] ?
>fallback_alloc+0x1ba/0x270
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115da4f>] ?
>cache_grow+0x2cf/0x320
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dd79>] ?
>____cache_alloc_node+0x99/0x160
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
>dma_pin_iovec_pages+0xb5/0x230
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115eb49>] ?
>__kmalloc+0x189/0x220
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
>dma_pin_iovec_pages+0xb5/0x230
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8141a47c>] ?
>lock_sock_nested+0xac/0xc0
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8146edaa>] ?
>tcp_recvmsg+0x4ca/0xe80
>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029a67e>] ?
>xfs_vm_write_end+0x2e/0x60 [xfs]
>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa0292f39>] ?
>xfs_trans_unlocked_item+0x39/0x60 [xfs]
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8148f90a>] ?
>inet_recvmsg+0x5a/0x90
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418b93>] ?
>sock_recvmsg+0x133/0x160
>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029f542>] ?
>xfs_rw_iunlock+0x32/0x40 [xfs]
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81090be0>] ?
>autoremove_wake_function+0x0/0x40
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81173fff>] ?
>__dentry_open+0x23f/0x360
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff811764ca>] ?
>do_sync_write+0xfa/0x140
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff811875d0>] ?
>do_filp_open+0x780/0xd60
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418d0e>] ?
>sys_recvfrom+0xee/0x180
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81183105>] ? putname+0x35/0x50
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81176842>] ?
>vfs_write+0x132/0x1a0
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff810d3f47>] ?
>audit_syscall_entry+0x1d7/0x200
>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8100b072>] ?
>system_call_fastpath+0x16/0x1b
>
>After restart:
>
>2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  database
>system was interrupted while in recovery at log time 2013-03-26
>10:08:37 MSK
>2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] HINT:  If this
>has occurred more than once some data might be corrupted and you might
>need to choose an earlier recovery target.
>2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  entering
>standby mode
>2013-03-27 10:43:01 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  redo
>starts at 2538/6E154AC0
>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
>"pg_subtrans/10B7" doesn't exist, reading as zeroes
>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo xid assignment xtop 280482186: subxacts: 280483328 280483333
>280483343 280483354 280483356 280483368 280483377 280483382 280483392
>280483404 280483416 280483429 280483440 280483451 280483460 280483472
>280483487 280483500 280483516 280483530 280483541 280483555 280483565
>280483574 280483585 280483595 280483604 280483607 280483617 280483626
>280483636 280483646 280483656 280483665 280483677 280483688 280483699
>280483709 280483719 280483730 280483739 280483749 280483759 280483761
>280483771 280483782 280483799 280483800 280483811 280483821 280483824
>280483836 280483847 280483859 280483871 280483874 280483883 280483897
>280483906 280483915 280483925 280483937 280483948 280483958
>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
>"pg_subtrans/10B7" doesn't exist, reading as zeroes
>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo xid assignment xtop 280482270: subxacts: 280485056 280485070
>280485083 280485086 280485098 280485113 280485132 280485144 280485156
>280485167 280485178 280485188 280485201 280485217 280485234 280485249
>280485267 280485293 280485309 280485327 280485333 280485345 280485353
>280485373 280485388 280485405 280485420 280485434 280485457 280485476
>280485482 280485507 280485516 280485531 280485537 280485550 280485565
>280485568 280485585 280485587 280485601 280485613 280485634 280485639
>280485656 280485669 280485684 280485690 280485693 280485712 280485730
>280485754 280485757 280485779 280485801 280485808 280485811 280485830
>280485856 280485880 280485900 280485920 280485941 280485946
>
>[ skipped several more messages of this kind]
>
>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
>"pg_subtrans/10B8" doesn't exist, reading as zeroes
>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo xid assignment xtop 280549936: subxacts: 280555514 280555515
>280555516 280555517 280555518 280555519 280555520 280555521 280555522
>280555523 280555524 280555525 280555526 280555527 280555528 280555529
>280555530 280555531 280555532 280555533 280555534 280555535 280555536
>280555537 280555538 280555539 280555540 280555541 280555542 280555543
>280555544 280555545 280555546 280555547 280555548 280555549 280555550
>280555551 280555552 280555553 280555554 280555555 280555556 280555557
>280555558 280555559 280555560 280555561 280555562 280555563 280555564
>280555565 280555566 280555567 280555568 280555569 280555570 280555571
>280555572 280555573 280555574 280555575 280555576 280555577
>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] FATAL:
>could not access status of transaction 280557568
>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] DETAIL:
>Could not read from file "pg_subtrans/10B8" at offset 253952: Success.
>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo xid assignment xtop 280555981: subxacts: 280557520 280557521
>280557522 280557523 280557524 280557525 280557526 280557527 280557528
>280557529 280557530 280557531 280557532 280557533 280557534 280557535
>280557536 280557537 280557538 280557539 280557540 280557541 280557542
>280557543 280557544 280557545 280557546 280557547 280557548 280557549
>280557550 280557551 280557552 280557553 280557554 280557555 280557556
>280557557 280557558 280557559 280557560 280557561 280557562 280557563
>280557564 280557565 280557566 280557567 280557568 280557569 280557570
>280557571 280557572 280557573 280557574 280557575 280557576 280557577
>280557578 280557579 280557580 280557581 280557582 280557583
>2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:  startup
>process (PID 1535) exited with exit code 1
>2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:
>terminating any other active server processes
>
>Thank you in advance.
>
>--
>Kind regards,
>Sergey Konoplev
>Database and Software Consultant
>
>Profile: http://www.linkedin.com/in/grayhemp
>Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988)
>888-1979
>Skype: gray-hemp
>Jabber: gray.ru@gmail.com


---=20
Please excuse brevity and formatting - I am writing this on my mobile pho=
ne.
---=20
Please excuse brevity and formatting - I am writing this on my mobile pho=
ne.

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

30 March 2013, 00:20:30

On Fri, Mar 29, 2013 at 1:52 PM, anarazel@anarazel.de
<andres@anarazel.de> wrote:
> I think I See whats going on. Do you still have the datadir available? If=
 so, could you send the pg_controldata output?

I have already rebuilt the replica, however below is the output if it is us=
eful:

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5858109675396804534
Database cluster state:               in archive recovery
pg_control last modified:             =D0=A1=D0=B1=D1=82 30 =D0=9C=D0=B0=D1=
=80 2013 00:21:11
Latest checkpoint location:           258B/BDBBE748
Prior checkpoint location:            258B/86DABCB8
Latest checkpoint's REDO location:    258B/8B78BED0
Latest checkpoint's TimeLineID:       2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/423386899
Latest checkpoint's NextOID:          2758636912
Latest checkpoint's NextMultiXactId:  103920
Latest checkpoint's NextMultiOffset:  431309
Latest checkpoint's oldestXID:        225634745
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  421766298
Time of latest checkpoint:            =D0=9F=D1=82=D0=BD 29 =D0=9C=D0=B0=D1=
=80 2013 22:33:01
Minimum recovery ending location:     258C/14AA5FA0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

>
>
> Andres
>
>>Hardware:
>>
>>IBM System x3650 M4, 148GB RAM, NAS
>>
>>Software:
>>
>>PostgreSQL 9.2.3, yum.postgresql.org
>>CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>>
>>Configuration:
>>
>>listen_addresses =3D '*'
>>max_connections =3D 550
>>shared_buffers =3D 35GB
>>work_mem =3D 256MB
>>maintenance_work_mem =3D 1GB
>>bgwriter_delay =3D 10ms
>>bgwriter_lru_multiplier =3D 10.0
>>effective_io_concurrency =3D 32
>>wal_level =3D hot_standby
>>synchronous_commit =3D off
>>checkpoint_segments =3D 1024
>>checkpoint_timeout =3D 1h
>>checkpoint_completion_target =3D 0.9
>>checkpoint_warning =3D 5min
>>max_wal_senders =3D 3
>>wal_keep_segments =3D 2048
>>hot_standby =3D on
>>max_standby_streaming_delay =3D 5min
>>hot_standby_feedback =3D on
>>effective_cache_size =3D 133GB
>>log_directory =3D '/var/log/pgsql'
>>log_filename =3D 'postgresql-%Y-%m-%d.log'
>>log_checkpoints =3D on
>>log_line_prefix =3D '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
>>log_lock_waits =3D on
>>log_statement =3D 'ddl'
>>log_timezone =3D 'W-SU'
>>track_activity_query_size =3D 4096
>>autovacuum_max_workers =3D 5
>>autovacuum_naptime =3D 5s
>>autovacuum_vacuum_scale_factor =3D 0.05
>>autovacuum_analyze_scale_factor =3D 0.05
>>autovacuum_vacuum_cost_delay =3D 5ms
>>datestyle =3D 'iso, dmy'
>>timezone =3D 'W-SU'
>>lc_messages =3D 'en_US.UTF-8'
>>lc_monetary =3D 'ru_RU.UTF-8'
>>lc_numeric =3D 'ru_RU.UTF-8'
>>lc_time =3D 'ru_RU.UTF-8'
>>default_text_search_config =3D 'pg_catalog.russian'
>>
>>System:
>>
>># Controls the maximum shared segment size, in bytes
>>kernel.shmmax =3D 53287555072
>>
>># Controls the maximum number of shared memory segments, in pages
>>kernel.shmall =3D 13009657
>>
>># Maximum number of file-handles
>>fs.file-max =3D 65535
>>
>># pdflush tuning to prevent lag spikes
>>vm.dirty_ratio =3D 10
>>vm.dirty_background_ratio =3D 1
>>vm.dirty_expire_centisecs =3D 499
>>
>># Prevent the scheduler breakdown
>>kernel.sched_migration_cost =3D 5000000
>>
>># Turned off to provide more CPU to PostgreSQL
>>kernel.sched_autogroup_enabled =3D 0
>>
>># Setup hugepages
>>vm.hugetlb_shm_group =3D 26
>>vm.hugepages_treat_as_movable =3D 0
>>vm.nr_overcommit_hugepages =3D 512
>>
>># The Huge Page Size is 2048kB, so for 35GB shared buffers the number
>>is 17920
>>vm.nr_hugepages =3D 17920
>>
>># Turn off the NUMA local pages reclaim as it leads to wrong caching
>>strategy for databases
>>vm.zone_reclaim_mode =3D 0
>>
>>Environment:
>>
>>HUGETLB_SHM=3Dyes
>>LD_PRELOAD=3D'/usr/lib64/libhugetlbfs.so'
>>export HUGETLB_SHM LD_PRELOAD
>>
>>When it is stopped:
>>
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>>restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
>>log file(s) added, 0 removed, 555 recycled; write=3D3237.402 s,
>>sync=3D0.071 s, total=3D3237.507 s; sync files=3D2673, longest=3D0.008 s,
>>average=3D0.000 s
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
>>restart point at 2538/6E154AC0
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
>>completed transaction was at log time 2013-03-26 11:50:31.613948+04
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>>restartpoint starting: xlog
>>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
>>page 451 of relation base/16436/2686702648 is uninitialized
>>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>>lastBlockVacuumed 0
>>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
>>contains references to invalid pages
>>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>>lastBlockVacuumed 0
>>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
>>process (PID 3773) was terminated by signal 6: Aborted
>>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
>>terminating any other active server processes
>>
> >From /var/log/messages:
>>
>>Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
>>order:8, mode:0xd0
>>Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
>>2.6.32-279.22.1.el6.x86_64 #1
>>Mar 26 10:50:52 tms2 kernel: : Call Trace:
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8112343f>] ?
>>__alloc_pages_nodemask+0x77f/0x940
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115d3e2>] ?
>>kmem_getpages+0x62/0x170
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dffa>] ?
>>fallback_alloc+0x1ba/0x270
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115da4f>] ?
>>cache_grow+0x2cf/0x320
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dd79>] ?
>>____cache_alloc_node+0x99/0x160
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
>>dma_pin_iovec_pages+0xb5/0x230
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115eb49>] ?
>>__kmalloc+0x189/0x220
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
>>dma_pin_iovec_pages+0xb5/0x230
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8141a47c>] ?
>>lock_sock_nested+0xac/0xc0
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8146edaa>] ?
>>tcp_recvmsg+0x4ca/0xe80
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029a67e>] ?
>>xfs_vm_write_end+0x2e/0x60 [xfs]
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa0292f39>] ?
>>xfs_trans_unlocked_item+0x39/0x60 [xfs]
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8148f90a>] ?
>>inet_recvmsg+0x5a/0x90
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418b93>] ?
>>sock_recvmsg+0x133/0x160
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029f542>] ?
>>xfs_rw_iunlock+0x32/0x40 [xfs]
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81090be0>] ?
>>autoremove_wake_function+0x0/0x40
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81173fff>] ?
>>__dentry_open+0x23f/0x360
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff811764ca>] ?
>>do_sync_write+0xfa/0x140
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff811875d0>] ?
>>do_filp_open+0x780/0xd60
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418d0e>] ?
>>sys_recvfrom+0xee/0x180
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81183105>] ? putname+0x35/0x50
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81176842>] ?
>>vfs_write+0x132/0x1a0
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff810d3f47>] ?
>>audit_syscall_entry+0x1d7/0x200
>>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8100b072>] ?
>>system_call_fastpath+0x16/0x1b
>>
>>After restart:
>>
>>2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  database
>>system was interrupted while in recovery at log time 2013-03-26
>>10:08:37 MSK
>>2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] HINT:  If this
>>has occurred more than once some data might be corrupted and you might
>>need to choose an earlier recovery target.
>>2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  entering
>>standby mode
>>2013-03-27 10:43:01 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  redo
>>starts at 2538/6E154AC0
>>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
>>"pg_subtrans/10B7" doesn't exist, reading as zeroes
>>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>>xlog redo xid assignment xtop 280482186: subxacts: 280483328 280483333
>>280483343 280483354 280483356 280483368 280483377 280483382 280483392
>>280483404 280483416 280483429 280483440 280483451 280483460 280483472
>>280483487 280483500 280483516 280483530 280483541 280483555 280483565
>>280483574 280483585 280483595 280483604 280483607 280483617 280483626
>>280483636 280483646 280483656 280483665 280483677 280483688 280483699
>>280483709 280483719 280483730 280483739 280483749 280483759 280483761
>>280483771 280483782 280483799 280483800 280483811 280483821 280483824
>>280483836 280483847 280483859 280483871 280483874 280483883 280483897
>>280483906 280483915 280483925 280483937 280483948 280483958
>>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
>>"pg_subtrans/10B7" doesn't exist, reading as zeroes
>>2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>>xlog redo xid assignment xtop 280482270: subxacts: 280485056 280485070
>>280485083 280485086 280485098 280485113 280485132 280485144 280485156
>>280485167 280485178 280485188 280485201 280485217 280485234 280485249
>>280485267 280485293 280485309 280485327 280485333 280485345 280485353
>>280485373 280485388 280485405 280485420 280485434 280485457 280485476
>>280485482 280485507 280485516 280485531 280485537 280485550 280485565
>>280485568 280485585 280485587 280485601 280485613 280485634 280485639
>>280485656 280485669 280485684 280485690 280485693 280485712 280485730
>>280485754 280485757 280485779 280485801 280485808 280485811 280485830
>>280485856 280485880 280485900 280485920 280485941 280485946
>>
>>[ skipped several more messages of this kind]
>>
>>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
>>"pg_subtrans/10B8" doesn't exist, reading as zeroes
>>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>>xlog redo xid assignment xtop 280549936: subxacts: 280555514 280555515
>>280555516 280555517 280555518 280555519 280555520 280555521 280555522
>>280555523 280555524 280555525 280555526 280555527 280555528 280555529
>>280555530 280555531 280555532 280555533 280555534 280555535 280555536
>>280555537 280555538 280555539 280555540 280555541 280555542 280555543
>>280555544 280555545 280555546 280555547 280555548 280555549 280555550
>>280555551 280555552 280555553 280555554 280555555 280555556 280555557
>>280555558 280555559 280555560 280555561 280555562 280555563 280555564
>>280555565 280555566 280555567 280555568 280555569 280555570 280555571
>>280555572 280555573 280555574 280555575 280555576 280555577
>>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] FATAL:
>>could not access status of transaction 280557568
>>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] DETAIL:
>>Could not read from file "pg_subtrans/10B8" at offset 253952: Success.
>>2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>>xlog redo xid assignment xtop 280555981: subxacts: 280557520 280557521
>>280557522 280557523 280557524 280557525 280557526 280557527 280557528
>>280557529 280557530 280557531 280557532 280557533 280557534 280557535
>>280557536 280557537 280557538 280557539 280557540 280557541 280557542
>>280557543 280557544 280557545 280557546 280557547 280557548 280557549
>>280557550 280557551 280557552 280557553 280557554 280557555 280557556
>>280557557 280557558 280557559 280557560 280557561 280557562 280557563
>>280557564 280557565 280557566 280557567 280557568 280557569 280557570
>>280557571 280557572 280557573 280557574 280557575 280557576 280557577
>>280557578 280557579 280557580 280557581 280557582 280557583
>>2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:  startup
>>process (PID 1535) exited with exit code 1
>>2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:
>>terminating any other active server processes
>>
>>Thank you in advance.
>>
>>--
>>Kind regards,
>>Sergey Konoplev
>>Database and Software Consultant
>>
>>Profile: http://www.linkedin.com/in/grayhemp
>>Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988)
>>888-1979
>>Skype: gray-hemp
>>Jabber: gray.ru@gmail.com
>
>
> ---
> Please excuse brevity and formatting - I am writing this on my mobile pho=
ne.
> ---
> Please excuse brevity and formatting - I am writing this on my mobile pho=
ne.



--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

30 March 2013, 00:38:42

On 2013-03-29 14:19:59 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 1:52 PM, anarazel@anarazel.de
> <andres@anarazel.de> wrote:
> > I think I See whats going on. Do you still have the datadir available=
? If so, could you send the pg_controldata output?
>=20
> I have already rebuilt the replica, however below is the output if it i=
s useful:

I have to admit, I find it a bit confusing that so many people report a
bug and then immediately destroy all evidence of the bug. Just seems to
a happen a bit too frequently.

> pg_control version number:            922
> Catalog version number:               201204301
> Database system identifier:           5858109675396804534
> Database cluster state:               in archive recovery
> pg_control last modified:             =D0=A1=D0=B1=D1=82 30 =D0=9C=D0=B0=
=D1=80 2013 00:21:11
> Latest checkpoint location:           258B/BDBBE748
> Prior checkpoint location:            258B/86DABCB8
> Latest checkpoint's REDO location:    258B/8B78BED0
> Latest checkpoint's TimeLineID:       2
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID:          0/423386899
> Latest checkpoint's NextOID:          2758636912
> Latest checkpoint's NextMultiXactId:  103920
> Latest checkpoint's NextMultiOffset:  431309
> Latest checkpoint's oldestXID:        225634745
> Latest checkpoint's oldestXID's DB:   16436
> Latest checkpoint's oldestActiveXID:  421766298
> Time of latest checkpoint:            =D0=9F=D1=82=D0=BD 29 =D0=9C=D0=B0=
=D1=80 2013 22:33:01

Thats not a pg_controldata output from the broken replica though, or is
it? I guess its from a new standby?

Andres
Greetings,

Andres Freund

--=20
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

30 March 2013, 00:53:51

On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I have to admit, I find it a bit confusing that so many people report a
> bug and then immediately destroy all evidence of the bug. Just seems to
> a happen a bit too frequently.

You see, businesses usually need it up ASAP again. Sorry, I must have
note down the output of pg_controldata straight after it got broken, I
just have not came up to it.

> Thats not a pg_controldata output from the broken replica though, or is
> it? I guess its from a new standby?

That was the output from the standby that was rsync-ed on top of the
broken one. I thought you might find something useful in it.

Can I test your guess some other way? And what was the guess?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

30 March 2013, 20:22:15

On 2013-03-29 14:53:26 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I have to admit, I find it a bit confusing that so many people report a
> > bug and then immediately destroy all evidence of the bug. Just seems to
> > a happen a bit too frequently.
>
> You see, businesses usually need it up ASAP again. Sorry, I must have
> note down the output of pg_controldata straight after it got broken, I
> just have not came up to it.

But the business will also need the standby working correctly in case
of a critical incident of a primary. So it should have quite an interest
in fixing bugs in that area. Yes, I realize thats not always easy to do
:(.

> > Thats not a pg_controldata output from the broken replica though, or is
> > it? I guess its from a new standby?
>
> That was the output from the standby that was rsync-ed on top of the
> broken one. I thought you might find something useful in it.

> Can I test your guess some other way? And what was the guess?

Don't think you can easily test it. And after reading more code I am
pretty sure my original guess was bogus. As was my second. And third ;)

But I think I see what could be going on:

During HS we maintain pg_subtrans so we can deal with more than
PGPROC_MAX_CACHED_SUBXIDS in one TX. For that we need to regularly
extend subtrans so the pages are initialized when we setup the
topxid<->subxid mapping in ProcArrayApplyXidAssignment(). The call to
ExtendSUBTRANS happens in RecordKnownAssignedTransactionIds() which is
called from several places, including ProcArrayApplyXidAssignment().
The logic it uses is:
    if (TransactionIdFollows(xid, latestObservedXid))
    {
        TransactionId next_expected_xid;

        /*
         * Extend clog and subtrans like we do in GetNewTransactionId() during
         * normal operation using individual extend steps. Typical case
         * requires almost no activity.
         */
        next_expected_xid = latestObservedXid;
        TransactionIdAdvance(next_expected_xid);
        while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
        {
            ExtendCLOG(next_expected_xid);
            ExtendSUBTRANS(next_expected_xid);

            TransactionIdAdvance(next_expected_xid);
        }

So if the xid is later than latestObservedXid we extend subtrans one by
one. So far so good. But we initialize it in
ProcArrayApplyRecoveryInfo() when consistency is initially reached:
                             latestObservedXid = running->nextXid;
                             TransactionIdRetreat(latestObservedXid);
Before that subtrans has initially been started up with:
            if (wasShutdown)
                oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
            else
                oldestActiveXID = checkPoint.oldestActiveXid;
...
            StartupSUBTRANS(oldestActiveXID);

That means its only initialized up to checkPoint.oldestActiveXid. As it
can take some time till we reach consistency it seems rather plausible
that there now will be a gap in initilized pages. From
checkPoint.oldestActiveXid to running->nextXid if there are pages
inbetween.

Does that explanation sound about right to anybody else? I'll provide a
patch for the issue in a while, for now I'll try to reproduce it.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

ajmcello

Date:

31 March 2013, 06:42:05

unsubscribe


On Thu, Mar 28, 2013 at 10:30 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:

> Hi all,
>
> A couple of days ago I found the replica stopped after the PANIC message:
>
> PANIC:  WAL contains references to invalid pages
>
> When I tried to restart it I got this FATAL:
>
> FATAL:  could not access status of transaction 280557568
>
> Below is the description of the server and information from PostgreSQL
> and system logs. After googling the problem I have found nothing like
> this.
>
> Any thoughts of what it could be and how to prevent it in the future?
>
> Hardware:
>
> IBM System x3650 M4, 148GB RAM, NAS
>
> Software:
>
> PostgreSQL 9.2.3, yum.postgresql.org
> CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>
> Configuration:
>
> listen_addresses = '*'
> max_connections = 550
> shared_buffers = 35GB
> work_mem = 256MB
> maintenance_work_mem = 1GB
> bgwriter_delay = 10ms
> bgwriter_lru_multiplier = 10.0
> effective_io_concurrency = 32
> wal_level = hot_standby
> synchronous_commit = off
> checkpoint_segments = 1024
> checkpoint_timeout = 1h
> checkpoint_completion_target = 0.9
> checkpoint_warning = 5min
> max_wal_senders = 3
> wal_keep_segments = 2048
> hot_standby = on
> max_standby_streaming_delay = 5min
> hot_standby_feedback = on
> effective_cache_size = 133GB
> log_directory = '/var/log/pgsql'
> log_filename = 'postgresql-%Y-%m-%d.log'
> log_checkpoints = on
> log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
> log_lock_waits = on
> log_statement = 'ddl'
> log_timezone = 'W-SU'
> track_activity_query_size = 4096
> autovacuum_max_workers = 5
> autovacuum_naptime = 5s
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
> autovacuum_vacuum_cost_delay = 5ms
> datestyle = 'iso, dmy'
> timezone = 'W-SU'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'ru_RU.UTF-8'
> lc_numeric = 'ru_RU.UTF-8'
> lc_time = 'ru_RU.UTF-8'
> default_text_search_config = 'pg_catalog.russian'
>
> System:
>
> # Controls the maximum shared segment size, in bytes
> kernel.shmmax = 53287555072
>
> # Controls the maximum number of shared memory segments, in pages
> kernel.shmall = 13009657
>
> # Maximum number of file-handles
> fs.file-max = 65535
>
> # pdflush tuning to prevent lag spikes
> vm.dirty_ratio = 10
> vm.dirty_background_ratio = 1
> vm.dirty_expire_centisecs = 499
>
> # Prevent the scheduler breakdown
> kernel.sched_migration_cost = 5000000
>
> # Turned off to provide more CPU to PostgreSQL
> kernel.sched_autogroup_enabled = 0
>
> # Setup hugepages
> vm.hugetlb_shm_group = 26
> vm.hugepages_treat_as_movable = 0
> vm.nr_overcommit_hugepages = 512
>
> # The Huge Page Size is 2048kB, so for 35GB shared buffers the number is
> 17920
> vm.nr_hugepages = 17920
>
> # Turn off the NUMA local pages reclaim as it leads to wrong caching
> strategy for databases
> vm.zone_reclaim_mode = 0
>
> Environment:
>
> HUGETLB_SHM=yes
> LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
> export HUGETLB_SHM LD_PRELOAD
>
> When it is stopped:
>
> 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
> restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
> log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
> sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
> average=0.000 s
> 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
> restart point at 2538/6E154AC0
> 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
> completed transaction was at log time 2013-03-26 11:50:31.613948+04
> 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
> restartpoint starting: xlog
> 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
> page 451 of relation base/16436/2686702648 is uninitialized
> 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
> lastBlockVacuumed 0
> 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
> contains references to invalid pages
> 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
> lastBlockVacuumed 0
> 2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
> process (PID 3773) was terminated by signal 6: Aborted
> 2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
> terminating any other active server processes
>
> From /var/log/messages:
>
> Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
> order:8, mode:0xd0
> Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
> 2.6.32-279.22.1.el6.x86_64 #1
> Mar 26 10:50:52 tms2 kernel: : Call Trace:
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8112343f>] ?
> __alloc_pages_nodemask+0x77f/0x940
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115d3e2>] ?
> kmem_getpages+0x62/0x170
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dffa>] ?
> fallback_alloc+0x1ba/0x270
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115da4f>] ?
> cache_grow+0x2cf/0x320
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dd79>] ?
> ____cache_alloc_node+0x99/0x160
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
> dma_pin_iovec_pages+0xb5/0x230
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115eb49>] ? __kmalloc+0x189/0x220
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ?
> dma_pin_iovec_pages+0xb5/0x230
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8141a47c>] ?
> lock_sock_nested+0xac/0xc0
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8146edaa>] ?
> tcp_recvmsg+0x4ca/0xe80
> Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029a67e>] ?
> xfs_vm_write_end+0x2e/0x60 [xfs]
> Mar 26 10:50:52 tms2 kernel: : [<ffffffffa0292f39>] ?
> xfs_trans_unlocked_item+0x39/0x60 [xfs]
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8148f90a>] ?
> inet_recvmsg+0x5a/0x90
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418b93>] ?
> sock_recvmsg+0x133/0x160
> Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029f542>] ?
> xfs_rw_iunlock+0x32/0x40 [xfs]
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff81090be0>] ?
> autoremove_wake_function+0x0/0x40
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff81173fff>] ?
> __dentry_open+0x23f/0x360
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff811764ca>] ?
> do_sync_write+0xfa/0x140
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff811875d0>] ?
> do_filp_open+0x780/0xd60
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418d0e>] ?
> sys_recvfrom+0xee/0x180
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff81183105>] ? putname+0x35/0x50
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff81176842>] ? vfs_write+0x132/0x1a0
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff810d3f47>] ?
> audit_syscall_entry+0x1d7/0x200
> Mar 26 10:50:52 tms2 kernel: : [<ffffffff8100b072>] ?
> system_call_fastpath+0x16/0x1b
>
> After restart:
>
> 2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  database
> system was interrupted while in recovery at log time 2013-03-26
> 10:08:37 MSK
> 2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] HINT:  If this
> has occurred more than once some data might be corrupted and you might
> need to choose an earlier recovery target.
> 2013-03-27 10:43:01 MSK 1535 @ from  [vxid: txid:0] [] LOG:  entering
> standby mode
> 2013-03-27 10:43:01 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  redo
> starts at 2538/6E154AC0
> 2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
> "pg_subtrans/10B7" doesn't exist, reading as zeroes
> 2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo xid assignment xtop 280482186: subxacts: 280483328 280483333
> 280483343 280483354 280483356 280483368 280483377 280483382 280483392
> 280483404 280483416 280483429 280483440 280483451 280483460 280483472
> 280483487 280483500 280483516 280483530 280483541 280483555 280483565
> 280483574 280483585 280483595 280483604 280483607 280483617 280483626
> 280483636 280483646 280483656 280483665 280483677 280483688 280483699
> 280483709 280483719 280483730 280483739 280483749 280483759 280483761
> 280483771 280483782 280483799 280483800 280483811 280483821 280483824
> 280483836 280483847 280483859 280483871 280483874 280483883 280483897
> 280483906 280483915 280483925 280483937 280483948 280483958
> 2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
> "pg_subtrans/10B7" doesn't exist, reading as zeroes
> 2013-03-27 10:43:02 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo xid assignment xtop 280482270: subxacts: 280485056 280485070
> 280485083 280485086 280485098 280485113 280485132 280485144 280485156
> 280485167 280485178 280485188 280485201 280485217 280485234 280485249
> 280485267 280485293 280485309 280485327 280485333 280485345 280485353
> 280485373 280485388 280485405 280485420 280485434 280485457 280485476
> 280485482 280485507 280485516 280485531 280485537 280485550 280485565
> 280485568 280485585 280485587 280485601 280485613 280485634 280485639
> 280485656 280485669 280485684 280485690 280485693 280485712 280485730
> 280485754 280485757 280485779 280485801 280485808 280485811 280485830
> 280485856 280485880 280485900 280485920 280485941 280485946
>
> [ skipped several more messages of this kind]
>
> 2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] LOG:  file
> "pg_subtrans/10B8" doesn't exist, reading as zeroes
> 2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo xid assignment xtop 280549936: subxacts: 280555514 280555515
> 280555516 280555517 280555518 280555519 280555520 280555521 280555522
> 280555523 280555524 280555525 280555526 280555527 280555528 280555529
> 280555530 280555531 280555532 280555533 280555534 280555535 280555536
> 280555537 280555538 280555539 280555540 280555541 280555542 280555543
> 280555544 280555545 280555546 280555547 280555548 280555549 280555550
> 280555551 280555552 280555553 280555554 280555555 280555556 280555557
> 280555558 280555559 280555560 280555561 280555562 280555563 280555564
> 280555565 280555566 280555567 280555568 280555569 280555570 280555571
> 280555572 280555573 280555574 280555575 280555576 280555577
> 2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] FATAL:
> could not access status of transaction 280557568
> 2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] DETAIL:
> Could not read from file "pg_subtrans/10B8" at offset 253952: Success.
> 2013-03-27 10:43:03 MSK 1535 @ from  [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo xid assignment xtop 280555981: subxacts: 280557520 280557521
> 280557522 280557523 280557524 280557525 280557526 280557527 280557528
> 280557529 280557530 280557531 280557532 280557533 280557534 280557535
> 280557536 280557537 280557538 280557539 280557540 280557541 280557542
> 280557543 280557544 280557545 280557546 280557547 280557548 280557549
> 280557550 280557551 280557552 280557553 280557554 280557555 280557556
> 280557557 280557558 280557559 280557560 280557561 280557562 280557563
> 280557564 280557565 280557566 280557567 280557568 280557569 280557570
> 280557571 280557572 280557573 280557574 280557575 280557576 280557577
> 280557578 280557579 280557580 280557581 280557582 280557583
> 2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:  startup
> process (PID 1535) exited with exit code 1
> 2013-03-27 10:43:03 MSK 1532 @ from  [vxid: txid:0] [] LOG:
> terminating any other active server processes
>
> Thank you in advance.
>
> --
> Kind regards,
> Sergey Konoplev
> Database and Software Consultant
>
> Profile: http://www.linkedin.com/in/grayhemp
> Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
> Skype: gray-hemp
> Jabber: gray.ru@gmail.com
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Simon Riggs

Date:

01 April 2013, 10:49:47

On 30 March 2013 17:21, Andres Freund <andres@2ndquadrant.com> wrote:

> So if the xid is later than latestObservedXid we extend subtrans one by
> one. So far so good. But we initialize it in
> ProcArrayApplyRecoveryInfo() when consistency is initially reached:
>                              latestObservedXid = running->nextXid;
>                              TransactionIdRetreat(latestObservedXid);
> Before that subtrans has initially been started up with:
>                         if (wasShutdown)
>                                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
>                         else
>                                 oldestActiveXID = checkPoint.oldestActiveXid;
> ...
>                         StartupSUBTRANS(oldestActiveXID);
>
> That means its only initialized up to checkPoint.oldestActiveXid. As it
> can take some time till we reach consistency it seems rather plausible
> that there now will be a gap in initilized pages. From
> checkPoint.oldestActiveXid to running->nextXid if there are pages
> inbetween.

That was an old bug.

StartupSUBTRANS() now explicitly fills that gap. Are you saying it
does that incorrectly? How?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

02 April 2013, 13:11:12

On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> On 30 March 2013 17:21, Andres Freund <andres@2ndquadrant.com> wrote:
>
> > So if the xid is later than latestObservedXid we extend subtrans one by
> > one. So far so good. But we initialize it in
> > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> >                              latestObservedXid = running->nextXid;
> >                              TransactionIdRetreat(latestObservedXid);
> > Before that subtrans has initially been started up with:
> >                         if (wasShutdown)
> >                                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> >                         else
> >                                 oldestActiveXID = checkPoint.oldestActiveXid;
> > ...
> >                         StartupSUBTRANS(oldestActiveXID);
> >
> > That means its only initialized up to checkPoint.oldestActiveXid. As it
> > can take some time till we reach consistency it seems rather plausible
> > that there now will be a gap in initilized pages. From
> > checkPoint.oldestActiveXid to running->nextXid if there are pages
> > inbetween.
>
> That was an old bug.
>
> StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> does that incorrectly? How?

Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
between the call to Startup* and the first call to ExtendSUBTRANS. The
latter is only called *after* we reached STANDBY_INITIALIZED via
ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
running->nextXid - 1. There very well can be a gap inbetween.
The window isn't terribly big but if you use subtransactions as heavily
as Sergey seems to be it doesn't seem unlikely to hit it.

Let me come up with a testcase and patch.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

02 April 2013, 21:27:23

On 2013-04-02 12:10:12 +0200, Andres Freund wrote:
> On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> > On 30 March 2013 17:21, Andres Freund <andres@2ndquadrant.com> wrote:
> >
> > > So if the xid is later than latestObservedXid we extend subtrans one by
> > > one. So far so good. But we initialize it in
> > > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> > >                              latestObservedXid = running->nextXid;
> > >                              TransactionIdRetreat(latestObservedXid);
> > > Before that subtrans has initially been started up with:
> > >                         if (wasShutdown)
> > >                                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> > >                         else
> > >                                 oldestActiveXID = checkPoint.oldestActiveXid;
> > > ...
> > >                         StartupSUBTRANS(oldestActiveXID);
> > >
> > > That means its only initialized up to checkPoint.oldestActiveXid. As it
> > > can take some time till we reach consistency it seems rather plausible
> > > that there now will be a gap in initilized pages. From
> > > checkPoint.oldestActiveXid to running->nextXid if there are pages
> > > inbetween.
> >
> > That was an old bug.
> >
> > StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> > does that incorrectly? How?
>
> Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
> between the call to Startup* and the first call to ExtendSUBTRANS. The
> latter is only called *after* we reached STANDBY_INITIALIZED via
> ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
> checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
> running->nextXid - 1. There very well can be a gap inbetween.
> The window isn't terribly big but if you use subtransactions as heavily
> as Sergey seems to be it doesn't seem unlikely to hit it.
>
> Let me come up with a testcase and patch.

Developing a testcase was trivial, pgbench running the following function:
CREATE OR REPLACE FUNCTION recurse_and_assign_txid(level bigint DEFAULT 0)
RETURNS bigint
LANGUAGE plpgsql AS $b$
BEGIN
    IF level < 500 THEN
        RETURN recurse_and_assign_txid(level + 1);
    ELSE
        -- assign xid in subtxn and parents
        CREATE TEMPORARY TABLE foo();
        DROP TABLE foo;
        RETURN txid_current()::bigint;
    END IF;
EXCEPTION WHEN others THEN
    RAISE NOTICE 'unexpected';
END
$b$;

When now restarting a standby (so it restarts from another checkpoint) it
frequently crashed with various errors:
* pg_subtrans/xxx does not exist
* (warning) pg_subtrans page does not exist, assuming zero
* xid overwritten in SubTransSetParent

So I think my theory is correct.

The attached patch fixes this although I don't like the way it knowledge of the
point up to which StartupSUBTRANS zeroes pages is handled.

Makes sense?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-Ensure-that-SUBTRANS-is-initalized-gaplessly-when-st.patch

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

05 April 2013, 17:10:38

On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> The attached patch fixes this although I don't like the way it knowledge of the
> point up to which StartupSUBTRANS zeroes pages is handled.

Thank you for the patch, Andres.

Is it included in 9.2.4?

BTW, it has happened again and I am going to make a copy of the
cluster to be able to provide you some extra information. Do you still
need it?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

05 April 2013, 17:15:43

On 2013-04-05 07:10:12 -0700, Sergey Konoplev wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The attached patch fixes this although I don't like the way it knowledge of the
> > point up to which StartupSUBTRANS zeroes pages is handled.
>
> Thank you for the patch, Andres.
>
> Is it included in 9.2.4?

No. Too late for that. It hasn't bee committed yet.

> BTW, it has happened again and I am going to make a copy of the
> cluster to be able to provide you some extra information. Do you still
> need it?

Cool. It would be very helpful if you could apply the patch and verify
that it works, it has been written somewhat blindly. Also I am afraid
that at least last time there was a second bug involved.

Could you show the log?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

05 April 2013, 17:22:32

On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Cool. It would be very helpful if you could apply the patch and verify
> that it works, it has been written somewhat blindly. Also I am afraid
> that at least last time there was a second bug involved.

Okay, I will try to talk to admins but I am afraid it could take long.

> Could you show the log?

2013-04-05 17:26:31 MSK 2113 @ from  [vxid: txid:0] [] LOG:  database
system was shut down in recovery at 2013-04-05 17:18:02 MSK
2013-04-05 17:26:32 MSK 2113 @ from  [vxid: txid:0] [] LOG:  entering
standby mode
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  redo
starts at 25BD/907338F8
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/28E5" doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686136255: subxacts: 686137344 686137345
686137346 686137347 686137348 686137349 686137350 686137351 686137352
686137353 686137354 686137355 686137356 686137357 686137358 686137359
686137360 686137361 686137362 686137363 686137364 686137365 686137366
686137367 686137368 686137369 686137370 686137371 686137372 686137373
686137374 686137375 686137376 686137377 686137378 686137379 686137380
686137381 686137382 686137383 686137384 686137385 686137386 686137387
686137388 686137389 686137390 686137391 686137392 686137393 686137394
686137395 686137396 686137397 686137398 686137399 686137400 686137401
686137402 686137403 686137404 686137405 686137406 686137407
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/28E5" doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686136255: subxacts: 686139330 686139331
686139332 686139333 686139334 686139335 686139336 686139337 686139338
686139339 686139340 686139341 686139342 686139343 686139344 686139345
686139346 686139347 686139348 686139349 686139350 686139351 686139352
686139353 686139354 686139355 686139356 686139357 686139358 686139359
686139360 686139361 686139362 686139363 686139364 686139365 686139366
686139367 686139368 686139369 686139370 686139371 686139372 686139373
686139374 686139375 686139376 686139377 686139378 686139379 686139380
686139381 686139382 686139383 686139384 686139385 686139386 686139387
686139388 686139389 686139390 686139391 686139392 686139393

[some more like this]

2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/28E6" doesn't exist, reading as zeroes
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686216055: subxacts: 686222447 686222448
686222449 686222450 686222451 686222452 686222453 686222454 686222455
686222456 686222457 686222459 686222460 686222461 686222462 686222463
686222464 686222502 686222561 686222647 686222722 686223272 686223359
686223360 686223361 686223363 686223364 686223365 686223366 686223367
686223368 686223369 686223370 686223371 686223372 686223373 686223374
686223375 686223376 686223377 686223378 686223379 686223380 686223381
686223382 686223383 686223384 686223385 686223386 686223387 686223388
686223389 686223390 686223391 686223392 686223393 686223394 686223395
686223396 686223397 686223398 686223399 686223400 686223401
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] FATAL:
could not access status of transaction 686225586
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] DETAIL:
Could not read from file "pg_subtrans/28E6" at offset 253952: Success.
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686225585: subxacts: 686225586 686225587
686225588 686225589 686225590 686225591 686225592 686225593 686225594
686225595 686225596 686225597 686225598 686225599 686225600 686225601
686225602 686225603 686225604 686225605 686225606 686225607 686225608
686225609 686225610 686225611 686225612 686225613 686225614 686225615
686225616 686225617 686225621 686225622 686225625 686225626 686225628
686225632 686225633 686225636 686225637 686225638 686225639 686225640
686225641 686225644 686225645 686225646 686225649 686225650 686225657
686225658 686225661 686225662 686225665 686225666 686225670 686225671
686225672 686225673 686225678 686225679 686225684 686225685
2013-04-05 17:26:36 MSK 2110 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 2113) exited with exit code 1
2013-04-05 17:26:36 MSK 2110 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes


--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

05 April 2013, 17:33:11

On 2013-04-05 07:22:08 -0700, Sergey Konoplev wrote:
> On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Cool. It would be very helpful if you could apply the patch and verify
> > that it works, it has been written somewhat blindly. Also I am afraid
> > that at least last time there was a second bug involved.
>
> Okay, I will try to talk to admins but I am afraid it could take long.

Ok.

> > Could you show the log?
>
> 2013-04-05 17:26:31 MSK 2113 @ from  [vxid: txid:0] [] LOG:  database
> system was shut down in recovery at 2013-04-05 17:18:02 MSK
> 2013-04-05 17:26:32 MSK 2113 @ from  [vxid: txid:0] [] LOG:  entering
> standby mode
> 2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  redo
> starts at 25BD/907338F8
> 2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
> "pg_subtrans/28E5" doesn't exist, reading as zeroes

Looks like it could be fixed by the patch. But that seems to imply that
you restarted not long before that? Could you check if theres a
different error before those?

Greetings,

Andres Freund

PS: The tander.ru addresses seem to bounce all mail I send them...

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

05 April 2013, 17:45:13

On Fri, Apr 5, 2013 at 7:33 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Looks like it could be fixed by the patch. But that seems to imply that
> you restarted not long before that? Could you check if theres a
> different error before those?

Yes it had happened straight after restart this time. There are no any
errors in logs before it.


--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

10 May 2013, 05:29:16

On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> The attached patch fixes this although I don't like the way it knowledge of the
> point up to which StartupSUBTRANS zeroes pages is handled.

One month has passed since the patched version was installed in our
production environment and can confirm that everything works perfect.
Thank you very much for your prompt help, Andres.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

11 June 2013, 06:37:12

Hi,

On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> The attached patch fixes this although I don't like the way it knowledge of the
>> point up to which StartupSUBTRANS zeroes pages is handled.
>
> One month has passed since the patched version was installed in our
> production environment and can confirm that everything works perfect.
> Thank you very much for your prompt help, Andres.

Are there any plans to commit this patch and what version it is going
to be done to?

Thank you.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Simon Riggs

Date:

11 June 2013, 09:43:14

On 11 June 2013 04:36, Sergey Konoplev <gray.ru@gmail.com> wrote:
> Hi,
>
> On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
>> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> The attached patch fixes this although I don't like the way it knowledge of the
>>> point up to which StartupSUBTRANS zeroes pages is handled.
>>
>> One month has passed since the patched version was installed in our
>> production environment and can confirm that everything works perfect.
>> Thank you very much for your prompt help, Andres.
>
> Are there any plans to commit this patch and what version it is going
> to be done to?
>
> Thank you.

I'll be committing this soon, since we're likely coming up to the next
point release soon.

Thanks for the reminder.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

11 June 2013, 10:09:40

On Mon, Jun 10, 2013 at 11:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 11 June 2013 04:36, Sergey Konoplev <gray.ru@gmail.com> wrote:
>> Are there any plans to commit this patch and what version it is going
>> to be done to?
>
> I'll be committing this soon, since we're likely coming up to the next
> point release soon.

I see, thank you.

Just curious, what is the planned date for the next minor release, and
BTW where is it possible to see the roadmap for minor releases?

> Thanks for the reminder.

You are welcome.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Tom Lane

Date:

11 June 2013, 16:50:20

Sergey Konoplev <gray.ru@gmail.com> writes:
> Just curious, what is the planned date for the next minor release, and
> BTW where is it possible to see the roadmap for minor releases?

There is no planned date, and certainly no "roadmap".  We make minor
releases when the core team judges that enough (or severe enough)
fixes have accumulated since the last time.  Historically we've averaged
about four minor releases a year, but that's not set in stone anywhere.

            regards, tom lane

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

20 August 2013, 08:49:04

Hi all,

I recently noticed that I continue to receive "allocation failure"
messages from postmaster in /var/log/messages on the server where the
problem occurred.

Aug 17 23:00:51 tms2 kernel: : postmaster: page allocation failure.
order:5, mode:0xd0
Aug 17 23:00:51 tms2 kernel: : Pid: 21223, comm: postmaster Not
tainted 2.6.32-279.22.1.el6.x86_64 #1
Aug 17 23:00:51 tms2 kernel: : Call Trace:
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8112343f>] ?
__alloc_pages_nodemask+0x77f/0x940
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8115d3e2>] ? kmem_getpages+0x62/0x170
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8115dffa>] ? fallback_alloc+0x1ba/0x270
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8115da4f>] ? cache_grow+0x2cf/0x320
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8115dd79>] ?
____cache_alloc_node+0x99/0x160
Aug 17 23:00:51 tms2 kernel: : [<ffffffff813fd455>] ?
dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8115eb49>] ? __kmalloc+0x189/0x220
Aug 17 23:00:51 tms2 kernel: : [<ffffffff813fd455>] ?
dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8141a47c>] ? lock_sock_nested+0xac/0xc0
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8146edaa>] ? tcp_recvmsg+0x4ca/0xe80
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8148f90a>] ? inet_recvmsg+0x5a/0x90
Aug 17 23:00:51 tms2 kernel: : [<ffffffff81418b93>] ? sock_recvmsg+0x133/0x160
Aug 17 23:00:51 tms2 kernel: : [<ffffffff81090be0>] ?
autoremove_wake_function+0x0/0x40
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8117660a>] ? do_sync_read+0xfa/0x140
Aug 17 23:00:51 tms2 kernel: : [<ffffffff81418d0e>] ? sys_recvfrom+0xee/0x180
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8118b09d>] ?
poll_select_set_timeout+0x8d/0xa0
Aug 17 23:00:51 tms2 kernel: : [<ffffffff810d3f47>] ?
audit_syscall_entry+0x1d7/0x200
Aug 17 23:00:51 tms2 kernel: : [<ffffffff8100b072>] ?
system_call_fastpath+0x16/0x1b

Except these messages everything is just fine. The server is a passive
(very rarely queried) hot standby.

Are there any ideas of why and what could it be?

Thank you.

On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sergey Konoplev <gray.ru@gmail.com> writes:
>> Just curious, what is the planned date for the next minor release, and
>> BTW where is it possible to see the roadmap for minor releases?
>
> There is no planned date, and certainly no "roadmap".  We make minor
> releases when the core team judges that enough (or severe enough)
> fixes have accumulated since the last time.  Historically we've averaged
> about four minor releases a year, but that's not set in stone anywhere.
>
>                         regards, tom lane



--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

11 October 2013, 00:34:06

On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sergey Konoplev <gray.ru@gmail.com> writes:
>> Just curious, what is the planned date for the next minor release, and
>> BTW where is it possible to see the roadmap for minor releases?
>
> There is no planned date, and certainly no "roadmap".  We make minor
> releases when the core team judges that enough (or severe enough)
> fixes have accumulated since the last time.  Historically we've averaged
> about four minor releases a year, but that's not set in stone anywhere.

Could you please give me a hint of how to check if this patch was
included in 9.2.5 or not?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Alvaro Herrera

Date:

11 October 2013, 00:48:48

Sergey Konoplev escribió:
> On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Sergey Konoplev <gray.ru@gmail.com> writes:
> >> Just curious, what is the planned date for the next minor release, and
> >> BTW where is it possible to see the roadmap for minor releases?
> >
> > There is no planned date, and certainly no "roadmap".  We make minor
> > releases when the core team judges that enough (or severe enough)
> > fixes have accumulated since the last time.  Historically we've averaged
> > about four minor releases a year, but that's not set in stone anywhere.
>
> Could you please give me a hint of how to check if this patch was
> included in 9.2.5 or not?

Yes, this was committed in June:

commit 99ee15b315c187045a95db7b27fd9d866aea93e0
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Sun Jun 23 11:05:02 2013 +0100

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

11 October 2013, 01:02:13

On Thu, Oct 10, 2013 at 2:48 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> Could you please give me a hint of how to check if this patch was
>> included in 9.2.5 or not?
>
> Yes, this was committed in June:
>
> commit 99ee15b315c187045a95db7b27fd9d866aea93e0
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   Sun Jun 23 11:05:02 2013 +0100

Good news, thank you.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

01 November 2013, 10:35:44

On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wro=
te:
> The attached patch fixes this although I don't like the way it knowledge =
of the
> point up to which StartupSUBTRANS zeroes pages is handled.

So, after half a year the same failure has happened again on the same
replica, but now patched with the Andres' patch (9.2.4 + the patch)
that was supposed to fix it.

Here is the link to the full conversation.

http://www.postgresql.org/message-id/flat/CAL_0b1t=3DWuM6roO8dki=3Dw8DhH8P8=
whhohbPjReymmQUrOcNT2A@mail.gmail.com

Here is the logs.

2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] WARNING:
page 27415 of relation base/16436/3220672275 is uninitialized
2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-10-31 22:51:44 MSK 30708 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 30711) was terminated by signal 6: Aborted
2013-10-31 22:51:44 MSK 30708 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

I saved the base/16436/3220672275* files and pg_xlog directory, just in cas=
e.

On attempt to restart it printed the same in logs and didn't started.

2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] LOG:
consistent recovery state reached at 2F02/2774CA28
2013-11-01 08:15:25 MSK 764 @ from  [vxid: txid:0] [] LOG:  database
system is ready to accept read only connections
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] WARNING:
page 27415 of relation base/16436/3220672275 is uninitialized
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-11-01 08:15:25 MSK 764 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 767) was terminated by signal 6: Aborted
2013-11-01 08:15:25 MSK 764 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

Here is the pg_controldata ouptut.

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5858109675396804534
Database cluster state:               in archive recovery
pg_control last modified:             =D0=9F=D1=82=D0=BD 01 =D0=9D=D0=BE=D1=
=8F 2013 07:52:08
Latest checkpoint location:           2F00/C9BCE828
Prior checkpoint location:            2F00/C9BCE828
Latest checkpoint's REDO location:    2F00/32F59B70
Latest checkpoint's TimeLineID:       2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          3/805663702
Latest checkpoint's NextOID:          3227099776
Latest checkpoint's NextMultiXactId:  4809163
Latest checkpoint's NextMultiOffset:  21342992
Latest checkpoint's oldestXID:        605734616
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  805262681
Time of latest checkpoint:            =D0=A7=D1=82=D0=B2 31 =D0=9E=D0=BA=D1=
=82 2013 21:00:02
Minimum recovery ending location:     2F02/2774CA28
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

Any thoughts?

--=20
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

04 November 2013, 12:49:02

On 2013-11-01 00:35:18 -0700, Sergey Konoplev wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The attached patch fixes this although I don't like the way it knowledge of the
> > point up to which StartupSUBTRANS zeroes pages is handled.
>
> So, after half a year the same failure has happened again on the same
> replica, but now patched with the Andres' patch (9.2.4 + the patch)
> that was supposed to fix it.

I think that's actually a different bug - what the patch was fixing was
the clog/subtrans errors, not the "reference to invalid pages".

I've submitted a fix for at least one cause causing that independently
from this bug which is included in 9.2.5
(17fa4c321ccf9693de406faffe6b235e949aa25f). Note thought that that fix
needed a fix iff using a cassert enabled build
(4da24f12e63313b7dbb6b3e3d0317e04045df636) which is *not* in 9.2.5.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

05 November 2013, 23:03:45

On Mon, Nov 4, 2013 at 1:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-11-01 00:35:18 -0700, Sergey Konoplev wrote:
>> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > The attached patch fixes this although I don't like the way it knowledge of the
>> > point up to which StartupSUBTRANS zeroes pages is handled.
>>
>> So, after half a year the same failure has happened again on the same
>> replica, but now patched with the Andres' patch (9.2.4 + the patch)
>> that was supposed to fix it.
>
> I think that's actually a different bug - what the patch was fixing was
> the clog/subtrans errors, not the "reference to invalid pages".

Can I do something to help fixing this particular bug? I backed up the
relation and xlog files in case if someone need to dig it deeper.

> I've submitted a fix for at least one cause causing that independently
> from this bug which is included in 9.2.5
> (17fa4c321ccf9693de406faffe6b235e949aa25f). Note thought that that fix
> needed a fix iff using a cassert enabled build
> (4da24f12e63313b7dbb6b3e3d0317e04045df636) which is *not* in 9.2.5.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Andres Freund

Date:

05 November 2013, 23:05:16

On 2013-11-05 12:03:19 -0800, Sergey Konoplev wrote:
> On Mon, Nov 4, 2013 at 1:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-11-01 00:35:18 -0700, Sergey Konoplev wrote:
> >> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > The attached patch fixes this although I don't like the way it knowledge of the
> >> > point up to which StartupSUBTRANS zeroes pages is handled.
> >>
> >> So, after half a year the same failure has happened again on the same
> >> replica, but now patched with the Andres' patch (9.2.4 + the patch)
> >> that was supposed to fix it.
> >
> > I think that's actually a different bug - what the patch was fixing was
> > the clog/subtrans errors, not the "reference to invalid pages".
>
> Can I do something to help fixing this particular bug? I backed up the
> relation and xlog files in case if someone need to dig it deeper.
>
> > I've submitted a fix for at least one cause causing that independently
> > from this bug which is included in 9.2.5
> > (17fa4c321ccf9693de406faffe6b235e949aa25f). Note thought that that fix
> > needed a fix iff using a cassert enabled build
> > (4da24f12e63313b7dbb6b3e3d0317e04045df636) which is *not* in 9.2.5.

As quoted above, I think the fix is in 9.2.5.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From

Sergey Konoplev

Date:

05 November 2013, 23:12:46

On Tue, Nov 5, 2013 at 12:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > I've submitted a fix for at least one cause causing that independently
>> > from this bug which is included in 9.2.5
>> > (17fa4c321ccf9693de406faffe6b235e949aa25f). Note thought that that fix
>> > needed a fix iff using a cassert enabled build
>> > (4da24f12e63313b7dbb6b3e3d0317e04045df636) which is *not* in 9.2.5.
>
> As quoted above, I think the fix is in 9.2.5.

Oh, looks like I understand it wrong. Thank you.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com