Thread: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Sergey Konoplev
Date:
Hi, Linux 2.6.32 PostgreSQL 9.2.6 It was suffering from this problem on 9.2.4, mostly last couple of weeks when I had to rebuild the replica almost every 3 days, and I hoped it would be fixed in 9.2.6, but it is not. Yesterday I upgraded both master and replica servers to 9.2.6, performed SET vacuum_freeze_table_age =3D 0; VACUUM; and then rebuild the replica from scratch. Below are replica logs, pg_controldata output, configuration and the backtrace of the core dump. I can keep the replica untouched (without rebuilding it) for several more hours, so please let me know if you need me to show you something else. Replica logs: 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: page 14833 of relation base/16436/3321003988 is uninitialized 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, lastBlockVacuumed 0 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL contains references to invalid pages 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, lastBlockVacuumed 0 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup process (PID 19938) was terminated by signal 6: Aborted 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: terminating any other active server processes pg_controldata output: pg_control version number: 922 Catalog version number: 201204301 Database system identifier: 5858109675396804534 Database cluster state: in archive recovery pg_control last modified: Thu 19 Dec 2013 08:37:31 PM MSK Latest checkpoint location: 3271/79F16848 Prior checkpoint location: 3270/77E32598 Latest checkpoint's REDO location: 3270/929671E0 Latest checkpoint's TimeLineID: 2 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 4/1572485244 Latest checkpoint's NextOID: 3349674903 Latest checkpoint's NextMultiXactId: 6730476 Latest checkpoint's NextMultiOffset: 30866630 Latest checkpoint's oldestXID: 1340099111 Latest checkpoint's oldestXID's DB: 16436 Latest checkpoint's oldestActiveXID: 1572038146 Time of latest checkpoint: Thu 19 Dec 2013 07:00:09 PM MSK Minimum recovery ending location: 3272/666EE368 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 550 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Non default configuration: application_name =3D psql archive_command =3D (disabled) autovacuum_analyze_scale_factor =3D 0.05 autovacuum_max_workers =3D 5 autovacuum_naptime =3D 5 autovacuum_vacuum_cost_delay =3D 5 autovacuum_vacuum_scale_factor =3D 0.05 bgwriter_delay =3D 10 bgwriter_lru_multiplier =3D 10 checkpoint_completion_target =3D 0.9 checkpoint_segments =3D 1024 checkpoint_timeout =3D 3600 checkpoint_warning =3D 300 client_encoding =3D UTF8 DateStyle =3D ISO, DMY default_text_search_config =3D pg_catalog.russian effective_cache_size =3D 17432576 effective_io_concurrency =3D 32 hot_standby =3D on hot_standby_feedback =3D on lc_collate =3D ru_RU.UTF-8 lc_ctype =3D ru_RU.UTF-8 lc_messages =3D en_US.UTF-8 lc_monetary =3D ru_RU.UTF-8 lc_numeric =3D ru_RU.UTF-8 lc_time =3D ru_RU.UTF-8 listen_addresses =3D * log_checkpoints =3D on log_directory =3D /var/log/pgsql log_file_mode =3D 0600 log_filename =3D postgresql-%Y-%m-%d.log log_line_prefix =3D %t %p %u@%d from %h [vxid:%v txid:%x] [%i] log_lock_waits =3D on log_rotation_size =3D 0 log_statement =3D ddl log_timezone =3D W-SU log_truncate_on_rotation =3D on logging_collector =3D on maintenance_work_mem =3D 1048576 max_connections =3D 550 max_stack_depth =3D 2048 max_standby_streaming_delay =3D 300000 max_wal_senders =3D 3 pg_stat_statements.track =3D all port =3D 6432 server_encoding =3D UTF8 shared_buffers =3D 4587520 shared_preload_libraries =3D pg_stat_statements synchronous_commit =3D off TimeZone =3D W-SU track_activity_query_size =3D 4096 transaction_isolation =3D read committed unix_socket_permissions =3D 0777 wal_buffers =3D 2048 wal_keep_segments =3D 2048 wal_level =3D hot_standby work_mem =3D 262144 Backtrace of the coredump: Core was generated by `postgres: startup process recovering 000000020000327200000066 '. Program terminated with signal 6, Aborted. #0 0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install postgresql92-server-9.2.6-1PGDG.el6.x86_64 (gdb) set print pretty on (gdb) bt f #0 0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6 No symbol table info available. #1 0x00007f6a5a883085 in abort () from /lib64/libc.so.6 No symbol table info available. #2 0x000000000071c580 in errfinish (dummy=3D<value optimized out>) at elog= .c:534 edata =3D 0xb88fc0 elevel =3D 22 oldcontext =3D 0x134a740 econtext =3D 0x0 __func__ =3D "errfinish" #3 0x000000000071d09c in elog_finish (elevel=3D<value optimized out>, fmt=3D<value optimized out>) at elog.c:1213 edata =3D 0xb88fc0 oldcontext =3D 0x134a740 __func__ =3D "elog_finish" #4 0x00000000004a6d6d in log_invalid_page (node=3D..., forkno=3DMAIN_FORKNUM, blkno=3D14833, present=3D1 '\001') at xlogutils.c:91 key =3D { node =3D { spcNode =3D 1663, dbNode =3D 16436, relNode =3D 3321003988 }, forkno =3D 25867822, blkno =3D 0 } hentry =3D <value optimized out> found =3D <value optimized out> __func__ =3D "log_invalid_page" #5 0x00000000004a6ea0 in XLogReadBufferExtended (rnode=3D..., forknum=3DMAIN_FORKNUM, blkno=3D14833, mode=3DRBM_NORMAL) at xlogutils.c:35= 8 page =3D <value optimized out> lastblock =3D 38539 buffer =3D <value optimized out> smgr =3D <value optimized out> #6 0x00000000004897a2 in btree_xlog_vacuum (lsn=3D<value optimized out>, record=3D0x7f6a4fb20010) at nbtxlog.c:510 blkno =3D 14833 xlrec =3D 0x7f6a4fb20030 buffer =3D <value optimized out> page =3D <value optimized out> opaque =3D <value optimized out> #7 btree_redo (lsn=3D<value optimized out>, record=3D0x7f6a4fb20010) at nbtxlog.c:1062 info =3D <value optimized out> __func__ =3D "btree_redo" #8 0x00000000004a3468 in StartupXLOG () at xlog.c:6931 recoveryContinue =3D 1 '\001' recoveryApply =3D <value optimized out> errcontext =3D { previous =3D 0x0, callback =3D 0x49d5a0 <rm_redo_error_callback>, arg =3D 0x7f6a4fb20010 } xtime =3D <value optimized out> rmid =3D <value optimized out> xlogctl =3D 0x7f6a6de08480 Insert =3D <value optimized out> checkPoint =3D { redo =3D { xlogid =3D 12896, xrecoff =3D 4261478784 }, ThisTimeLineID =3D 2, fullPageWrites =3D 1 '\001', nextXidEpoch =3D 4, nextXid =3D 1452297001, nextOid =3D 3344849815, nextMulti =3D 6695025, nextMultiOffset =3D 30699263, oldestXid =3D 1256929431, oldestXidDB =3D 12865, time =3D 1387392623, oldestActiveXid =3D 1452282892 } wasShutdown =3D <value optimized out> reachedStopPoint =3D 0 '\000' haveBackupLabel =3D <value optimized out> RecPtr =3D <value optimized out> checkPointLoc =3D { xlogid =3D 12896, xrecoff =3D 4271100680 } EndOfLog =3D <value optimized out> endLogId =3D <value optimized out> endLogSeg =3D <value optimized out> record =3D 0x7f6a4fb20010 freespace =3D <value optimized out> oldestActiveXID =3D <value optimized out> backupEndRequired =3D <value optimized out> backupFromStandby =3D <value optimized out> dbstate_at_startup =3D <value optimized out> __func__ =3D "StartupXLOG" #9 0x00000000006153b8 in StartupProcessMain () at startup.c:220 __func__ =3D "StartupProcessMain" #10 0x00000000004c3837 in AuxiliaryProcessMain (argc=3D2, argv=3D0x7fffa02b0100) at bootstrap.c:418 progname =3D 0x854a1a "postgres" flag =3D <value optimized out> userDoption =3D 0x0 __func__ =3D "AuxiliaryProcessMain" #11 0x00000000006105f3 in StartChildProcess (type=3DStartupProcess) at postmaster.c:4522 pid =3D <value optimized out> av =3D {0x854a1a "postgres", 0x7fffa02b0150 "-x2", 0x0, 0x7fffa02f0814 "H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H= \003C\b1\322\061\311H\213\063H=3D\377=C9=9A;v\033\017\037\204", 0x7fffa02b0170 "\300\226\064\001", 0x0, 0x7fffa02b0150 "-x2", 0x7fffa02f0814 "H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H= \003C\b1\322\061\311H\213\063H=3D\377=C9=9A;v\033\017\037\204", 0xb41080 ">-\262R", 0x0} ac =3D 2 typebuf =3D "-x2\000\377\177\000\000\274\b/\240\377\177\000\000\005\000\000\000\000\000= \000\000\005\000\000\000\000\000\000" __func__ =3D "StartChildProcess" #12 0x0000000000614c3c in PostmasterMain (argc=3D<value optimized out>, argv=3D<value optimized out>) at postmaster.c:1123 opt =3D <value optimized out> status =3D <value optimized out> userDoption =3D <value optimized out> listen_addr_saved =3D 1 '\001' i =3D <value optimized out> __func__ =3D "PostmasterMain" #13 0x00000000005b37e0 in main (argc=3D5, argv=3D0x13496a0) at main.c:199 No locals. --=20 Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
Hi Sergey, Heikki, On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: > It was suffering from this problem on 9.2.4, mostly last couple of > weeks when I had to rebuild the replica almost every 3 days, and I > hoped it would be fixed in 9.2.6, but it is not. This actually is a separate bug from the one fixed. Not that the CONTEXT message isn't talking about "visible" but about "vacuum". > Yesterday I upgraded both master and replica servers to 9.2.6, > performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild > the replica from scratch. > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: > page 14833 of relation base/16436/3321003988 is uninitialized > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: > xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, > lastBlockVacuumed 0 > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL > contains references to invalid pages > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: > xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, > lastBlockVacuumed 0 Heikki, that's the redo routine for XLOG_BTREE_VACUUM, you probably know that code best atm, and it might influence on how the new split and deletion code works... Greetings, Andres
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Sergey Konoplev
Date:
On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: >> It was suffering from this problem on 9.2.4, mostly last couple of >> weeks when I had to rebuild the replica almost every 3 days, and I >> hoped it would be fixed in 9.2.6, but it is not. > > This actually is a separate bug from the one fixed. Not that the CONTEXT > message isn't talking about "visible" but about "vacuum". Oh, I see. I have completely lost in these bugs last several months. Are there anything I can do to prevent it except waiting for a patch? -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Heikki Linnakangas
Date:
On 12/20/2013 01:13 AM, Sergey Konoplev wrote: > On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: >>> It was suffering from this problem on 9.2.4, mostly last couple of >>> weeks when I had to rebuild the replica almost every 3 days, and I >>> hoped it would be fixed in 9.2.6, but it is not. >> >> This actually is a separate bug from the one fixed. Not that the CONTEXT >> message isn't talking about "visible" but about "vacuum". > > Oh, I see. I have completely lost in these bugs last several months. > > Are there anything I can do to prevent it except waiting for a patch? I wonder if this might be the same bug Tomonari Katsumata reported yesterday: http://www.postgresql.org/message-id/E1VtTni-00082E-Jv@wrigleys.postgresql.org. Since this keeps happening to you, could you try the patch he posted and see if it helps? - Heikki
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Sergey Konoplev
Date:
On Fri, Dec 20, 2013 at 11:34 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 12/20/2013 01:13 AM, Sergey Konoplev wrote: >> On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres@2ndquadrant.com> >> wrote: >>> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: >>>> >>>> It was suffering from this problem on 9.2.4, mostly last couple of >>>> weeks when I had to rebuild the replica almost every 3 days, and I >>>> hoped it would be fixed in 9.2.6, but it is not. >>> >>> >>> This actually is a separate bug from the one fixed. Not that the CONTEXT >>> message isn't talking about "visible" but about "vacuum". >> >> >> Oh, I see. I have completely lost in these bugs last several months. >> >> Are there anything I can do to prevent it except waiting for a patch? > > I wonder if this might be the same bug Tomonari Katsumata reported > yesterday: > http://www.postgresql.org/message-id/E1VtTni-00082E-Jv@wrigleys.postgresql.org. > Since this keeps happening to you, could you try the patch he posted and see > if it helps? For me it looks like something completely different from the Tomonari's bug. I does not have problems with restarting and it always has something to do with vacuum. The error message is the same every time it breaks down: 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: page 14833 of relation base/16436/3321003988 is uninitialized 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, lastBlockVacuumed 0 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL contains references to invalid pages 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, lastBlockVacuumed 0 Also, the issue is reproducible on the production site of a big enterprise company and it is always a long lasting pain to negotiate and deploy even a little patch. So I would like to ask you to review this issue more carefully before I start the negotiation process, please. I saved the buggy cluster directory and the core dump in case if you will need it. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2013-12-20 12:01:55 -0800, Sergey Konoplev wrote: > For me it looks like something completely different from the > Tomonari's bug. I does not have problems with restarting and it always > has something to do with vacuum. The error message is the same every > time it breaks down: Well, you didn't mention that it wasn't related to restarts before. You said that you just had re-cloned the standby, so assuming a restart isn't unreasonable. > Also, the issue is reproducible on the production site of a big > enterprise company and it is always a long lasting pain to negotiate > and deploy even a little patch. So I would like to ask you to review > this issue more carefully before I start the negotiation process, > please. I have to say, attitude-wise, I find that statement a bit problematic. You're working with the community here, not somebody you're paying. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Sergey Konoplev
Date:
On Fri, Dec 20, 2013 at 12:09 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-12-20 12:01:55 -0800, Sergey Konoplev wrote: >> For me it looks like something completely different from the >> Tomonari's bug. I does not have problems with restarting and it always >> has something to do with vacuum. The error message is the same every >> time it breaks down: > > Well, you didn't mention that it wasn't related to restarts before. You > said that you just had re-cloned the standby, so assuming a restart > isn't unreasonable. I tried to reproduce the Tomonari's bug after Heikki pointed me on it, and nothing like described in those thread has happened. >> Also, the issue is reproducible on the production site of a big >> enterprise company and it is always a long lasting pain to negotiate >> and deploy even a little patch. So I would like to ask you to review >> this issue more carefully before I start the negotiation process, >> please. > > I have to say, attitude-wise, I find that statement a bit > problematic. You're working with the community here, not somebody you're > paying. Sure, I realize it. I am from the open-source world a little bit too. Just wanted to make sure that everything that could be done is done. Sorry if I said something wrong. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Sergey Konoplev
Date:
On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: >> It was suffering from this problem on 9.2.4, mostly last couple of >> weeks when I had to rebuild the replica almost every 3 days, and I >> hoped it would be fixed in 9.2.6, but it is not. > > This actually is a separate bug from the one fixed. Not that the CONTEXT > message isn't talking about "visible" but about "vacuum". > >> Yesterday I upgraded both master and replica servers to 9.2.6, >> performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild >> the replica from scratch. > >> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: >> page 14833 of relation base/16436/3321003988 is uninitialized >> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: >> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, >> lastBlockVacuumed 0 > > Heikki, that's the redo routine for XLOG_BTREE_VACUUM, you probably know > that code best atm, and it might influence on how the new split and > deletion code works... I just wonder if it might be wal_level = hot_standby problem only. What if I switch it to just archive? Wont this problem gone? -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Sergey Konoplev
Date:
On Thu, Dec 19, 2013 at 2:37 PM, Sergey Konoplev <gray.ru@gmail.com> wrote: > Yesterday I upgraded both master and replica servers to 9.2.6, > performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild > the replica from scratch. > > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: > page 14833 of relation base/16436/3321003988 is uninitialized > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: > xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, > lastBlockVacuumed 0 > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL > contains references to invalid pages > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: > xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, > lastBlockVacuumed 0 FYI Today we noticed that the PANIC happens on the same relation with the same block and page. This is an index, created on Dec 10 with its daily partition that was before I upgraded Postgres from 9.2.4 to 9.2.6. So probably it was a kind of aftermath of the pre-9.2.6 problems. I have done reindex of this particular index on production, but I still have the buggy copy of the cluster in case if someone would like to dig the problem deeper. Will keep you informed. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
Hi, On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: > page 14833 of relation base/16436/3321003988 is uninitialized > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: > xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, > lastBlockVacuumed 0 > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL > contains references to invalid pages > 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: > xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, > lastBlockVacuumed 0 > 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup > process (PID 19938) was terminated by signal 6: Aborted > 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: > terminating any other active server processes There just was another case of this reported on IRC by MatheusOl and for some reason in his case I noticed the pertinent details and it quickly clicked: * page 14833 is the one with the error * we're actually vacuuming page 38538 * lastBlockVacuumed is 0 In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed and the page vacuumed and acquire a cleanup lock on it. But there isn't any guarantee that the intermediate pages are valid, filled pages, afaics. ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL. Alternatively we could devise a trivial routine that just does a BufTableLookup() and the required hangups around that. But that seems more invasive than necessary. Neither makes me really happy, but I don't see a nicer solution right now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Heikki Linnakangas
Date:
On 01/06/2014 03:48 PM, Andres Freund wrote: > Hi, > > On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote: >> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING: >> page 14833 of relation base/16436/3321003988 is uninitialized >> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: >> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, >> lastBlockVacuumed 0 >> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL >> contains references to invalid pages >> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT: >> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538, >> lastBlockVacuumed 0 >> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup >> process (PID 19938) was terminated by signal 6: Aborted >> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: >> terminating any other active server processes > > There just was another case of this reported on IRC by MatheusOl and for > some reason in his case I noticed the pertinent details and it quickly > clicked: > * page 14833 is the one with the error > * we're actually vacuuming page 38538 > * lastBlockVacuumed is 0 > > In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed > and the page vacuumed and acquire a cleanup lock on it. But there isn't > any guarantee that the intermediate pages are valid, filled pages, > afaics. Hmm. So the problem arises if there's an uninitialized page in the middle of the b-tree relation for some reason. It's unusual for an uninitialized page to be left in the middle of the relation, but it's certainly possible, if e.g you crash just after extending the relation. In a heap, vacuum will initialize such pages and emit a WARNING like "page %u is uninitialized --- fixing", but we don't do that for b-tree. > ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL. That'd be horrendously dangerous. It would silently zap any page with any error on it. But we could add a new ReadBufferMode that returns InvalidBuffer on error, without zeroing the page. - Heikki
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-06 16:35:42 +0200, Heikki Linnakangas wrote: > >In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed > >and the page vacuumed and acquire a cleanup lock on it. But there isn't > >any guarantee that the intermediate pages are valid, filled pages, > >afaics. > > Hmm. So the problem arises if there's an uninitialized page in the middle of > the b-tree relation for some reason. It's unusual for an uninitialized page > to be left in the middle of the relation, but it's certainly possible, if > e.g you crash just after extending the relation. In a heap, vacuum will > initialize such pages and emit a WARNING like "page %u is uninitialized --- > fixing", but we don't do that for b-tree. Yes. There's probably some resource exhaustion scenarios that can cause this as well. Triggering *_timeout or out-of-memory errors at least... > >ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL. > > That'd be horrendously dangerous. It would silently zap any page with any > error on it. Well, the number of scenarios in which you can hit that is somewhat limited - it's not going to be ENOSPC since the file is guaranteed to be longer, and we don't create holes - so the scenario pretty much is disk reads not returning (pretty rare) and checksums failing. > But we could add a new ReadBufferMode that returns > InvalidBuffer on error, without zeroing the page. But yes, that's much better. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 01/06/2014 03:48 PM, Andres Freund wrote: >> There just was another case of this reported on IRC by MatheusOl and for >> some reason in his case I noticed the pertinent details and it quickly >> clicked: >> * page 14833 is the one with the error >> * we're actually vacuuming page 38538 >> * lastBlockVacuumed is 0 >> >> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed >> and the page vacuumed and acquire a cleanup lock on it. But there isn't >> any guarantee that the intermediate pages are valid, filled pages, >> afaics. > Hmm. So the problem arises if there's an uninitialized page in the > middle of the b-tree relation for some reason. It's unusual for an > uninitialized page to be left in the middle of the relation, but it's > certainly possible, if e.g you crash just after extending the relation. Right. This diagnosis is incomplete in itself, because if the slave has a zeroed page there, shouldn't the master have one too? If the master does have a zeroed page there, how come vacuum didn't fail on the master? The answer is that btvacuumpage will skip over all-zero pages without doing anything more than noting them as free in FSM. When btree_xlog_vacuum rescans the relation, it will also skip over all-zero pages without doing anything --- but XLogReadBufferExtended logs such a page as invalid, and then bitches later when it doesn't see the page dropped or truncated away. >> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL. > That'd be horrendously dangerous. It would silently zap any page with > any error on it. But we could add a new ReadBufferMode that returns > InvalidBuffer on error, without zeroing the page. The important point is not just that it not damage the page, but that it not log it as invalid. I concur that the right fix requires a new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK. I think the spec for this should be that if the page doesn't exist or contains zeroes, we return InvalidBuffer without logging the page number as invalid. The doesn't-exist case is justified by the expectation that there will be a later RBM_NORMAL call for a larger page number, which will result in a suitable complaint if the page range isn't there. Will go fix this if there's not any objection to that plan. regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-13 15:19:06 -0500, Tom Lane wrote: > >> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL. > > > That'd be horrendously dangerous. It would silently zap any page with > > any error on it. But we could add a new ReadBufferMode that returns > > InvalidBuffer on error, without zeroing the page. > > The important point is not just that it not damage the page, but that > it not log it as invalid. I concur that the right fix requires a > new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK. > I think the spec for this should be that if the page doesn't exist or > contains zeroes, we return InvalidBuffer without logging the page number > as invalid. The doesn't-exist case is justified by the expectation that > there will be a later RBM_NORMAL call for a larger page number, which will > result in a suitable complaint if the page range isn't there. That's a sensible way to go, yes. But I wonder if we wouldn't end up with less complicated code if we added a variant of ReadBuffer that only returns a buffer from cache if already present in s_b. I started to prototype something like RBM_NORMAL_ZERO_OK shortly after Heikki's message and it seems to quickly turn a bit ugly because the surrounding code isn't really ready to deal with not returning a buffer. And for the purpose of that redo routine, that'd actually be better. > Will go fix this if there's not any objection to that plan. Fine with me! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-13 15:19:06 -0500, Tom Lane wrote: >> The important point is not just that it not damage the page, but that >> it not log it as invalid. I concur that the right fix requires a >> new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK. >> I think the spec for this should be that if the page doesn't exist or >> contains zeroes, we return InvalidBuffer without logging the page number >> as invalid. The doesn't-exist case is justified by the expectation that >> there will be a later RBM_NORMAL call for a larger page number, which will >> result in a suitable complaint if the page range isn't there. > That's a sensible way to go, yes. But I wonder if we wouldn't end up > with less complicated code if we added a variant of ReadBuffer that only > returns a buffer from cache if already present in s_b. That sounds rather invasive to me, and more to the point it requires fooling with code that doesn't have anything to do with WAL replay. > I started to prototype something like RBM_NORMAL_ZERO_OK shortly after > Heikki's message and it seems to quickly turn a bit ugly because the > surrounding code isn't really ready to deal with not returning a > buffer. And for the purpose of that redo routine, that'd actually be > better. It looks easy enough to me to support the desired behavior in XLogReadBufferExtended. I wasn't planning to support it in ReadBuffer itself. (Probably the path of least resistance is to define it as meaning the same thing as RBM_NORMAL, so far as bufmgr.c is concerned.) regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Heikki Linnakangas
Date:
On 01/13/2014 10:26 PM, Andres Freund wrote: > On 2014-01-13 15:19:06 -0500, Tom Lane wrote: >> I concur that the right fix requires a >> new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK. >> I think the spec for this should be that if the page doesn't exist or >> contains zeroes, we return InvalidBuffer without logging the page number >> as invalid. The doesn't-exist case is justified by the expectation that >> there will be a later RBM_NORMAL call for a larger page number, which will >> result in a suitable complaint if the page range isn't there. I think it's more natural to return the empty page to the caller, rather than InvalidBuffer. > That's a sensible way to go, yes. But I wonder if we wouldn't end up > with less complicated code if we added a variant of ReadBuffer that only > returns a buffer from cache if already present in s_b. > I started to prototype something like RBM_NORMAL_ZERO_OK shortly after > Heikki's message and it seems to quickly turn a bit ugly because the > surrounding code isn't really ready to deal with not returning a > buffer. And for the purpose of that redo routine, that'd actually be > better. If it's trivial to add such a mode to buffer cache, then sure, let's do that by all means. But I seriously doubt it's really simple enough to be back-patchable. With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new mode, as it's XLogReadBuffer that does the the check for zero pages. Per attached patch (for demonstration purposes only, you also need to add the new mode to the header file and adjust comments). - Heikki
Attachment
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to > XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the > new mode, as it's XLogReadBuffer that does the the check for zero pages. > Per attached patch (for demonstration purposes only, you also need to > add the new mode to the header file and adjust comments). That looks like it will still log the page as invalid if it's past EOF, which I think we don't want. Did you want to finish this off and commit, or were you just passing what you had to me? regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote: > On 01/13/2014 10:26 PM, Andres Freund wrote: > >On 2014-01-13 15:19:06 -0500, Tom Lane wrote: > >>I concur that the right fix requires a > >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK. > >>I think the spec for this should be that if the page doesn't exist or > >>contains zeroes, we return InvalidBuffer without logging the page number > >>as invalid. The doesn't-exist case is justified by the expectation that > >>there will be a later RBM_NORMAL call for a larger page number, which will > >>result in a suitable complaint if the page range isn't there. > > I think it's more natural to return the empty page to the caller, rather > than InvalidBuffer. Hm. I don't see the advantage - and it makes it impossible to recognize that case at the caller level. > >That's a sensible way to go, yes. But I wonder if we wouldn't end up > >with less complicated code if we added a variant of ReadBuffer that only > >returns a buffer from cache if already present in s_b. > >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after > >Heikki's message and it seems to quickly turn a bit ugly because the > >surrounding code isn't really ready to deal with not returning a > >buffer. And for the purpose of that redo routine, that'd actually be > >better. > > If it's trivial to add such a mode to buffer cache, then sure, let's do that > by all means. But I seriously doubt it's really simple enough to be > back-patchable. Isn't it (untested, written in the editor) just something like: /* only works for pages in shared buffers */ Assert(!SmgrIsTemp(smgr)); /* Make sure we will have room to remember the buffer pin */ ResourceOwnerEnlargeBuffers(CurrentResourceOwner); /* create a tag so we can lookup the buffer */ INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum); newHash = BufTableHashCode(&newTag); newPartitionLock = BufMappingPartitionLock(newHash); /* now, check if the block is in the buffer pool already */ LWLockAcquire(newPartitionLock, LW_SHARED); buf_id = BufTableLookup(&newTag, newHash); if (buf_id >= 0) { /* * Found it. Now, pin the buffer so no one can steal it from the * buffer pool, and check to see if the correct data has been loaded * into the buffer. */ buf = &BufferDescriptors[buf_id]; valid = PinBuffer(buf, NULL); /* Can release the mapping lock as soon as we've pinned it */ LWLockRelease(newPartitionLock); if (!valid) { /* * We can only get here if (a) someone else is still reading in * the page, or (b) a previous read attempt failed. We have to * wait for any active read attempt to finish, and then set up our * own read attempt if the page is still not BM_VALID. * StartBufferIO does it all. */ if (StartBufferIO(buf, true)) { /* * If we get here, previous attempts to read the buffer must * have failed ... */ return InvalidBuffer; } } return buf; } /* * Didn't find it in the buffer pool, done. */ LWLockRelease(newPartitionLock); return InvalidBuffer; Now, that's not trivial, but also not too complicated. > With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to > XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new > mode, as it's XLogReadBuffer that does the the check for zero pages. Per > attached patch (for demonstration purposes only, you also need to add the > new mode to the header file and adjust comments). I thought about that approach at first as well, but I am not so sure it's sufficient. Isn't it quite possible that we'd end up reading a page that was *partially* written during a crash and due to that has a corrupted checksum? There won't be any protection due to WAL replay/full page writes against that case here. Now, you could argue that that shouldn't be the case because we're only entering that codepath once STANDBY_SNAPSHOT_READY and you might be right... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote: >> I think it's more natural to return the empty page to the caller, rather >> than InvalidBuffer. > Hm. I don't see the advantage - and it makes it impossible to recognize > that case at the caller level. Huh? The caller can do PageIsNew if it needs to tell the difference. If we always return InvalidBuffer, it *can't* tell the difference. > I thought about that approach at first as well, but I am not so sure > it's sufficient. Isn't it quite possible that we'd end up reading a page > that was *partially* written during a crash and due to that has a > corrupted checksum? If we did, then either the master would also have failed (so we'd not be here), or there's a discrepancy between master and slave data, in which case we've got bigger problems than this. The addition of a bufmgr function to only return an already-in-cache page is a possible future performance enhancement, but I do not care for it as a bug fix that has to be back-patched as far as 9.0. regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Heikki Linnakangas
Date:
On 01/13/2014 11:02 PM, Andres Freund wrote: > On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote: >> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to >> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new >> mode, as it's XLogReadBuffer that does the the check for zero pages. Per >> attached patch (for demonstration purposes only, you also need to add the >> new mode to the header file and adjust comments). > > I thought about that approach at first as well, but I am not so sure > it's sufficient. Isn't it quite possible that we'd end up reading a page > that was *partially* written during a crash and due to that has a > corrupted checksum? There won't be any protection due to WAL replay/full > page writes against that case here. Good point. Normally, we expect the checksum to match on all pages that we read during WAL replay, because full page writes will initialize any page that is modified to an untorn state, before it's ever read. But we can't rely on that in the extra read that btree_xlog_vacuum() does. It's possible that there's a torn page on disk on block X, and we're vacuuming page X + 1. The page will be fixed by a later record in the WAL, before we reach consistency, but the ReadBuffer call from btree_xlog_vacuum() will throw an error. > Now, you could argue that that shouldn't be the case because we're only > entering that codepath once STANDBY_SNAPSHOT_READY and you might be > right... I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY, before we reach consistency. Adding a check for reachedConsistency, though, ought to fix it. - Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > Good point. Normally, we expect the checksum to match on all pages that > we read during WAL replay, because full page writes will initialize any > page that is modified to an untorn state, before it's ever read. But we > can't rely on that in the extra read that btree_xlog_vacuum() does. But it's not an "extra" read. It's replicating a read that was done on the master in the btvacuumpage() scan. AFAICS the only way to fail on the slave and not the master is if the slave has inconsistent data, in which case you're at hazard of failing anyway. >> Now, you could argue that that shouldn't be the case because we're only >> entering that codepath once STANDBY_SNAPSHOT_READY and you might be >> right... > I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY, > before we reach consistency. Adding a check for reachedConsistency, > though, ought to fix it. Huh? Surely we're not letting queries in until we're consistent. regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-13 16:25:48 -0500, Tom Lane wrote: > > I thought about that approach at first as well, but I am not so sure > > it's sufficient. Isn't it quite possible that we'd end up reading a page > > that was *partially* written during a crash and due to that has a > > corrupted checksum? > > If we did, then either the master would also have failed (so we'd not > be here), or there's a discrepancy between master and slave data, > in which case we've got bigger problems than this. If we crashed during replay and then restart replaying it might, at least without the STANDBY_SNAPSHOT_READY check. I think. XLOG_BTREE_VACUUM block 10, lastVacuumed 1 XLOG_BTREE_INSERT_SPLIT XLOG_BTREE_INSERT_LEAF block 5 XLOG_BTREE_VACUUM block 10, lastVacuumed 1 -- crash if we now restart replaying block 5 can be in just about any state. ... tinker ... Hm, the standbyMode check isn't sufficient, it's independent from other consistency checks (like minRecoveryPoint). But we can extend that check in the redo routine to check for full consistency, which should be fine since there won't be any relevant pins on buffers before that anyway. > The addition of a bufmgr function to only return an already-in-cache > page is a possible future performance enhancement, but I do not care > for it as a bug fix that has to be back-patched as far as 9.0. Yea, the only argument I see is that it actually might end up being simpler, that's why I brought it up. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-13 16:36:41 -0500, Tom Lane wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: > > Good point. Normally, we expect the checksum to match on all pages that > > we read during WAL replay, because full page writes will initialize any > > page that is modified to an untorn state, before it's ever read. But we > > can't rely on that in the extra read that btree_xlog_vacuum() does. > > But it's not an "extra" read. It's replicating a read that was done > on the master in the btvacuumpage() scan. AFAICS the only way to fail > on the slave and not the master is if the slave has inconsistent data, > in which case you're at hazard of failing anyway. I tried to explain which scenario I see as dangerous nearby. > >> Now, you could argue that that shouldn't be the case because we're only > >> entering that codepath once STANDBY_SNAPSHOT_READY and you might be > >> right... > > > I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY, > > before we reach consistency. Adding a check for reachedConsistency, > > though, ought to fix it. > > Huh? Surely we're not letting queries in until we're consistent. We don't, but STANDBY_SNAPSHOT_READY isn't the only variable controlling that. It just determines whether we'd have the necessary visibility information. The full check is: */ if (standbyState == STANDBY_SNAPSHOT_READY && !LocalHotStandbyActive && reachedConsistency && IsUnderPostmaster) { ... xlogctl->SharedHotStandbyActive = true; ... SendPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY); } So we need to mimick that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-13 16:25:48 -0500, Tom Lane wrote: >> If we did, then either the master would also have failed (so we'd not >> be here), or there's a discrepancy between master and slave data, >> in which case we've got bigger problems than this. > If we crashed during replay and then restart replaying it might, at > least without the STANDBY_SNAPSHOT_READY check. I think. Perhaps, but the code in question isn't needed during crash recovery, only during hot standby; so modulo the possibility that the check for that is broken, I don't see a problem. regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-13 23:29:59 +0200, Heikki Linnakangas wrote: > On 01/13/2014 11:02 PM, Andres Freund wrote: > >Now, you could argue that that shouldn't be the case because we're only > >entering that codepath once STANDBY_SNAPSHOT_READY and you might be > >right... > > I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY, > before we reach consistency. Adding a check for reachedConsistency, though, > ought to fix it. Maybe we should just put LocalHotStandbyActive into xlog_internal.h? That'd be enough for the startup process. Or alternatively add a check for EnableHotStandby into HotStandbyActive() so we don't take the spinlock if it's not enabled? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > Maybe we should just put LocalHotStandbyActive into xlog_internal.h? Or a macro, or something. nbtxlog.c shouldn't know quite that much about what allows hot standby ... regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-13 23:29:59 +0200, Heikki Linnakangas wrote: >> I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY, >> before we reach consistency. Adding a check for reachedConsistency, though, >> ought to fix it. > Maybe we should just put LocalHotStandbyActive into xlog_internal.h? > That'd be enough for the startup process. Or alternatively add a check > for EnableHotStandby into HotStandbyActive() so we don't take the > spinlock if it's not enabled? After a little bit of research, I propose that we add a function like this in xlog.c: /* * Like HotStandbyActive(), but to be used only in WAL replay code, * where we don't need to ask any other process what the state is. */ bool HotStandbyActiveStartup(void) { Assert(AmStartupProcess()); return LocalHotStandbyActive; } Any objections? Bikeshedding on the name? regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Andres Freund
Date:
On 2014-01-13 20:15:59 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-01-13 23:29:59 +0200, Heikki Linnakangas wrote: > >> I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY, > >> before we reach consistency. Adding a check for reachedConsistency, though, > >> ought to fix it. > > > Maybe we should just put LocalHotStandbyActive into xlog_internal.h? > > That'd be enough for the startup process. Or alternatively add a check > > for EnableHotStandby into HotStandbyActive() so we don't take the > > spinlock if it's not enabled? > > After a little bit of research, I propose that we add a function like > this in xlog.c: > Any objections? Bikeshedding on the name? Fine with me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
BTW, while I'm looking at this ... the writing side of the code seems a few bricks shy of a load too: /* * InHotStandby we need to scan right up to the end of the index for * correct locking, so we may need to write a WAL record for the final * block in the index if it was not vacuumed. It's possible that VACUUMing * has actually removed zeroed pages at the end of the index so we need to * take care to issue the record for last actual block and not for the * last block that was scanned. Ignore empty indexes. */ if (XLogStandbyInfoActive() && num_pages > 1 && vstate.lastBlockVacuumed < (num_pages - 1)) { Buffer buf; /* * We can't use _bt_getbuf() here because it always applies * _bt_checkpage(), which will barf on an all-zero page. We want to * recycle all-zero pages, not fail. Also, we want to use a * nondefault buffer access strategy. */ buf = ReadBufferExtended(rel, MAIN_FORKNUM, num_pages - 1, RBM_NORMAL, info->strategy); LockBufferForCleanup(buf); _bt_delitems_vacuum(rel, buf, NULL, 0, vstate.lastBlockVacuumed); _bt_relbuf(rel, buf); } If the last physical page of the index is all-zero, the preceding loop won't have any problem with that (nor should it); but this code sure will have a problem, because _bt_delitems_vacuum isn't prepared to cope with an all-zero page AFAICS. Nor am I following the logic of the initial comment. If we know that pages above N are all-zero, why are we worried about taking locks on them? There shouldn't be any scans reaching such pages, and any that did would error out anyhow in _bt_getbuf(). ISTM the right thing here is for the btvacuumpage loop to remember the last ordinary, valid index page it saw, and point to that one in this added WAL entry. regards, tom lane
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Heikki Linnakangas
Date:
On 01/14/2014 03:59 AM, Tom Lane wrote: > BTW, while I'm looking at this ... the writing side of the code seems a > few bricks shy of a load too: > > /* > * InHotStandby we need to scan right up to the end of the index for > * correct locking, so we may need to write a WAL record for the final > * block in the index if it was not vacuumed. It's possible that VACUUMing > * has actually removed zeroed pages at the end of the index so we need to > * take care to issue the record for last actual block and not for the > * last block that was scanned. Ignore empty indexes. > */ > if (XLogStandbyInfoActive() && > num_pages > 1 && vstate.lastBlockVacuumed < (num_pages - 1)) > { > Buffer buf; > > /* > * We can't use _bt_getbuf() here because it always applies > * _bt_checkpage(), which will barf on an all-zero page. We want to > * recycle all-zero pages, not fail. Also, we want to use a > * nondefault buffer access strategy. > */ > buf = ReadBufferExtended(rel, MAIN_FORKNUM, num_pages - 1, RBM_NORMAL, > info->strategy); > LockBufferForCleanup(buf); > _bt_delitems_vacuum(rel, buf, NULL, 0, vstate.lastBlockVacuumed); > _bt_relbuf(rel, buf); > } > > If the last physical page of the index is all-zero, the preceding loop > won't have any problem with that (nor should it); but this code sure will > have a problem, because _bt_delitems_vacuum isn't prepared to cope with an > all-zero page AFAICS. Yep, that's broken. > Nor am I following the logic of the initial comment. If we know that > pages above N are all-zero, why are we worried about taking locks on them? > There shouldn't be any scans reaching such pages, and any that did would > error out anyhow in _bt_getbuf(). Also, btree relations are never truncated, so the comment is wrong when it claims that VACUUM might've removed zero pages at the end of the index. > ISTM the right thing here is for the btvacuumpage loop to remember the > last ordinary, valid index page it saw, and point to that one in this > added WAL entry. Agreed. - Heikki
I've pushed patches fixing the issues discussed in this thread. While I was working on it, it occurred to me that the whole darn mess might be dead code as of the removal of SnapshotNow. The reason we're doing this at all is the fear that, while some indexscan is in flight from an index entry to the relevant heap tuple, VACUUM could remove said tuple and then somebody else could insert a new, unrelated tuple at the now-vacant TID. With ordinary MVCC snapshot checking, this would not be a problem because the new tuple could not possibly pass the snapshot check, so it wouldn't get returned. However, with non-MVCC snapshots, it's possible we could return a heap tuple that doesn't match the index qual. The fear of this happening for SnapshotNow catalog accesses was what drove the original locking design, and that is also why we don't bother with similar locking for other index types --- only btree indexes are used for system catalog fetches. So now that we use MVCC snapshots for system catalog fetches, it seems like maybe we don't need to sweat about cleanup-locking index pages we're not going to remove tuples from; indeed maybe we don't need to acquire super-exclusive btree page locks at all. I'm not totally sure about this; we still have SnapshotSelf for instance. But maybe the remaining non-MVCC snapshot types are never used in ways where this type of aliasing would be a problem, or maybe we could cause that to become true with a bit more work. I'm not planning to pursue the question myself at the moment, but I thought I should mention it in case anybody else wants to. regards, tom lane
Hi all, I am also facing this issue. I set up streaming replication for a PG 9.2.6 database in both master and slave. I have used pg_basebackup to dump master database to slave. It took two hours to complete pg_basebackup. During this time a lot of wal files were archived. As expected, when slave database started up it went through archived wal files for 5 hours to catch the master. During recovery from archived wal files appeared some warnings like this: ... Jun 6 14:55:50 VNWID1-4 postgres[17024]: [177-1] [2014-06-06 14:55:50 CEST] 17024@ LOG: restored log file "000000010000066C000000E4" from archive Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-1] [2014-06-06 14:55:51 CEST] 17032@ WARNING: xlog min recovery request 67B/B36ABD78 is past current point 66C/E4FFFFE8 Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-2] [2014-06-06 14:55:51 CEST] 17032@ CONTEXT: writing block 0 of relation pg_tblspc/16756/PG_9.2_201204301/16757/1116517382_vm Jun 6 14:55:51 VNWID1-4 postgres[17024]: [178-1] [2014-06-06 14:55:51 CEST] 17024@ LOG: restored log file "000000010000066C000000E5" from archive ... It seems they are harmless and can be safely ignored, but I prefer to state this in the post in case be relevant for any reason. After 5 hours, slave database catched master database and streaming replication started: ... Jun 6 19:59:10 VNWID1-4 postgres[17024]: [15763-1] [2014-06-06 19:59:10 CEST] 17024@ LOG: restored log file "00000001000006AA00000003" from archive Jun 6 19:59:11 VNWID1-4 postgres[17024]: [15764-1] [2014-06-06 19:59:11 CEST] 17024@ LOG: unexpected pageaddr 6A7/CC000000 in log file 1706, segment 4, offset 0 Jun 6 19:59:11 VNWID1-4 postgres[21691]: [2-1] [2014-06-06 19:59:11 CEST] 21691@ LOG: streaming replication successfully connected to primary It seems that all is ok until now. And no other message was logged until the crash. After some hours of replication being working properly, Postgres stopped and log contains this: Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-1] [2014-06-07 17:52:45 CEST] 17024@ WARNING: page 916 of relation pg_tblspc/16756/PG_9.2_201204301/16757/402007830 is uninitialized Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-2] [2014-06-07 17:52:45 CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk 917, lastBlockVacuumed 915 Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-1] [2014-06-07 17:52:45 CEST] 17024@ PANIC: WAL contains references to invalid pages Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-2] [2014-06-07 17:52:45 CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk 917, lastBlockVacuumed 915 Jun 7 17:52:45 VNWID1-4 postgres[17021]: [2-1] [2014-06-07 17:52:45 CEST] 17021@ LOG: startup process (PID 17024) was terminated by signal 6: Aborted Jun 7 17:52:45 VNWID1-4 postgres[17021]: [3-1] [2014-06-07 17:52:45 CEST] 17021@ LOG: terminating any other active server processes I think that this issue is exactly the same that is being discussed in this thread and I have seen that a patch was released for this bug. Could anyone tell me how to get / apply this patch? Thanks in advace. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-9-2-6-9-2-6-PANIC-WAL-contains-references-to-invalid-pages-tp5784147p5806482.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Re: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Heikki Linnakangas
Date:
On 06/09/2014 02:27 PM, jlrando wrote: > Hi all, > > I am also facing this issue. I set up streaming replication for a PG 9.2.6 > database in both master and slave. I have used pg_basebackup to dump master > database to slave. It took two hours to complete pg_basebackup. During this > time a lot of wal files were archived. As expected, when slave database > started up it went through archived wal files for 5 hours to catch the > master. During recovery from archived wal files appeared some warnings like > this: > > ... > Jun 6 14:55:50 VNWID1-4 postgres[17024]: [177-1] [2014-06-06 14:55:50 CEST] > 17024@ LOG: restored log file "000000010000066C000000E4" from archive > Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-1] [2014-06-06 14:55:51 CEST] > 17032@ WARNING: xlog min recovery request 67B/B36ABD78 is past current > point 66C/E4FFFFE8 > Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-2] [2014-06-06 14:55:51 CEST] > 17032@ CONTEXT: writing block 0 of relation > pg_tblspc/16756/PG_9.2_201204301/16757/1116517382_vm > Jun 6 14:55:51 VNWID1-4 postgres[17024]: [178-1] [2014-06-06 14:55:51 CEST] > 17024@ LOG: restored log file "000000010000066C000000E5" from archive > ... > > It seems they are harmless and can be safely ignored, but I prefer to state > this in the post in case be relevant for any reason. Hmm. That warning shouldn't happen, but looking at the visibility map code, I see how it could: During the WAL replay, a heap update is replayed, which clears a bit in the visibility map. That dirties the VM page, but we intentionally don't check or update the LSN on the VM page. The VM buffer is now dirty in the buffer pool, but if it was backed up later during the backup, it has a greater LSN than what we're currently replaying. When that buffer gets written out, you get the warning. Yeah, I think that's harmless. But we should look into fixing that... > It seems that all is ok until now. And no other message was logged until the > crash. After some hours of replication being working properly, Postgres > stopped and log contains this: > > Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-1] [2014-06-07 17:52:45 > CEST] 17024@ WARNING: page 916 of relation > pg_tblspc/16756/PG_9.2_201204301/16757/402007830 is uninitialized > Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-2] [2014-06-07 17:52:45 > CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk > 917, lastBlockVacuumed 915 > Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-1] [2014-06-07 17:52:45 > CEST] 17024@ PANIC: WAL contains references to invalid pages > Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-2] [2014-06-07 17:52:45 > CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk > 917, lastBlockVacuumed 915 > Jun 7 17:52:45 VNWID1-4 postgres[17021]: [2-1] [2014-06-07 17:52:45 CEST] > 17021@ LOG: startup process (PID 17024) was terminated by signal 6: Aborted > Jun 7 17:52:45 VNWID1-4 postgres[17021]: [3-1] [2014-06-07 17:52:45 CEST] > 17021@ LOG: terminating any other active server processes > > I think that this issue is exactly the same that is being discussed in this > thread and I have seen that a patch was released for this bug. Yeah, that looks like the same bug. > Could anyone tell me how to get / apply this patch? It's included in 9.2.8. Just update the binaries. - Heikki
Hi Heikki, Thanks a lot for your answer. Perform an upgrade of this DB will require a lot of time because the only certified DB version for this application is 9.2.6. A very complex flow will be triggered to ask for a new PG release certification. So my question is, do you know if there is any workaround for 9.2.6 we can use in the meanwhile? Thanks in advance and best regards! Jose. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-9-2-6-9-2-6-PANIC-WAL-contains-references-to-invalid-pages-tp5784147p5806506.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Re: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Matheus de Oliveira
Date:
On Mon, Jun 9, 2014 at 11:10 AM, jlrando <jose.luis.rando.calvo@ericsson.co= m > wrote: > Perform an upgrade of this DB will require a > lot of time because the only certified DB version for this application is > 9.2.6. > This simple makes no sense for PG release versioning schema/policy, see [1]= . Following schema X.Y.Z of a PG release, you may "certifies" an application to work on version X.Y only, the Z number means the release (bug-fix/patch basically), and you should **always** be using the most higher Z for your X.Y version (in the case of 9.2, you should be using 9.2.8 already, no excuses). In summary, you can always upgrade from Z to Z+1 without compatibility issues. Of course, read the release notes to check if there is some maintenance needed after the upgrade. [1] http://www.postgresql.org/support/versioning/ --=20 Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br n=C3=ADvel F! www.dextra.com.br/postgres
Re: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From
Michael Paquier
Date:
On Tue, Jun 10, 2014 at 3:58 AM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote: > > On Mon, Jun 9, 2014 at 11:10 AM, jlrando > <jose.luis.rando.calvo@ericsson.com> wrote: >> >> Perform an upgrade of this DB will require a >> lot of time because the only certified DB version for this application is >> 9.2.6. > > > This simple makes no sense for PG release versioning schema/policy, see [1]. > > Following schema X.Y.Z of a PG release, you may "certifies" an application > to work on version X.Y only, the Z number means the release (bug-fix/patch > basically), and you should **always** be using the most higher Z for your > X.Y version (in the case of 9.2, you should be using 9.2.8 already, no > excuses). In summary, you can always upgrade from Z to Z+1 without > compatibility issues. Of course, read the release notes to check if there is > some maintenance needed after the upgrade. > > [1] http://www.postgresql.org/support/versioning/ The same major versions (X.Y releases) are guaranteed to be disk-compatible and new minor releases only contain bug fixes. So updating is only a matter of installing the new binaries and then restart the server. By updating always to the latest available version you protect as well your application from the latest bugs found. -- Michael