Thread: page 1 of relation global/11787 was uninitialized
Just today one of my systems experienced a kernel panic, and halted abruptly. Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise). The system was moderately active, i.e. about one commit per minute. It is not a large problem if the last few commits would be gone. Now, in restarting the system, I get this: ------------------------- LOG: database system was interrupted while in recovery at 2013-04-09 18:07:45 CEST HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. LOG: database system was not properly shut down; automatic recovery in progress LOG: consistent recovery state reached at D/B0BCE118 LOG: redo starts at D/B0BAB734 LOG: invalid record length at D/B0BAE010 LOG: redo done at D/B0BADFC4 LOG: last completed transaction was at log time 2013-04-09 14:50:29.743986+02 WARNING: page 1 of relation global/11787 was uninitialized PANIC: WAL contains references to invalid pages LOG: startup process (PID 30827) was terminated by signal 6: Aborted LOG: aborting startup due to startup process failure ------------------------- Looking at global/11787, doesn't reveal any obvious corruption. The server was running with:synchronous_commit = offfull_page_writes = off to maximise performance, since the data is not 100% critical, but I would like to recover the data up to some point in the past (an hour ago is fine). Any suggestions? Restarting PostgreSQL several times, results in identical messages. -- Stephen.
On 04/09/2013 09:21 AM, Stephen R. van den Berg wrote: > ------------------------- > > Looking at global/11787, doesn't reveal any obvious corruption. > The server was running with: > synchronous_commit = off > full_page_writes = off full_page_writes = off is the problem. From the docs: Turning this parameter off speeds normal operation, but might lead to either unrecoverable data corruption, or silent data corruption, after a system failure. The risks are similar to turning off fsync, though smaller, and it should be turned off only based on the same circumstances recommended for that parameter. http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579
On 2013-04-09 18:21:20 +0200, Stephen R. van den Berg wrote: > Just today one of my systems experienced a kernel panic, and halted abruptly. > Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise). Thats an absolutely outdated version of 9.0. You shouldn't be running this in production. On 2013-04-09 09:27:52 -0700, Joshua D. Drake wrote: > > On 04/09/2013 09:21 AM, Stephen R. van den Berg wrote: > > >------------------------- > > > >Looking at global/11787, doesn't reveal any obvious corruption. > >The server was running with: > > synchronous_commit = off > > full_page_writes = off > > full_page_writes = off is the problem. Yea, and it can cause very hard to recover corruption, its not that you only may loose some of the last transactions, in contrast to synchronous_commit=off where you can loose the last transactions but which never should cause corruption. > From the docs: > > Turning this parameter off speeds normal operation, but might lead to either > unrecoverable data corruption, or silent data corruption, after a system > failure. The risks are similar to turning off fsync, though smaller, and it > should be turned off only based on the same circumstances recommended for > that parameter. > > http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES That was my first thought as well, but whilst it sure can cause corruption, I can't immediately see how it should be responsible for this error. That seems to indicate another problem. Stephen, could you check how big global/11787 exactly is? Too bad we don't know what that relfilenode corresponds to and we can't easily find out what it maps to. Afaik we don't have any debugging utility to dump the pg_filenode.map contents? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
"Stephen R. van den Berg" <srb@cuci.nl> writes: > Just today one of my systems experienced a kernel panic, and halted abruptly. > Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise). It's conceivable that updating to something more current than 9.0.4 would get you out of this --- we've fixed quite a number of WAL replay bugs in the last two years. regards, tom lane
Tom Lane wrote: >"Stephen R. van den Berg" <srb@cuci.nl> writes: >> Just today one of my systems experienced a kernel panic, and halted abruptly. >> Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise). >It's conceivable that updating to something more current than 9.0.4 >would get you out of this --- we've fixed quite a number of WAL replay >bugs in the last two years. I see that there is a 9.0.13, but that would be a source upgrade, since Debian doesn't provide later than 9.0.4 AFAICS. I suppose upgrading to 9.1.9 is not recommended in this state, or is it? -- Stephen. "Reality is merely an illusion,albeit a very persistent one." -- Albert Einstein
Joshua D. Drake wrote: >full_page_writes = off is the problem. >Turning this parameter off speeds normal operation, but might lead to >either unrecoverable data corruption, or silent data corruption, >after a system failure. The risks are similar to turning off fsync, >though smaller, and it should be turned off only based on the same >circumstances recommended for that parameter. Well, silent data corruption would have been "ok" (to a certain extent). Unrecoverable data corruption would be ok too, if it would only imply losing some of the more recent parts of the data (at least for this database). -- Stephen. "Reality is merely an illusion,albeit a very persistent one." -- Albert Einstein
On 2013-04-09 19:18:57 +0200, Stephen R. van den Berg wrote: > Tom Lane wrote: > >"Stephen R. van den Berg" <srb@cuci.nl> writes: > >> Just today one of my systems experienced a kernel panic, and halted abruptly. > >> Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise). > > >It's conceivable that updating to something more current than 9.0.4 > >would get you out of this --- we've fixed quite a number of WAL replay > >bugs in the last two years. > > I see that there is a 9.0.13, but that would be a source upgrade, since Debian > doesn't provide later than 9.0.4 AFAICS. Afaik debian has never shipped 9.0 in any stable release at all. You can use the packages provided by the postgres community though, they are compatible: http://wiki.postgresql.org/wiki/Apt > I suppose upgrading to 9.1.9 is not recommended in this state, or is > it? You can at least try, don't think it will make more problems than you already have. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: >On 2013-04-09 18:21:20 +0200, Stephen R. van den Berg wrote: >Thats an absolutely outdated version of 9.0. You shouldn't be running >this in production. Yes, well, it's one of those things. It got installed with Debian and the automatic upgrade didn't upgrade it further, they switched to 9.1 without an automatic upgrade path. >Yea, and it can cause very hard to recover corruption, its not that you >only may loose some of the last transactions, in contrast to >synchronous_commit=off where you can loose the last transactions but >which never should cause corruption. Ok, I'll avoid this option in the future, and just run with synchronous_commit=off for non-critical databases. >Stephen, could you check how big global/11787 exactly is? Too bad we >don't know what that relfilenode corresponds to and we can't easily find >out what it maps to. This is the filelist of the whole global directory (there is only one active database on this server), the 11787 file is 16KB exactly: -rw------- 1 postgres postgres 8192 May 30 2011 11601 -rw------- 1 postgres postgres 24576 May 30 2011 11601_fsm -rw------- 1 postgres postgres 8192 May 30 2011 11601_vm -rw------- 1 postgres postgres 16384 May 30 2011 11603 -rw------- 1 postgres postgres 16384 May 30 2011 11604 -rw------- 1 postgres postgres 0 May 30 2011 11770 -rw------- 1 postgres postgres 0 May 30 2011 11772 -rw------- 1 postgres postgres 8192 May 30 2011 11774 -rw------- 1 postgres postgres 8192 May 30 2011 11775 -rw------- 1 postgres postgres 8192 May 30 2011 11776 -rw------- 1 postgres postgres 24576 May 30 2011 11776_fsm -rw------- 1 postgres postgres 8192 May 30 2011 11776_vm -rw------- 1 postgres postgres 16384 May 30 2011 11778 -rw------- 1 postgres postgres 16384 May 30 2011 11779 -rw------- 1 postgres postgres 8192 May 30 2011 11780 -rw------- 1 postgres postgres 24576 May 30 2011 11780_fsm -rw------- 1 postgres postgres 8192 May 30 2011 11780_vm -rw------- 1 postgres postgres 16384 May 30 2011 11782 -rw------- 1 postgres postgres 8192 May 30 2011 11783 -rw------- 1 postgres postgres 16384 May 30 2011 11785 -rw------- 1 postgres postgres 16384 May 30 2011 11786 -rw------- 1 postgres postgres 16384 Apr 9 18:08 11787 -rw------- 1 postgres postgres 24576 Apr 9 13:09 11787_fsm -rw------- 1 postgres postgres 8192 Apr 9 14:58 11787_vm -rw------- 1 postgres postgres 40960 Apr 9 14:51 11789 -rw------- 1 postgres postgres 24576 Nov 25 2011 11789_fsm -rw------- 1 postgres postgres 16384 Apr 9 14:51 11790 -rw------- 1 postgres postgres 8192 May 30 2011 11791 -rw------- 1 postgres postgres 24576 May 30 2011 11791_fsm -rw------- 1 postgres postgres 8192 May 30 2011 11791_vm -rw------- 1 postgres postgres 0 May 30 2011 11793 -rw------- 1 postgres postgres 8192 May 30 2011 11795 -rw------- 1 postgres postgres 16384 May 30 2011 11796 -rw------- 1 postgres postgres 8192 May 30 2011 11867 -rw------- 1 postgres postgres 0 May 30 2011 11869 -rw------- 1 postgres postgres 8192 May 30 2011 11871 -rw------- 1 postgres postgres 16384 May 30 2011 11872 -rw------- 1 postgres postgres 16384 May 30 2011 11873 -- Stephen.
"Stephen R. van den Berg" <srb@cuci.nl> writes: > Tom Lane wrote: >> It's conceivable that updating to something more current than 9.0.4 >> would get you out of this --- we've fixed quite a number of WAL replay >> bugs in the last two years. > I see that there is a 9.0.13, but that would be a source upgrade, since Debian > doesn't provide later than 9.0.4 AFAICS. That's a bit hard to believe, especially given that 9.0.13 fixes a rather major security bug. Any packager that isn't offering something more current than 9.0.4 is *seriously* misfeasant. > I suppose upgrading to 9.1.9 is not recommended in this state, or is it? You can't -- only a 9.0.x server will even try to start up against this DB. regards, tom lane
On 2013-04-09 13:34:41 -0400, Tom Lane wrote: > "Stephen R. van den Berg" <srb@cuci.nl> writes: > > Tom Lane wrote: > >> It's conceivable that updating to something more current than 9.0.4 > >> would get you out of this --- we've fixed quite a number of WAL replay > >> bugs in the last two years. > > > I see that there is a 9.0.13, but that would be a source upgrade, since Debian > > doesn't provide later than 9.0.4 AFAICS. > > That's a bit hard to believe, especially given that 9.0.13 fixes a > rather major security bug. Any packager that isn't offering something > more current than 9.0.4 is *seriously* misfeasant. Debian simply has never shipped postgres 9.0 at all outside of its unstable distribution. Afaik it didn't even enter testing once.. Its pretty normal to drop support for packages from there imo. Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-04-09 19:30:38 +0200, Andres Freund wrote: > On 2013-04-09 19:18:57 +0200, Stephen R. van den Berg wrote: > > Tom Lane wrote: > > >"Stephen R. van den Berg" <srb@cuci.nl> writes: > > >> Just today one of my systems experienced a kernel panic, and halted abruptly. > > >> Running Linux 3.1.9, PostgreSQL 9.0.4 (Debian 9.0.4-1+b1, to be precise). > > > > >It's conceivable that updating to something more current than 9.0.4 > > >would get you out of this --- we've fixed quite a number of WAL replay > > >bugs in the last two years. > > > > I see that there is a 9.0.13, but that would be a source upgrade, since Debian > > doesn't provide later than 9.0.4 AFAICS. > > Afaik debian has never shipped 9.0 in any stable release at all. You can > use the packages provided by the postgres community though, they are > compatible: > http://wiki.postgresql.org/wiki/Apt > > > I suppose upgrading to 9.1.9 is not recommended in this state, or is > > it? > > You can at least try, don't think it will make more problems than you > already have. Uh, I didn't read careful enough. I thought you were talking about upgrading to the newest 9.0 release which *does* make sense. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane wrote: >"Stephen R. van den Berg" <srb@cuci.nl> writes: >> Tom Lane wrote: >>> It's conceivable that updating to something more current than 9.0.4 >>> would get you out of this --- we've fixed quite a number of WAL replay >>> bugs in the last two years. >> I see that there is a 9.0.13, but that would be a source upgrade, since Debian >> doesn't provide later than 9.0.4 AFAICS. >That's a bit hard to believe, especially given that 9.0.13 fixes a >rather major security bug. Any packager that isn't offering something >more current than 9.0.4 is *seriously* misfeasant. Well, it might be such that this was only in unstable at some point in time, and by the time they wanted to include it in testing, they skipped 9.0 and went for 9.1 instead. >> I suppose upgrading to 9.1.9 is not recommended in this state, or is it? >You can't -- only a 9.0.x server will even try to start up against this >DB. Well, there is something like pg_upgradecluster, which probably will attempt a binary conversion of the database, but I guess that in this state that is hazardous at best. Besides, the standard tool doesn't seem to like tablespaces (and I'm using one). I'll look into trying a 9.0.13 first. -- Stephen. "Reality is merely an illusion,albeit a very persistent one." -- Albert Einstein
Andres Freund <andres@2ndquadrant.com> writes: > Afaik we don't have any debugging utility to dump the pg_filenode.map > contents? Hardly need one ... od -t d4 $PGDATA/global/pg_filenode.map is readable enough, though it does leave you still having to map the numeric OIDs back to names. The OIDs will be stable though. regards, tom lane
Success (Re: page 1 of relation global/11787 was uninitialized)
From
"Stephen R. van den Berg"
Date:
Thanks, all of you, for all good advice and suggestions. Andres Freund wrote: >Afaik debian has never shipped 9.0 in any stable release at all. You can >use the packages provided by the postgres community though, they are >compatible: >http://wiki.postgresql.org/wiki/Apt I used this to upgrade to 9.0.13, which indeed is fully compatible with Debian. It started up like a charm. I can upgrade it to 9.1.9 now (while I'm at it). For reference, the startup log now looked like this: LOG: database system was interrupted while in recovery at 2013-04-09 19:45:49 CESTFATAL: the database system is startingupHINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.LOG: incomplete startup packetLOG: database system was not properly shut down; automatic recovery in progressLOG: redo starts at D/B0BAB734LOG: record with zero length at D/B0D3D71CLOG: redo done at D/B0D3D6E0LOG: lastcompleted transaction was at log time 2013-04-09 15:02:24.848947+02LOG: checkpoint starting: end-of-recovery immediateLOG: checkpoint complete: wrote 423 buffers (11.8%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.003s, sync=11.419 s, total=11.601 s So, Tom, thanks for fixing that WAL recovery code :-). -- Stephen.
On Apr9, 2013, at 19:56 , "Stephen R. van den Berg" <srb@cuci.nl> wrote: > Andres Freund wrote: >> Afaik debian has never shipped 9.0 in any stable release at all. You can >> use the packages provided by the postgres community though, they are >> compatible: >> http://wiki.postgresql.org/wiki/Apt > > I used this to upgrade to 9.0.13, which indeed is fully compatible with > Debian. It started up like a charm. I can upgrade it to 9.1.9 now > (while I'm at it). Since you've been running with full_page_writes=off, I suggest you update to 9.1 by dumping and reloading your data (i.e. pg_dumpall) instead of via binary upgrades (i.e. pg_upgrade). That way, you can be sure that your data is consistent (from postgres' POV anyway) after the upgrade. pg_upgrade OTOH will only notice inconsistencies in the system tables, since the data files of user tables and indices are simply copied unchanged. best regards, Florian Pflug
Re: Success (Re: page 1 of relation global/11787 was uninitialized)
From
"Stephen R. van den Berg"
Date:
Florian Pflug wrote: >> I used this to upgrade to 9.0.13, which indeed is fully compatible with >> Debian. It started up like a charm. I can upgrade it to 9.1.9 now >> (while I'm at it). >Since you've been running with full_page_writes=off, I suggest you update >to 9.1 by dumping and reloading your data (i.e. pg_dumpall) instead of >via binary upgrades (i.e. pg_upgrade). I need to do that anyway, since the pg_upgradecluster script of Debian (or is it a native PostgreSQL script?) does not support tablespaces. >That way, you can be sure that your data is consistent (from postgres' >POV anyway) after the upgrade. pg_upgrade OTOH will only notice >inconsistencies in the system tables, since the data files of user tables >and indices are simply copied unchanged. Good point. Thanks. -- Stephen.
Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> Afaik we don't have any debugging utility to dump the pg_filenode.map >> contents? > > Hardly need one ... od -t d4 $PGDATA/global/pg_filenode.map > is readable enough, though it does leave you still having to > map the numeric OIDs back to names. The OIDs will be stable though. Shouldn't that be "od -t u4 $PGDATA/global/pg_filenode.map"? Yours, Laurenz Albe