Re: Bug (#3484) - Invalid page header again - Mailing list pgsql-bugs

From Zdenek Kotala
Subject Re: Bug (#3484) - Invalid page header again
Msg-id 4767F433.9010700@sun.com
In response to Re: Bug (#3484) - Invalid page header again  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses Re: Bug (#3484) - Invalid page header again
Re: Bug (#3484) - Invalid page header again
List pgsql-bugs
Zdenek Kotala wrote:
> alex wrote:
>
> <snip>
>
>> WARNING:  relation "transaktion" TID 1240631/12: OID is invalid
>> ERROR:  invalid page header in block 1240632 of relation "transaktion"
>> 7. 2007/12/10 : We started the export of the data ( which runs every
>> morning ) for the last days again. These exports use the same
>> SQL-Commands as the automatical run.
>
> Alex,
>
> please can you provide binary dump of these two pages or if there are
> sensitive data try to use pg_filedump to get only page and tuple headers?
>
>


I got a dump of the two affected blocks from Alex. It seems that both blocks
were overwritten together by some structure of 128 bytes in length (there is a
repeating pattern). The damaged area is 9728 bytes in total: the first block is
overwritten completely and the second one only at the beginning. A buffer from
another relation could have been overwritten as well.

I think it is more likely a software bug than a hardware fault, because the bad
data has some logic to it. For example, a 0x54 byte is repeated every 128 bytes,
and most of the data are zeros.
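
The pattern described above can be checked mechanically on a binary dump. With the default 8192-byte block size, 9728 bytes is one full block plus 1536 bytes of the next, and 9728 = 76 x 128, so the damage is a whole number of 128-byte units. The following is only a minimal sketch of such a check; the helper names are hypothetical, and the offset of the 0x54 byte inside each 128-byte unit is not stated here, so it is left as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many of the positions offset, offset + stride,
 * offset + 2 * stride, ... inside buf[0..len) hold the byte `value`.
 * Hypothetical helper, not part of PostgreSQL or pg_filedump.
 */
static size_t
count_stride_hits(const unsigned char *buf, size_t len,
                  size_t stride, size_t offset, unsigned char value)
{
    size_t hits = 0;
    size_t pos;

    assert(stride > 0);         /* a zero stride would never terminate */
    for (pos = offset; pos < len; pos += stride)
    {
        if (buf[pos] == value)
            hits++;
    }
    return hits;
}

/* Count the zero bytes in buf[0..len). */
static size_t
count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    size_t pos;

    for (pos = 0; pos < len; pos++)
    {
        if (buf[pos] == 0)
            zeros++;
    }
    return zeros;
}
```

Trying every offset from 0 to 127 with a stride of 128 over the damaged range shows whether one offset hits 0x54 in (nearly) every unit, which is what a regular in-memory structure would produce and random hardware damage most likely would not.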

My suggestion is to apply the following patch to find out whether the data are
corrupted by PostgreSQL or somewhere else. With it, the backend should fail
before writing damaged data to disk. The patch is for HEAD, but a similar patch
could be backported.
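
To illustrate the kind of test the patch relies on, here is a simplified, standalone sketch of a page header sanity check. It is not the PostgreSQL implementation of PageHeaderIsValid: the struct, the constants and the function names are hypothetical, and only the ordering of the three offsets is checked. As far as I recall, the real function also accepts a page consisting entirely of zeros, so a buffer that was zeroed completely would not be caught by this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ   8192    /* assumed default block size */
#define SKETCH_HDR_SIZE 24      /* hypothetical fixed header size */

/* Hypothetical, reduced stand-in for the real page header. */
typedef struct SketchPageHeader
{
    uint16_t lower;             /* end of the line pointer array */
    uint16_t upper;             /* start of tuple data */
    uint16_t special;           /* start of the special space */
} SketchPageHeader;

/* The offsets must be ordered and must stay inside the block. */
static bool
sketch_header_is_sane(const SketchPageHeader *h)
{
    assert(h != NULL);
    return h->lower >= SKETCH_HDR_SIZE &&
           h->lower <= h->upper &&
           h->upper <= h->special &&
           h->special <= SKETCH_BLCKSZ;
}

/* True if all SKETCH_BLCKSZ bytes of the page are zero. */
static bool
sketch_page_is_all_zero(const unsigned char *page)
{
    size_t i;

    assert(page != NULL);
    for (i = 0; i < SKETCH_BLCKSZ; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```

A header overwritten with mostly zeros and a stray 0x54 byte will usually violate the offset ordering, which is why a check of this kind placed just before the write has a good chance of catching the damage while it is still only in memory.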

Index: backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.227
diff -c -r1.227 bufmgr.c
*** backend/storage/buffer/bufmgr.c     15 Nov 2007 21:14:37 -0000      1.227
--- backend/storage/buffer/bufmgr.c     18 Dec 2007 15:50:06 -0000
***************
*** 1734,1739 ****
--- 1734,1741 ----
         buf->flags &= ~BM_JUST_DIRTIED;
         UnlockBufHdr(buf);

+       if (!PageHeaderIsValid((PageHeader) BufHdrGetBlock(buf)))
+               elog(FATAL, "Buffer cache is damaged!");
         smgrwrite(reln,
                           buf->tag.blockNum,
                           (char *) BufHdrGetBlock(buf),
***************
*** 1966,1971 ****
--- 1968,1976 ----
                                 errcontext.previous = error_context_stack;
                                 error_context_stack = &errcontext;

+                               if (!PageHeaderIsValid((PageHeader) LocalBufHdrGetBlock(bufHdr)))
+                                       elog(FATAL, "Buffer cache is damaged!");
+
                                 smgrwrite(rel->rd_smgr,
                                                   bufHdr->tag.blockNum,
                                                   (char *) LocalBufHdrGetBlock(bufHdr),
Index: backend/storage/buffer/localbuf.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/buffer/localbuf.c,v
retrieving revision 1.78
diff -c -r1.78 localbuf.c
*** backend/storage/buffer/localbuf.c   15 Nov 2007 21:14:38 -0000      1.78
--- backend/storage/buffer/localbuf.c   18 Dec 2007 16:05:49 -0000
***************
*** 16,21 ****
--- 16,22 ----
   #include "postgres.h"

   #include "storage/buf_internals.h"
+ #include "storage/bufpage.h"
   #include "storage/bufmgr.h"
   #include "storage/smgr.h"
   #include "utils/guc.h"
***************
*** 161,166 ****
--- 162,169 ----
                 oreln = smgropen(bufHdr->tag.rnode);

                 /* And write... */
+               if (!PageHeaderIsValid((PageHeader) LocalBufHdrGetBlock(bufHdr)))
+                       elog(FATAL, "Local buffer cache is damaged!");
                 smgrwrite(oreln,
                                   bufHdr->tag.blockNum,
                                   (char *) LocalBufHdrGetBlock(bufHdr),
