Thread: FATAL Error

FATAL Error

From: Matt
I'm getting the following error on PGSQL 8.0.8 server that I admin.  I
don't think this is a hardware problem but I'm not sure.  Anyway in the
logfile I'm constantly getting this:

ERROR:  xlog flush request 2/13CEA8AC is not satisfied --- flushed only to 2/13634EE4
CONTEXT:  writing block 3421 of relation 1663/9533957/9534098
WARNING:  could not write block 3421 of 1663/9533957/9534098
DETAIL:  Multiple failures --- write error may be permanent.


psql -l fails with:
psql: FATAL:  xlog flush request 2/13CEA8AC is not satisfied --- flushed only to 2/13634EE4
CONTEXT:  writing block 3421 of relation 1663/9533957/9534098

but I can connect using psql DBNAME


Any insight other than restore from your most recent backup?

Thanks,

Matt

Re: FATAL Error

From: Tom Lane
Matt <matthew@zeut.net> writes:
> I'm getting the following error on PGSQL 8.0.8 server that I admin.  I
> don't think this is a hardware problem but I'm not sure.  Anyway in the
> logfile I'm constantly getting this:

> ERROR:  xlog flush request 2/13CEA8AC is not satisfied --- flushed only
> to 2/13634EE4
> CONTEXT:  writing block 3421 of relation 1663/9533957/9534098

This indicates that the LSN word of that block contains 2/13CEA8AC,
which is impossibly large if WAL currently extends only to 2/13634EE4.
Have you had any crashes or other odd behavior recently?
What files do you see in $PGDATA/pg_xlog/?  Can you grab a copy of
pg_filedump and look at what the block contains on-disk?
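
For readers following along, the comparison behind that sanity check can be sketched as below. This is only an illustration, not PostgreSQL's own code: the helper names `parse_lsn` and `flush_request_satisfiable` are hypothetical, and the sketch assumes an LSN is written as two hexadecimal fields, log id and byte offset, separated by a slash.

```python
def parse_lsn(text):
    """Split an 'X/Y' LSN string into (log id, byte offset) integers."""
    logid, recoff = text.split("/")
    return int(logid, 16), int(recoff, 16)


def flush_request_satisfiable(page_lsn, flushed_lsn):
    """True if WAL has been flushed at least as far as the page's LSN."""
    # Tuples compare the log id first, then the offset within that log.
    return parse_lsn(page_lsn) <= parse_lsn(flushed_lsn)


page_lsn = "2/13CEA8AC"     # LSN the block claims, from the error message
flushed_lsn = "2/13634EE4"  # how far WAL had actually been flushed

print(flush_request_satisfiable(page_lsn, flushed_lsn))  # False
```

With the two values from the log, the page's LSN lies ahead of the flushed WAL position, which is why the write keeps failing.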

            regards, tom lane

Re: FATAL Error

From: "Matthew T. O'Connor"
Tom Lane wrote:
> Matt <matthew@zeut.net> writes:
>
>> I'm getting the following error on PGSQL 8.0.8 server that I admin.  I
>> don't think this is a hardware problem but I'm not sure.  Anyway in the
>> logfile I'm constantly getting this:
>>
>> ERROR:  xlog flush request 2/13CEA8AC is not satisfied --- flushed only
>> to 2/13634EE4
>> CONTEXT:  writing block 3421 of relation 1663/9533957/9534098
>>
>
> This indicates that the LSN word of that block contains 2/13CEA8AC,
> which is impossibly large if WAL currently extends only to 2/13634EE4.
> Have you had any crashes or other odd behavior recently?
> What files do you see in $PGDATA/pg_xlog/?  Can you grab a copy of
> pg_filedump and look at what the block contains on-disk?

Tom, thanks for the reply.  I'm not aware of any database crashes or OS
crashes lately.  The server has been up for 48 days and was running
smoothly until today (Monday).

I'm not sure how to get what you are looking for with pg_filedump, I've
got pg_filedump compiled, and tried pg_filedump-4.0/pg_filedump -f -R
3421 ./9534098  but I don't really know how to read the output.

 Here is what I find in pg_xlog:
[root@mail pg_xlog]# ls -l
total 114840
-rw-------   1 postgres postgres 16777216 Aug  7 18:29 000000010000000200000016
-rw-------   1 postgres postgres 16777216 Aug  6 16:38 000000010000000200000017
-rw-------   1 postgres postgres 16777216 Aug  6 20:51 000000010000000200000018
-rw-------   1 postgres postgres 16777216 Aug  7 00:17 000000010000000200000019
-rw-------   1 postgres postgres 16777216 Aug  7 13:36 00000001000000020000001A
-rw-------   1 postgres postgres 16777216 Aug  7 15:12 00000001000000020000001B
-rw-------   1 postgres postgres 16777216 Aug  7 18:01 00000001000000020000001C
drwx------   2 postgres postgres     4096 Jun 27 11:52 archive_status
[root@mail pg_xlog]#


Thanks for your help.

Matt


Re: FATAL Error

From: Tom Lane
"Matthew T. O'Connor" <matthew@zeut.net> writes:
>  Here is what I find in pg_xlog:
> [root@mail pg_xlog]# ls -l
> total 114840
> -rw-------   1 postgres postgres 16777216 Aug  7 18:29 000000010000000200000016

So are you still getting the error?  This looks like your WAL is now
up to 2/16xxxxxx, so that sanity check shouldn't be noticing a problem
anymore.  Not that that should make us feel better --- there's
definitely something fishy here.
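
As a rough illustration of how a segment file name maps to "2/16xxxxxx", here is a minimal sketch. It assumes the naming used by servers of this vintage (8 hex digits each for timeline, log id, and segment number within the log) and 16 MB segments, which matches the 16777216-byte files in the listing. The function name `segment_start_lsn` is hypothetical.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # 16777216 bytes, as in the ls output


def segment_start_lsn(filename):
    """Return (timeline, starting LSN) for a 24-hex-digit WAL file name."""
    timeline = int(filename[0:8], 16)
    logid = int(filename[8:16], 16)
    segno = int(filename[16:24], 16)
    # The segment's first byte sits at segno * SEGMENT_SIZE within its log.
    return timeline, "%X/%X" % (logid, segno * SEGMENT_SIZE)


print(segment_start_lsn("000000010000000200000016"))  # (1, '2/16000000')
```

Under those assumptions, the most recently written file in the listing starts at 2/16000000, which is past the 2/13CEA8AC that the page claims.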

> I'm not sure how to get what you are looking for with pg_filedump, I've
> got pg_filedump compiled, and tried pg_filedump-4.0/pg_filedump -f -R
> 3421 ./9534098  but I don't really know how to read the output.

I didn't ask you if *you* knew what the output meant ;-) but it should
show the page LSN like this:

 LSN:  logid      0 recoff 0x00000020      Special  8192 (0x2000)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
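
If the dump is long, a small script can pull that field out. The sketch below is only an illustration built around the sample line above: the function name `page_lsn_from_dump` is hypothetical, and it assumes the logid is printed in decimal and the recoff in hexadecimal, which is how the sample appears but is not confirmed here for every pg_filedump version.

```python
import re

# Matches a header line such as:
#   LSN:  logid      0 recoff 0x00000020      Special  8192 (0x2000)
LSN_LINE = re.compile(r"LSN:\s+logid\s+(\d+)\s+recoff\s+0x([0-9A-Fa-f]+)")


def page_lsn_from_dump(dump_text):
    """Return the page LSN as 'X/Y', or None if no LSN line is found."""
    match = LSN_LINE.search(dump_text)
    if match is None:
        return None
    logid = int(match.group(1))        # assumed to be printed in decimal
    recoff = int(match.group(2), 16)   # printed in hex with a 0x prefix
    return "%X/%X" % (logid, recoff)


sample = " LSN:  logid      0 recoff 0x00000020      Special  8192 (0x2000)"
print(page_lsn_from_dump(sample))  # 0/20
```

The result is in the same X/Y form the server uses in the "xlog flush request" message, so the two can be compared directly.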

            regards, tom lane