Re: ERROR: XLogFlush: request - Mailing list pgsql-general

From Nitin Verma
Subject Re: ERROR: XLogFlush: request
Date
Msg-id 640150C1BB635E4C9F2F617BA3EFF1D101FBFF17@XCHMTV1.azulsystems.com
In response to Re: ERROR: XLogFlush: request  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: ERROR: XLogFlush: request  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: ERROR: XLogFlush: request  (Scott Marlowe <smarlowe@g2switchworks.com>)
List pgsql-general
Thanks Tom; we are moving to 8.1.0 soon anyway.

However, moving all our clients to the newer release will take some time. I
hope you know how it works in a product. Until then, we need to release a
patch that recovers from this condition.

That said, is there any advice or a workaround?

I looked at 8.1.0's code; it still ends up handling the same condition:

    /*
     * If we still haven't flushed to the request point then we have a
     * problem; most likely, the requested flush point is past end of XLOG.
     * This has been seen to occur when a disk page has a corrupted LSN.
     *
     * Formerly we treated this as a PANIC condition, but that hurts the
     * system's robustness rather than helping it: we do not want to take
     * down the whole system due to corruption on one data page.  In
     * particular, if the bad page is encountered again during recovery then
     * we would be unable to restart the database at all!  (This scenario
     * has actually happened in the field several times with 7.1 releases.
     * Note that we cannot get here while InRedo is true, but if the bad
     * page is brought in and marked dirty during recovery then
     * CreateCheckPoint will try to flush it at the end of recovery.)
     *
     * The current approach is to ERROR under normal conditions, but only
     * WARNING during recovery, so that the system can be brought up even if
     * there's a corrupt LSN.  Note that for calls from xact.c, the ERROR
     * will be promoted to PANIC since xact.c calls this routine inside a
     * critical section.  However, calls from bufmgr.c are not within
     * critical sections and so we will not force a restart for a bad LSN on
     * a data page.
     */
    if (XLByteLT(LogwrtResult.Flush, record))
        elog(InRecovery ? WARNING : ERROR,
             "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
             record.xlogid, record.xrecoff,
             LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
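
To make the comparison in that check concrete, here is a minimal standalone
sketch of how a two-part position (xlogid, xrecoff) is ordered. The names
DemoRecPtr, demo_ptr_lt and demo_flush_unsatisfied are hypothetical stand-ins
written for illustration; they are not the PostgreSQL definitions, and the
sketch assumes the ordering is "compare xlogid first, then xrecoff".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two-part WAL position printed as %X/%X. */
typedef struct {
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within that log file */
} DemoRecPtr;

/* True when a is strictly before b: compare xlogid first, then xrecoff. */
static bool demo_ptr_lt(DemoRecPtr a, DemoRecPtr b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/*
 * Same shape as the check above: the request is unsatisfied when the
 * flushed position is still strictly before the requested position.
 */
static bool demo_flush_unsatisfied(DemoRecPtr flushed, DemoRecPtr requested)
{
    return demo_ptr_lt(flushed, requested);
}
```

With this ordering, a page whose stored LSN lies beyond the end of the log
that was actually written can never be "caught up to", which is why the check
fires every time that page is flushed.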

So there is a chance of the same thing happening again, and we will need a
way to recover from it.


So I quote my earlier message again:

=========
A Java process using PostgreSQL 7.3.2 got these errors:

java.sql.SQLException: ERROR:  XLogFlush: request 0/240169BC is not satisfied --- flushed only to 0/23FFC01C

While these errors were filling the logs, we were able to connect via psql
and see all the data.
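
As an aid for reading such log lines, the two positions can be pulled out of
the message with sscanf. This is only a sketch: demo_parse_xlogflush is a
hypothetical helper, and it assumes the message text has exactly the 7.3-style
wording quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the requested and flushed positions from a
 * message of the form quoted above. Returns 1 on success, 0 otherwise.
 */
static int demo_parse_xlogflush(const char *msg,
                                unsigned int *req_id, unsigned int *req_off,
                                unsigned int *fl_id, unsigned int *fl_off)
{
    /* Skip any prefix such as "java.sql.SQLException: ERROR:". */
    const char *p = strstr(msg, "XLogFlush: request ");

    if (p == NULL)
        return 0;

    /* Each %X reads one hexadecimal field into an unsigned int. */
    return sscanf(p,
                  "XLogFlush: request %X/%X is not satisfied"
                  " --- flushed only to %X/%X",
                  req_id, req_off, fl_id, fl_off) == 4;
}
```

For the message above both positions are in log file 0, and the offsets
differ by 0x240169BC - 0x23FFC01C = 0x1A9A0, so the request points 108960
bytes past the flushed position.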

> This has been seen to occur when a disk page has a corrupted LSN
I suppose LSN refers to the logical sector number of a WAL. If that was
corrupted, how come we were able to access it via psql? Is it just an
isolated phenomenon? Does Postgres have auto-recovery for this? If yes, did
the old connections have stale LSN values?

Coming to safeguards:

1. Is there any use in restarting the Java process when this happens?
2. Is there any use in restarting the postmaster at this time, and is it safe
   to do so?

What should be done when this happens? Any suggestions?

=========



-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 13, 2007 8:18 PM
To: Nitin Verma
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ERROR: XLogFlush: request

"Nitin Verma" <nitinverma@azulsystems.com> writes:
> xlog.c code from version we use (7.3.2)
> ...
> What all should be done when this happened? Any suggestions.

Updating to something newer than 7.3.2 would seem to be a good idea.
7.3.18 is the current release in that branch.

            regards, tom lane
