Re: Completely broken replica after PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs

From Andres Freund
Subject Re: Completely broken replica after PANIC: WAL contains references to invalid pages
Msg-id 20130402182644.GJ2415@alap2.anarazel.de
In response to Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Sergey Konoplev <gray.ru@gmail.com>)
Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Sergey Konoplev <gray.ru@gmail.com>)
Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Sergey Konoplev <gray.ru@gmail.com>)
List pgsql-bugs
On 2013-04-02 12:10:12 +0200, Andres Freund wrote:
> On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> > On 30 March 2013 17:21, Andres Freund <andres@2ndquadrant.com> wrote:
> >
> > > So if the xid is later than latestObservedXid we extend subtrans one by
> > > one. So far so good. But we initialize it in
> > > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> > >                              latestObservedXid = running->nextXid;
> > >                              TransactionIdRetreat(latestObservedXid);
> > > Before that subtrans has initially been started up with:
> > >                         if (wasShutdown)
> > >                                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> > >                         else
> > >                                 oldestActiveXID = checkPoint.oldestActiveXid;
> > > ...
> > >                         StartupSUBTRANS(oldestActiveXID);
> > >
> > > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > > can take some time until we reach consistency, it seems rather plausible
> > > that there will now be a gap in initialized pages, from
> > > checkPoint.oldestActiveXid to running->nextXid, if there are pages
> > > in between.
> >
> > That was an old bug.
> >
> > StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> > does that incorrectly? How?
>
> Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
> between the call to Startup* and the first call to ExtendSUBTRANS. The
> latter is only called *after* we reached STANDBY_INITIALIZED via
> ProcArrayApplyRecoveryInfo(). The problem is that we call StartupSUBTRANS
> with checkPoint.oldestActiveXid, while we start calling ExtendSUBTRANS from
> running->nextXid - 1. There can very well be a gap in between.
> The window isn't terribly big, but if you use subtransactions as heavily
> as Sergey seems to, hitting it doesn't seem unlikely.
>
> Let me come up with a testcase and patch.
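To make the window concrete, here is a small simulation in C. It is a simplified model, not PostgreSQL code: the page size (2048 xids per pg_subtrans page, i.e. 8 kB pages and 4-byte xids), the type and function names, and the fixed page limit are assumptions, and xid wraparound and the FirstNormalTransactionId special case are ignored. It models StartupSUBTRANS as zeroing every page from oldestActiveXid through the checkpoint's nextXid, and ExtendSUBTRANS as zeroing a page only when it is handed the first xid on that page, starting right after latestObservedXid = running->nextXid - 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: 8 kB pages and 4-byte xids give 2048 entries per
 * pg_subtrans page; no xid wraparound; at most 64 pages are modelled. */
#define SKETCH_XACTS_PER_PAGE 2048u
#define SKETCH_MAX_PAGES 64u

typedef uint32_t Xid;

static uint32_t page_of(Xid xid)
{
    return xid / SKETCH_XACTS_PER_PAGE;
}

/* Counts pages between oldest_active and replay_upto that were never
 * zeroed when a standby starts from a checkpoint, reaches consistency at a
 * running-xacts record, and then replays xids up to replay_upto. */
static uint32_t count_missing_pages(Xid oldest_active, Xid ckpt_next_xid,
                                    Xid running_next_xid, Xid replay_upto)
{
    bool zeroed[SKETCH_MAX_PAGES];

    assert(oldest_active <= ckpt_next_xid);
    assert(ckpt_next_xid <= running_next_xid);
    assert(running_next_xid <= replay_upto);
    assert(page_of(replay_upto) < SKETCH_MAX_PAGES);

    memset(zeroed, 0, sizeof zeroed);

    /* Modelled StartupSUBTRANS: zero every page from oldestActiveXid
     * through the page holding the checkpoint's nextXid. */
    for (uint32_t p = page_of(oldest_active); p <= page_of(ckpt_next_xid); p++)
        zeroed[p] = true;

    /* latestObservedXid = running->nextXid - 1, so extension begins at
     * running->nextXid. Modelled ExtendSUBTRANS only zeroes a page when it
     * sees the first xid on that page. */
    for (Xid xid = running_next_xid; xid <= replay_upto; xid++)
        if (xid % SKETCH_XACTS_PER_PAGE == 0)
            zeroed[page_of(xid)] = true;

    uint32_t missing = 0;
    for (uint32_t p = page_of(oldest_active); p <= page_of(replay_upto); p++)
        if (!zeroed[p])
            missing++;
    return missing;
}
```

For example, with oldestActiveXid 1000, a checkpoint nextXid of 1500, running->nextXid 5000 and replay up to xid 7000, pages 1 and 2 are never zeroed in this model, so `count_missing_pages(1000, 1500, 5000, 7000)` returns 2. If running->nextXid is still on the checkpoint's page (say 1600, replaying up to 3000), it returns 0.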

Developing a test case was trivial: pgbench running the following function:
CREATE OR REPLACE FUNCTION recurse_and_assign_txid(level bigint DEFAULT 0)
RETURNS bigint
LANGUAGE plpgsql AS $b$
-- the EXCEPTION clause below makes every recursion level run in its own subtransaction
BEGIN
    IF level < 500 THEN
        RETURN recurse_and_assign_txid(level + 1);
    ELSE
        -- assign xid in subtxn and parents
        CREATE TEMPORARY TABLE foo();
        DROP TABLE foo;
        RETURN txid_current()::bigint;
    END IF;
EXCEPTION WHEN others THEN
    RAISE NOTICE 'unexpected';
    RETURN NULL;  -- avoid "control reached end of function without RETURN"
END
$b$;
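A back-of-the-envelope sketch of why this test reaches the window quickly. It assumes that each PL/pgSQL BEGIN ... EXCEPTION block runs as its own subtransaction, that assigning an xid to the innermost subtransaction also assigns one to every parent (including the top-level transaction), that nothing else consumes xids, and the default 8 kB block size. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default BLCKSZ of 8192 bytes and a 4-byte TransactionId,
 * so one pg_subtrans page holds 8192 / 4 = 2048 parent entries. */
#define SKETCH_BLCKSZ 8192u
#define SKETCH_XACTS_PER_PAGE ((uint64_t) (SKETCH_BLCKSZ / sizeof(uint32_t)))

/* xids assigned by one call of recurse_and_assign_txid(0) that recurses
 * down to max_level: levels 0..max_level each get a subtransaction xid,
 * plus one xid for the top-level transaction. */
static uint64_t xids_per_call(uint64_t max_level)
{
    return (max_level + 1) + 1;
}

/* pg_subtrans pages filled by `calls` back-to-back calls, rounded up,
 * assuming the calls start on a page boundary. */
static uint64_t subtrans_pages_for_calls(uint64_t calls, uint64_t max_level)
{
    uint64_t per_call = xids_per_call(max_level);

    assert(calls <= UINT64_MAX / per_call);   /* guard against overflow */
    uint64_t xids = calls * per_call;
    return (xids + SKETCH_XACTS_PER_PAGE - 1) / SKETCH_XACTS_PER_PAGE;
}
```

With the function's limit of 500 this comes to 502 xids per call, so under these assumptions a new pg_subtrans page is started roughly every four calls, and even a short delay between the checkpoint and the running-xacts record can cross a page boundary.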

When I then restarted a standby (so that it restarts from a different checkpoint), it
frequently crashed with various errors:
* pg_subtrans/xxx does not exist
* (warning) pg_subtrans page does not exist, assuming zero
* xid overwritten in SubTransSetParent

So I think my theory is correct.

The attached patch fixes this, although I don't like the way knowledge of the
point up to which StartupSUBTRANS zeroes pages is handled.
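The patch is attached rather than inlined, so the following is only a sketch of the idea in the same simplified model, not necessarily how the attached patch does it: remember the xid up to which the StartupSUBTRANS step has zeroed pages, and when latestObservedXid is first set from running->nextXid, extend subtrans one xid at a time across the gap instead of jumping over it. All names are hypothetical, and the model again assumes 2048 xids per page and no wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: 2048 xids per pg_subtrans page (8 kB pages, 4-byte xids),
 * no xid wraparound, and at most 64 modelled pages. */
#define SKETCH_XACTS_PER_PAGE 2048u
#define SKETCH_MAX_PAGES 64u

typedef uint32_t Xid;

typedef struct {
    bool zeroed[SKETCH_MAX_PAGES];
    Xid  latest_observed;            /* stands in for latestObservedXid */
} StandbyModel;

static uint32_t page_of(Xid xid) { return xid / SKETCH_XACTS_PER_PAGE; }

/* Modelled ExtendSUBTRANS: only the first xid on a page zeroes that page. */
static void extend(StandbyModel *s, Xid xid)
{
    assert(page_of(xid) < SKETCH_MAX_PAGES);
    if (xid % SKETCH_XACTS_PER_PAGE == 0)
        s->zeroed[page_of(xid)] = true;
}

/* Modelled StartupSUBTRANS that also remembers how far it got: extension
 * later continues right after next_xid - 1. */
static void startup(StandbyModel *s, Xid oldest_active, Xid next_xid)
{
    assert(next_xid > 0 && page_of(next_xid) < SKETCH_MAX_PAGES);
    memset(s->zeroed, 0, sizeof s->zeroed);
    for (uint32_t p = page_of(oldest_active); p <= page_of(next_xid); p++)
        s->zeroed[p] = true;
    s->latest_observed = next_xid - 1;
}

/* On reaching consistency, walk from what startup covered up to
 * running_next_xid - 1 instead of jumping straight there. */
static void init_from_running_xacts(StandbyModel *s, Xid running_next_xid)
{
    for (Xid xid = s->latest_observed + 1; xid < running_next_xid; xid++)
        extend(s, xid);
    s->latest_observed = running_next_xid - 1;
}

/* True if every page from oldest_active through upto has been zeroed. */
static bool all_pages_zeroed(const StandbyModel *s, Xid oldest_active, Xid upto)
{
    for (uint32_t p = page_of(oldest_active); p <= page_of(upto); p++)
        if (!s->zeroed[p])
            return false;
    return true;
}
```

Replaying the earlier scenario (checkpoint nextXid 1500, running->nextXid 5000, replay up to 7000) through `init_from_running_xacts` leaves no unzeroed page, whereas setting `latest_observed` directly to 4999 after `startup` still leaves pages 1 and 2 unzeroed.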

Makes sense?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment
