Re: Hot Standby: Startup at shutdown checkpoint - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot Standby: Startup at shutdown checkpoint
Msg-id 1270726334.8305.26.camel@ebony
In response to Re: Hot Standby: Startup at shutdown checkpoint  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Hot Standby: Startup at shutdown checkpoint  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Thu, 2010-04-08 at 13:33 +0300, Heikki Linnakangas wrote:

> > If standby_mode is enabled and there is no source of WAL, then we get a
> > stream of messages saying
> > 
> > LOG:  record with zero length at 0/C000088
> > ...
> > 
> > but most importantly we never get to the main recovery loop, so Hot
> > Standby never gets to start at all. We can't keep retrying the request
> > for WAL and at the same time enter the retry loop, executing lots of
> > things that expect non-NULL pointers using a NULL xlog pointer.
> 
> You mean it can't find even the checkpoint record to start replaying? 

Clearly I don't mean that. Otherwise it wouldn't be "start from a
shutdown checkpoint". I think you are misunderstanding me.

Let me explain in more detail, though please also read the patch before
replying, if you do reply.

The patch I submitted at the top of this thread works for allowing Hot
Standby during recovery. Yes, of course that occurs when the database is
consistent. The trick is to get recovery to the point where it can be
enabled. The second patch on this thread presents a way to get the
database to that point; it touches some of the other recovery code that
you and Masao have worked on. We *must* touch that code if we are to
enable Hot Standby in the way you desire.

In StartupXLOG(), when we get to the point where we "Find the first
record that logically follows the checkpoint", in the current code
ReadRecord() loops forever, spitting out
LOG: record with zero length at 0/C000088
...

That prevents us from going further down StartupXLOG() to the point
where we start the InRedo loop and hence start hot standby. As long as
we retry we cannot progress further: this is the main problem.

So in the patch, I have modified the retry test in ReadRecord() so it no
longer retries iff there is no WAL source defined. Now, when
ReadRecord() exits, record == NULL at that point and so we do not (and
cannot) enter the redo loop.
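
The retry decision described above can be sketched roughly as follows.
This is a minimal model of the control flow only, not code from the
patch or from xlog.c: RecoverySettings, StartupAction,
should_retry_read() and next_startup_action() are hypothetical names
made up for illustration, and the assumption is that the old test
retried whenever standby mode was enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical settings struct; the field names are not taken from xlog.c. */
typedef struct
{
    bool standby_mode;        /* standby_mode is enabled */
    bool wal_source_defined;  /* some source of WAL has been configured */
} RecoverySettings;

/* What startup does after trying to read the record after the checkpoint. */
typedef enum
{
    ENTER_REDO_LOOP,   /* a record was read, so redo can start */
    KEEP_RETRYING,     /* no record yet, but a WAL source may deliver one */
    SKIP_REDO_LOOP     /* record == NULL and retrying is pointless */
} StartupAction;

/* Retry only when standby mode has somewhere to get more WAL from. */
static bool
should_retry_read(const RecoverySettings *settings)
{
    assert(settings != NULL);
    return settings->standby_mode && settings->wal_source_defined;
}

/* Combine the read result with the retry test to pick the next step. */
static StartupAction
next_startup_action(const RecoverySettings *settings, bool record_found)
{
    if (record_found)
        return ENTER_REDO_LOOP;
    if (should_retry_read(settings))
        return KEEP_RETRYING;
    return SKIP_REDO_LOOP;
}
```

With standby_mode on and no WAL source, next_startup_action() returns
SKIP_REDO_LOOP instead of KEEP_RETRYING, which is the case where
ReadRecord() exits with record == NULL.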

So I have introduced the new mode ("snapshot mode") to enter hot standby
anyway. That avoids us having to screw around with the loop logic for
redo. I don't see any need to support the case where we have no WAL
source defined, yet we want Hot Standby but we also want to allow
somebody to drop a WAL file into pg_xlog at some future point. That has
no use case of value AFAICS and is too complex to add at this stage of
the release cycle.
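
As a rough illustration of how "snapshot mode" sits beside the normal
path, here is a small sketch. It is an assumption-laden model rather
than the patch itself: HotStandbyEntry and choose_hot_standby_entry()
are hypothetical names, and the two boolean inputs stand in for
"ReadRecord() returned a record" and "a WAL source is defined".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for how hot standby gets started. */
typedef enum
{
    HOT_STANDBY_VIA_REDO,      /* normal path: started from the redo loop */
    HOT_STANDBY_VIA_SNAPSHOT,  /* "snapshot mode": no record, no WAL source */
    HOT_STANDBY_NOT_YET        /* a WAL source exists, so keep waiting on it */
} HotStandbyEntry;

/* Pick the entry path from the outcome of the first record read. */
static HotStandbyEntry
choose_hot_standby_entry(bool record_found, bool wal_source_defined)
{
    if (record_found)
        return HOT_STANDBY_VIA_REDO;
    if (!wal_source_defined)
        return HOT_STANDBY_VIA_SNAPSHOT;
    return HOT_STANDBY_NOT_YET;
}

/* Human-readable name, e.g. for a log message. */
static const char *
hot_standby_entry_name(HotStandbyEntry entry)
{
    switch (entry)
    {
        case HOT_STANDBY_VIA_REDO:
            return "redo loop";
        case HOT_STANDBY_VIA_SNAPSHOT:
            return "snapshot mode";
        case HOT_STANDBY_NOT_YET:
            return "not started";
    }
    assert(!"unknown HotStandbyEntry");
    return "unknown";
}
```

The sketch deliberately has no transition out of
HOT_STANDBY_VIA_SNAPSHOT back to the redo path, since a WAL file
appearing in pg_xlog later is the case I said is not supported.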

-- Simon Riggs           www.2ndQuadrant.com


