Re: Change pg_last_xlog_receive_location not to move backwards - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Change pg_last_xlog_receive_location not to move backwards
Msg-id 1297531924.1747.3209.camel@ebony
In response to Re: Change pg_last_xlog_receive_location not to move backwards  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Mon, 2011-01-31 at 16:12 +0900, Fujii Masao wrote:
> On Sun, Jan 30, 2011 at 10:44 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > I do not understand what doing so gets us.
> >
> > Say we previously received 2/3 of a WAL file, and replayed most of it.
> > So now the shared buffers have data that has been synced to that WAL
> > file already, and some of those dirty shared buffers have been written
> > to disk and some have not.  At this point, we need the data in the first
> > 2/3 of the WAL file in order to reach a consistent state.  But now we
> > lose the connection to the master, and then we restore it.  Now we
> > request the entire file from the start rather than from where it
> > left off.
> >
> > Either of two things happens.  Most likely, the newly received WAL file
> > matches the file it is overwriting, in which case there was no
> > point in asking for it.
> >
> > Less likely, the master is feeding us gibberish.  By requesting the
> > full WAL file, we check the header and detect that the master is feeding
> > us gibberish.  Unfortunately, we only detect that fact *after* we have
> > replaced a critical part of our own (previously good) copy of the WAL
> > file with said gibberish.  The standby is now in an unrecoverable state.
> 
> Right. To avoid this problem completely, IMO, walreceiver should validate
> the received WAL data before writing it. Or, walreceiver should write the
> WAL to the transient file, and the startup process should rename it to the
> correct name after replaying it.
> 
> We should do something like the above?
> 
> > With a bit of malicious engineering, I have created this situation.
> > I don't know how likely it is that something like that could happen
> > accidentally, say with a corrupted file system.  I have been unable
> > to engineer a situation where checking the header actually does
> > any good.  It has either done nothing, or done harm.
> 
> OK, I seem to have to consider again why the code which retreats the
> replication starting location exists.
> 
> At first, I added the code to prevent a half-baked WAL file. For example,
> please imagine the case where you start the standby server with no WAL
> files in pg_xlog. In this case, if replication starts from the middle of WAL
> file, the received WAL file is obviously broken (i.e., with no WAL data in
> the first half of file). This broken WAL file might cause trouble when we
> restart the standby and even when we just promote it (because the last
> applied WAL file is re-fetched at the end of recovery).
> 
> OTOH, we can start replication from the middle of WAL file if we can
> *ensure* that the first half of WAL file already exists. At least, when the
> standby reconnects to the master, we might be able to ensure that and
> start from the middle.

Some important points here, but it seems complex.

AFAICS we need to do two things:
* check header of WAL file matches when we start streaming
* start streaming from last received location

So why not do them separately, rather than rewinding the received
location and kludging things?

Seems easier to send all required info in a separate data packet, then
validate the existing WAL file against that. Then start streaming from
last received location. That way we don't have any "going backwards"
logic at all.
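
A rough sketch of that flow, for illustration only. This is not walreceiver code: the header layout, `SegmentHeader`, and `choose_start_location` are all hypothetical names, and the three header fields are a simplified stand-in rather than the real on-disk WAL page header. It assumes the master sends the segment's header info as a separate packet before any WAL data.

```python
import struct
from dataclasses import dataclass

# Hypothetical, simplified header layout: system identifier, timeline ID,
# and the start address of the segment. NOT PostgreSQL's real format.
HEADER_FORMAT = "<QIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass(frozen=True)
class SegmentHeader:
    system_id: int
    timeline: int
    start_addr: int

    def pack(self) -> bytes:
        return struct.pack(HEADER_FORMAT, self.system_id, self.timeline,
                           self.start_addr)

    @classmethod
    def unpack(cls, data: bytes) -> "SegmentHeader":
        return cls(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))


def choose_start_location(local_segment: bytes,
                          master_header: SegmentHeader,
                          last_received: int):
    """Return the location to start streaming from, or None on mismatch.

    The local segment is only read here, never rewritten, so a bad header
    from the master cannot damage the standby's existing copy.
    """
    if len(local_segment) < HEADER_SIZE:
        # Nothing usable on disk (assumed policy): start at the segment
        # boundary so the file is never missing its first part.
        return master_header.start_addr
    if SegmentHeader.unpack(local_segment) != master_header:
        # Headers disagree: refuse to stream instead of overwriting.
        return None
    # Headers agree: resume exactly where we left off, no rewinding.
    return last_received
```

The validation step and the choice of start location stay separate, and in the normal reconnect case the received location never moves backwards.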

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services