Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Sync Rep: First Thoughts on Code
Date
Msg-id 1228816829.20796.716.camel@hp_dx2400_1
In response to Re: Sync Rep: First Thoughts on Code  ("Fujii Masao" <masao.fujii@gmail.com>)
Responses Re: Sync Rep: First Thoughts on Code
List pgsql-hackers
On Tue, 2008-12-09 at 17:15 +0900, Fujii Masao wrote:
> >
> > But what is p.7? It's even more complex than the original. Forgive me,
> > but I don't understand that. Can you explain?
> 
> p.7 shows one of the system configuration examples. Some people who don't
> want to share an archive between two servers would probably choose
> this configuration, I think.
> 
> If archive is not shared, some WAL files before replication starts would not
> be copied automatically from the primary to standby. So, we have to copy
> them by hand or using clusterware ..etc. This is what p.7 shows. If archive
> is shared, archiver on the primary would copy them automatically (p.6).

I agree that is the way to do it *if* the archive is not shared. But why
would you want to *not* share the archive??

> > What is the procedure if the standby shuts down, for example if we wish
> > to restart server to change a parameter?
> 
> Stop postgres by using immediate shutdown, and start postgres from an
> existing database cluster directory. When restarting postgres, if there are
> one or more archives, we also need to copy the WAL files after stopping
> replication before restarting replication.
> 
> > Or to reboot the system it is
> > on. Does the primary switch back to writing files to archive?
> 
> I assume that the primary always writes files to archive, that is, basically
> the primary doesn't switch to non-archiving mode. 

OK, I think that clears up what I was seeing in the code. i.e. I didn't
understand the modes of operation.

I really like most of what you've done, though you must forgive me for
saying I still don't like this. I really am with you on how tiresome
that sounds.

For clarity: I don't think it's acceptable to have the archiver send
files to the archive at the same time as we're streaming data. In normal
running we should not duplicate the data paths - it's just too much data
volume and/or bandwidth.

The cleanest way I can see is to have two modes of operation:
* First mode is file-based log shipping (FLS) (i.e. "warm standby")
* Second mode is streaming log shipping (SLS) (wal sender to wal
receiver)

When we start we are in FLS mode, then we catch up to the cross-over
point and we switch to SLS mode. If streaming stops, we just switch back
to FLS mode. If they reconnect, we follow the same procedure again. So the
two modes are compatible, but are never simultaneously active except for
a short period when we switch modes.
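
To make the switching rule concrete, here is a minimal sketch of it as a
two-state machine. The names (Mode, Event, next_mode, run) are invented for
illustration and are not from the patch; it also ignores the short overlap
period during a switch.

```python
from enum import Enum


class Mode(Enum):
    FLS = "file-based log shipping"
    SLS = "streaming log shipping"


class Event(Enum):
    CAUGHT_UP = "standby reached the cross-over point"
    STREAM_LOST = "streaming connection stopped"


# Hypothetical transition table; any (mode, event) pair not listed is a no-op.
TRANSITIONS = {
    (Mode.FLS, Event.CAUGHT_UP): Mode.SLS,
    (Mode.SLS, Event.STREAM_LOST): Mode.FLS,
}


def next_mode(mode, event):
    """Return the mode after `event`, staying put if no transition applies."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start=Mode.FLS):
    """Replay a sequence of events and return every mode visited, in order."""
    history = [start]
    for event in events:
        history.append(next_mode(history[-1], event))
    return history
```

A reconnect is just another pass through FLS followed by CAUGHT_UP, so it
needs no event of its own.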

If SLS mode is active then the archiver doesn't send files. If FLS mode
is active, we send files. All of the places in the code that currently
skip optimisations when XLogArchivingActive() is true must remain
unoptimised in both FLS and SLS mode, so we need a new name for that.
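
As a rough illustration of the two separate questions, assuming the mode is
one of "FLS", "SLS" or None (no log shipping at all): the function names
below are placeholders I made up, not a proposal for the real name.

```python
def archiver_sends_files(mode):
    """The archiver ships completed WAL files only while in FLS mode."""
    return mode == "FLS"


def wal_shipping_active(mode):
    """Hypothetical stand-in for the checks now done via XLogArchivingActive().

    True in either mode, so code paths that are unoptimised today stay
    unoptimised while streaming as well.
    """
    return mode in ("FLS", "SLS")
```

The point is only that "does the archiver send files?" and "must WAL be
complete enough to ship?" stop being the same test once SLS exists.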

This makes the least number of changes to the existing architecture. People
currently use FLS mode and understand it (!), they just add
understanding of SLS mode. It's also a very straightforward
architecture, which means fewer code paths and fewer weird bugs. (There
have been enough already, as you know).

So just for clarity, let me rephrase it:

We set up FLS mode as we do currently. Then we initiate SLS mode. At the
end of the next WAL file on primary we archive it, then turn off
archiving on primary. (So for up to one WAL file we operate two modes
together).

If SLS mode ends, we send the next WAL file via the archiver. Some part of
that file has already been streamed across, but that doesn't matter. (If
SLS mode ends because the primary is down, we obviously do nothing. If we
have a split brain situation then we rely on clusterware to kill us
(STONITH).)
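
A small sketch of which path carries each WAL file under that procedure. It
assumes WAL files are numbered, that "the next WAL file" means the one being
written when SLS starts or ends, and that the primary stays up; delivery_plan
and its parameters are invented for the example.

```python
def delivery_plan(segments, sls_start, sls_end=None):
    """Map each WAL file number to the set of paths that carry it.

    sls_start: file being written when SLS mode is initiated.
    sls_end:   file being written when SLS mode ends, or None if it never ends.
    """
    plan = {}
    for seg in segments:
        paths = set()
        streaming = seg >= sls_start and (sls_end is None or seg <= sls_end)
        if streaming:
            paths.add("stream")
        # Archiving stays on through the end of the file in which SLS
        # starts, and resumes with the file in which SLS ends.
        if seg <= sls_start or (sls_end is not None and seg >= sls_end):
            paths.add("archive")
        plan[seg] = paths
    return plan
```

With files 1..7, SLS starting in file 3 and ending in file 5, only files 3
and 5 travel both ways; everything else uses exactly one path.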

So AFAICS p.6 of the architecture is all we really need. Nice, simple.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


