Re: Streaming replication, retrying from archive - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Streaming replication, retrying from archive
Date
Msg-id 1263581801.26654.37627.camel@ebony
Whole thread Raw
In response to Re: Streaming replication, retrying from archive  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Fri, 2010-01-15 at 20:11 +0200, Heikki Linnakangas wrote:

> The states we have at the moment in standby are:
> 
> 1. Archive recovery. Standby fetches WAL files from archive using
> restore_command. When a file is not found in archive, we switch to state 2
> 
> 2. Streaming replication. Standby connects (and reconnects if the
> connection is lost for any reason) to the primary, starts streaming, and
> applies WAL as it arrives. We stay in this state until trigger file is
> found or server is shut down.

> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo, are:
> 
> 1. Initial archive recovery. Standby fetches WAL files from archive
> using restore_command. When a file is not found in archive, we start
> walreceiver and switch to state 2
> 
> 2. Retrying to restore from archive. When the connection to primary is
> established and replication is started, we switch to state 3
> 
> 3. Streaming replication. Connection to primary is established, and WAL
> is applied as it arrives. When the connection is dropped, we go back to
> state 2
> 
> Although the the state transitions between 2 and 3 are a bit fuzzy in
> that version; walreceiver runs concurrently, trying to reconnect, while
> startup process retries restoring from archive. Fujii-san's suggestion
> to have walreceiver stop while startup process retries restoring from
> archive (or have walreceiver run restore_command in approach #2) would
> make that clearer.

The one-way state transitions between 1->2 in both cases seem to make
this a little more complex, rather than more simple. 

If the connection did drop then WAL will be in the archive, so the path
for data is archive->primary->standby. There already needs to be a
network path between archive and standby, so why not drop back from
state 3 -> 1 rather than from 3 -> 2? That way we could have just 2
states on each side, rather than 3.

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Markus Wanner
Date:
Subject: Re: Testing with concurrent sessions
Next
From: "Kevin Grittner"
Date:
Subject: Re: Testing with concurrent sessions