Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Sync Rep: First Thoughts on Code
Date
Msg-id 1228938915.20796.909.camel@hp_dx2400_1
In response to Re: Sync Rep: First Thoughts on Code  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Sync Rep: First Thoughts on Code  (Aidan Van Dyk <aidan@highrise.ca>)
Re: Sync Rep: First Thoughts on Code  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Wed, 2008-12-10 at 20:52 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Wed, 2008-12-10 at 14:39 +0200, Heikki Linnakangas wrote:
> > 
> >> For a solution that doesn't depend on the file-based log shipping, I 
> >> think we'll need a way for the standby to request a certain starting 
> >> point for the streaming when it connects. When the standby starts, it 
> >> would first recover all the log segments it can obtain using 
> >> recovery_command, and then connect to the primary and request to
> >> start streaming from where recovery_command stopped.
> > 
> > That was already suggested and rejected because it introduces a
> > potentially unacceptable delay in the start of synch replication - for
> > large databases this could be hours. (I should add it was suggested by
> > me and I now accept that it should be rejected.)
> 
> I don't understand that argument. If the standby is missing say 100 log 
> files, it's not up-to-date with the primary until it has somehow 
> obtained and replayed all those log files. It doesn't make any difference 
> whether it obtains them over the wire via walreceiver, or via an 
> archive. Until it has obtained and replayed all those files, it's not 
> up-to-date, and a failover would lead to data loss.
> 
> Or did I misunderstand what "start of synch replication" means? Got a 
> pointer to the previous discussion?

I think you just went down the same path I did before. (That's a good
sign).

When the WAL starts streaming, the *primary* can immediately perform
synchronous replication, i.e. commit waits for transfer. The *standby*
has an initial lag before it catches up, whatever we do (as you say).
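
To make the distinction concrete, here is a minimal sketch in Python. It is
not PostgreSQL code: the names Standby, received_lsn, replayed_lsn and commit
are hypothetical, and the network is reduced to a method call. It only shows
that a commit can wait for transfer while replay on the standby still lags.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    # Hypothetical model: positions are plain integers standing in for WAL locations.
    received_lsn: int = 0  # last WAL position received over the stream
    replayed_lsn: int = 0  # last WAL position actually replayed

    def receive(self, lsn: int) -> int:
        # Receiving does not depend on replay; acknowledge what has arrived.
        self.received_lsn = max(self.received_lsn, lsn)
        return self.received_lsn


def commit(commit_lsn: int, standby: Standby) -> bool:
    """Return True once the commit record has been transferred to the standby."""
    acked = standby.receive(commit_lsn)
    return acked >= commit_lsn


standby = Standby(replayed_lsn=100)  # still catching up on older WAL
committed = commit(500, standby)     # the commit waits only for transfer
```

After this runs, the commit has been acknowledged even though replayed_lsn is
still behind received_lsn, which is the initial lag described above.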

I initially suggested the first approach below because it simplifies the
mode change. But the mode change isn't really that complex, so I agreed
we should switch to the second.

The two ways of doing this are/were:

1. Initial suggestion
* allow standby to catch up
* then connect and allow sync rep

2. Preferred choice
* connect to primary and allow sync rep
* catch up
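
The two orderings can be compared with a small sketch in Python. Everything
here is an assumption made for illustration: the function name
startup_timeline, the order labels, and the fixed connect time are
hypothetical, and catch-up is treated as a single duration.

```python
from typing import Tuple


def startup_timeline(order: str, catchup_seconds: float,
                     connect_seconds: float = 1.0) -> Tuple[float, float]:
    """Return (time sync rep starts on the primary, time the standby is caught up).

    Both times are seconds after the standby starts. Hypothetical model only.
    """
    if order == "catch_up_first":
        # Option 1: catch up, then connect and allow sync rep.
        caught_up = catchup_seconds
        sync_start = caught_up + connect_seconds
        return sync_start, sync_start
    if order == "connect_first":
        # Option 2: connect and allow sync rep, then catch up.
        sync_start = connect_seconds
        caught_up = sync_start + catchup_seconds
        return sync_start, caught_up
    raise ValueError(f"unknown order: {order!r}")


option_1 = startup_timeline("catch_up_first", 3600.0)
option_2 = startup_timeline("connect_first", 3600.0)
```

Under these assumptions the standby is caught up at the same time either way,
but with option 2 commits on the primary start waiting for transfer almost
immediately instead of after the whole catch-up period.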

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


