Re: Synchronous replication patch v1 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronous replication patch v1
Date
Msg-id 1225893718.17744.95.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Synchronous replication patch v1  ("Fujii Masao" <masao.fujii@gmail.com>)
Responses Re: Synchronous replication patch v1
List pgsql-hackers
Hi Fujii,

Here's some initial thoughts on the structure of this. I've deliberately
not yet read other comments, so we have some independent viewpoints.
Sorry if that means we end up saying same thing twice.

On Fri, 2008-10-31 at 20:36 +0900, Fujii Masao wrote:

> 1) Walsender
> 
> This is new process to focus on sending xlog through the position
> which a backend requests on commit. Walsender calculates the area
> of xlog to be replicated by a logic similar to XLogWrite.
> 
> At first, walsender is forked as a normal backend by postmater (i.e.
> the standby connects to postmaster just like normal frontend).
> A backend works as walsender after receiving "mimic-walsender"
> message. Then, walsender is handled differently from a backend.
> 
> Now, the number of walsenders is restricted to one.

I would think we would want this integrated into the server as an
additional special backend, similar to WALWriter. If it works for now,
that's fine for other testing. This is not an especially difficult
change, I can help with this.

> 2) Communication between backends and walsender
> 
> On commit, a backend tells walsender the position (LSN) to be
> replicated via shmem, and wakes it up by signaling if needed.
> Then, a backend sleeps until requested replication is completed.
> At this time, walsender might signal a backend to wake up.
> 
> Synchronous and asynchronous replication mode are supported.
> In async case, a backend basically don't need to sleep for replication.
> 
> User can tune a backend's max sleep time as a replication timeout.
> Now, the timeout closes the connection to the standby, terminates
> walsender, but the other postgres process continue to work.

No comments until I've read the code.

> 3) Management of the xlog positions for replication
> 
> XLog positions are managed consistent. It's necessary to be careful
> especially in AdvanceXLInsertBuffer and xlog_switch case.

Sounds good.

> 4) Walreceiver
> 
> This is new contrib program to focus on receiving xlog and writing it.
> User can specify the xlog location (where walreceiver writes xlog in
> just after receiving), and the archive location (where walreceiver
> archives a filled xlog file). This options are used to cooperate with
> pg_standby (prevents pg_standby from reading the xlog file under
> walreceiver writing)

Again, I would expect this to be integrated with server. I would expect
code to live in src/postmaster/walreceiver.c, with main logic in a file
alongside xlog.c, perhaps xreceive.c. We would start WALReceiver when we
enter archive recovery mode - I already have logic for this state
change. After that you would be able to use the archive location
specified via recovery.conf.

The logic need not be any further integrated than you have here.

> The above is a necessary minimum function, and some requests
> which came out in the discussion have not been implemented yet.
> If there is other indispensable function, please let me know.
> 
> And, there are some problems in this patch;
> 
> * This patch is somewhat big, though it should be subdivided for
>    review.
> 
> * Source code comments and documents are insufficient.

Source code comments are essential. I try to put enough comments so that
each chunk of the patch has a comment to explain why that change is a
necessary part of the whole patch. Doing that is a good way to find
chunks that you can remove.

> Now, there are three configurable parameter in postgresql.conf.
> 
> > synchronous_replication = on   # immediate replication at commit
> > replication_timeout = 0ms      # 0 is disabled
> > wal_sender_delay = 200ms       # 1-10000 milliseconds

Could you write some docs for this? I just want to check how you think
it will work.

Does synchronous_replication = off mean 
a) asynchronous replication or
b) no replication at all

I want to be able to specify synch, asynch or no replication.


We need an explanation and example of how to set this up when performing
a large initial base backup. Earlier we discussed using archiver to
transfer initial files and then switching to streaming mode later. How
does all that work now?
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01208.php

I'll be looking at this a lot more over next few weeks/months, so this
is just a few short initial comments.

Well done for getting this together so quickly, especially with your
visit to hospital taking away time. 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: [WIP] In-place upgrade
Next
From: Zdenek Kotala
Date:
Subject: Re: [WIP] In-place upgrade