Re: Streaming Replication patch for CommitFest 2009-09 - Mailing list pgsql-hackers

From: Magnus Hagander
Subject: Re: Streaming Replication patch for CommitFest 2009-09
Msg-id: 9837222c0909170146g7721af7fte033c4a08349f407@mail.gmail.com
In response to: Re: Streaming Replication patch for CommitFest 2009-09 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-hackers
On Thu, Sep 17, 2009 at 10:08, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> On Tue, Sep 15, 2009 at 7:53 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> After playing with this a little bit, I think we need logic in the slave
>>> to reconnect to the master if the connection is broken for some reason,
>>> or can't be established in the first place. At the moment, that is
>>> treated as the end of recovery, and the slave starts up. You have the
>>> trigger file mechanism to stop that, but it only gives you a chance to
>>> manually kill and restart the slave before it chooses a new timeline and
>>> starts up; it doesn't reconnect automatically.
>>
>> I was thinking of automatic reconnection as a TODO item for a later CF.
>> The infrastructure for it has already been introduced in the current
>> patch: see the macro MAX_WALRCV_RETRIES (backend/postmaster/walreceiver.c),
>> which is the maximum number of times to retry walreceiver. In the current
>> version this is a fixed value, but we could make it user-configurable
>> (a recovery.conf parameter would be suitable, I think).
>
> Ah, I see.
>
> Robert Haas suggested a while ago that walreceiver could be a
> stand-alone utility, not requiring postmaster at all. That would allow
> you to set up streaming replication as another way to implement WAL
> archiving. Looking at how the processes interact, there really isn't
> much communication between walreceiver and the rest of the system, so
> that sounds pretty attractive.

Yes, that would be very, very useful.
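For illustration, the reconnection logic discussed above (a bounded number of retries, MAX_WALRCV_RETRIES in the patch) might look roughly like the C sketch below. It is a minimal sketch under stated assumptions, not the patch's code: connect_with_retries(), the connect_fn callback type, the fake_connect() test stub and the value 3 are all hypothetical. The limit is passed in as a parameter so it could come from a recovery.conf setting, as Fujii suggests, and what to do after the last failed attempt (end recovery, or keep waiting for the trigger file) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical value; the patch defines MAX_WALRCV_RETRIES as a fixed macro. */
#define MAX_WALRCV_RETRIES 3

/* Hypothetical callback: one attempt to connect to the master. */
typedef bool (*connect_fn)(void *arg);

/*
 * Try to (re)connect up to max_retries times, sleeping delay_secs between
 * attempts. Returns the attempt number that succeeded, or -1 if all failed.
 */
int
connect_with_retries(connect_fn try_connect, void *arg,
                     int max_retries, unsigned int delay_secs)
{
    assert(max_retries > 0);

    for (int attempt = 1; attempt <= max_retries; attempt++)
    {
        if (try_connect(arg))
            return attempt;

        fprintf(stderr, "walreceiver connection attempt %d of %d failed\n",
                attempt, max_retries);

        /* Don't sleep after the final failure. */
        if (attempt < max_retries && delay_secs > 0)
            sleep(delay_secs);
    }
    return -1;
}

/* Hypothetical test stub: fails *arg times, then succeeds. */
static bool
fake_connect(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0)
    {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

For example, with an int failures = 2, connect_with_retries(fake_connect, &failures, MAX_WALRCV_RETRIES, 0) succeeds on the third attempt and returns 3; with failures = 5 it gives up and returns -1.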


> Walreceiver is really a slave to the startup process. The startup
> process decides when it's launched, and it's the startup process that
> then waits for it to advance. But the way it's set up at the moment, the
> startup process needs to ask the postmaster to start it up, and it
> doesn't look very robust to me. For example, if launching walreceiver
> fails for some reason, the startup process will just hang waiting for it.
>
> I'm thinking that walreceiver should be a stand-alone program that the
> startup process launches, similar to how it invokes restore_command in
> PITR recovery. Instead of using system(), though, it would use
> fork+exec, and a pipe to communicate.

Without having looked at the details at all, that sounds like a nice
improvement :-)
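A rough C sketch of the fork+exec-plus-pipe approach Heikki describes. The helper names, the choice to have the child write to its stdout, and the exit code 127 for a failed exec (borrowed from the shell's "command not found" convention) are assumptions for illustration only; the actual protocol walreceiver would speak over the pipe is not specified in this thread. The point it tries to show is the robustness argument: because the parent closes its copy of the write end, read() returns EOF once the child exits, including when exec itself fails, so the caller learns about the failure from waitpid() instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start `path` as a child whose stdout feeds a pipe.
 * Returns the child's pid and stores the read end in *read_fd,
 * or returns -1 if pipe() or fork() fails.
 */
pid_t
launch_with_pipe(const char *path, char *const argv[], int *read_fd)
{
    int   fds[2];
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: route stdout into the pipe, then become the new program. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execv(path, argv);
        _exit(127);             /* only reached if execv() failed */
    }

    /* Parent: keep only the read end, so EOF arrives when the child exits. */
    close(fds[1]);
    *read_fd = fds[0];
    return pid;
}

/*
 * Hypothetical helper: read the child's output into buf (NUL-terminated,
 * truncated to fit), then reap the child. Returns its exit code, or -1 if
 * it did not exit normally or could not be waited for.
 */
int
drain_and_wait(pid_t pid, int read_fd, char *buf, size_t bufsize)
{
    size_t  used = 0;
    ssize_t n;
    char    scratch[256];
    int     status;

    assert(bufsize > 0);

    while (used + 1 < bufsize)
    {
        n = read(read_fd, buf + used, bufsize - 1 - used);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;              /* EOF or error: child is done writing */
        used += (size_t) n;
    }
    buf[used] = '\0';

    /* Discard output that did not fit, so the child never blocks on a full pipe. */
    while ((n = read(read_fd, scratch, sizeof scratch)) > 0 ||
           (n < 0 && errno == EINTR))
        ;
    close(read_fd);

    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With /bin/echo as the program and argument "hello", drain_and_wait() collects "hello\n" and returns 0; with a path that does not exist, the child's execv() fails and drain_and_wait() returns 127 instead of blocking.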


-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

