Re: Streaming Replication patch for CommitFest 2009-09 - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Streaming Replication patch for CommitFest 2009-09
Date
Msg-id 20091006134200.GC5929@alvh.no-ip.org
In response to Re: Streaming Replication patch for CommitFest 2009-09  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Streaming Replication patch for CommitFest 2009-09
List pgsql-hackers
Fujii Masao wrote:
> On Thu, Sep 17, 2009 at 5:08 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Walreceiver is really a slave to the startup process. The startup
> > process decides when it's launched, and it's the startup process that
> > then waits for it to advance. But the way it's set up at the moment, the
> > startup process needs to ask the postmaster to start it up, and it
> > doesn't look very robust to me. For example, if launching walreceiver
> > fails for some reason, startup process will just hang waiting for it.
> 
> I changed the postmaster to report the failure of  fork of the walreceiver
> to the startup process by resetting WalRcv->in_progress, which prevents
> the startup process from getting stuck when launching walreceiver fails.
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01996.php
> 
> Do you have another concern about the robustness? If yes, I'll address that.

Hmm.  Without looking at the patch at all, this seems similar to how
autovacuum does things: autovac launcher signals postmaster that a
worker needs to be started.  Postmaster proceeds to fork a worker.  This
could obviously fail for a lot of reasons.

Now, there is code in place to notify the user when forking fails, and
this is seen in the wild quite a bit more than one would like :-(  I
think it would be a good idea to add a retry mechanism to walreceiver
startup, so that recovery does not get stuck due to transient problems.
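
To make the retry idea concrete, here is a rough, standalone sketch of a
bounded retry loop.  It is not taken from the patch or from the autovacuum
code; launch_with_retry, launch_fn and flaky_launch are made-up names, and
the launcher is a stand-in that simulates a fork that fails a few times
before succeeding.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true if the child process was started. */
typedef bool (*launch_fn) (void *arg);

/*
 * Call launch() until it succeeds or max_attempts is exhausted.
 * Returns the attempt number that succeeded, or 0 if every attempt failed,
 * so the caller can report the failure instead of waiting forever.
 */
static int
launch_with_retry(launch_fn launch, void *arg, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (launch(arg))
            return attempt;
        /* A real implementation would presumably sleep briefly here. */
    }
    return 0;
}

/*
 * Stand-in launcher for illustration only: fails while the counter that
 * arg points to is positive, decrementing it on each failure.
 */
static bool
flaky_launch(void *arg)
{
    int *remaining_failures = arg;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

With a counter of 2 and max_attempts of 5, the third attempt succeeds; with
a counter larger than max_attempts the function gives up and returns 0.
Whether the retry belongs in the startup process or in postmaster is a
separate question that this sketch does not try to answer.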

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

