Fujii Masao escribió:
> On Thu, Sep 17, 2009 at 5:08 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Walreceiver is really a slave to the startup process. The startup
> > process decides when it's launched, and it's the startup process that
> > then waits for it to advance. But the way it's set up at the moment, the
> > startup process needs to ask the postmaster to start it up, and it
> > doesn't look very robust to me. For example, if launching walreceiver
> > fails for some reason, startup process will just hang waiting for it.
>
> I changed the postmaster to report the failure of fork of the walreceiver
> to the startup process by resetting WalRcv->in_progress, which prevents
> the startup process from getting stuck when launching walreceiver fails.
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01996.php
>
> Do you have another concern about the robustness? If yes, I'll address that.
Hmm. Without looking at the patch at all, this seems similar to how
autovacuum does things: autovac launcher signals postmaster that a
worker needs to be started. Postmaster proceeds to fork a worker. This
could obviously fail for a lot of reasons.
Now, there is code in place to notify the user when forking fails, and
this is seen on the wild quite a bit more than one would like :-( I
think it would be a good idea to have a retry mechanism in the
walreceiver startup mechanism so that recovery does not get stuck due to
transient problems.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support