Excerpts from Markus Wanner's message of Wed Nov 17 09:57:18 -0300 2010:
> On 11/17/2010 01:27 PM, Alvaro Herrera wrote:
> > I don't think it's a speed thing only. It would be a great thing to
> > have in autovacuum, for example, where we have constant problem reports
> > because the system failed to fork a new backend. If we could simply
> > reuse an already existing one, it would be a lot more robust.
>
> Hm, that's an interesting point.
>
> To actually increase robustness, it would have to be a failure scenario
> that (temporarily) prevents forking, but allows an existing backend to
> continue to do work (i.e. the ability to allocate memory or open files
> come to mind).

Well, the autovacuum mechanism involves a lot of back-and-forth between
the launcher and the postmaster: some signals, a fork(), and backend
initialization. There are plenty of ways for that to fail, and reporting
a fork failure back is similarly brittle.

> Any idea about what's usually causing these fork() failures? I'm asking
> because I'm afraid that for example, in case of an out of memory
> condition, we'd just hit an OOM error later on, without being able to
> perform the VACUUM job, either.

To be honest, I have no idea. Sometimes the server is just too loaded.
Right now we have this "delay": if the worker process is not up and
running within 60 seconds, we have to assume that "something" happened,
and we stop waiting for it. If we knew the process was already there, we
could leave it alone; we'd know it would get to its duty eventually.
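
To make the 60-second rule concrete, here is a rough sketch in C of the
kind of check I mean. It is only an illustration: WorkerSlot,
WorkerState, WORKER_STARTUP_TIMEOUT_SECS and worker_startup_timed_out()
are names made up for this mail, not what autovacuum.c actually
contains, and the real code keeps its bookkeeping differently.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical constant: how long to wait for a requested worker. */
#define WORKER_STARTUP_TIMEOUT_SECS 60

/* Hypothetical states for a worker the launcher has asked for. */
typedef enum
{
    WORKER_NONE,        /* nothing requested */
    WORKER_STARTING,    /* requested from postmaster, not yet running */
    WORKER_RUNNING      /* worker has finished backend initialization */
} WorkerState;

/* Hypothetical per-worker bookkeeping kept by the launcher. */
typedef struct
{
    WorkerState state;
    time_t      requested_at;   /* when the start request was made */
} WorkerSlot;

/*
 * Return true if the launcher should give up waiting for this worker.
 * Only a worker that is still starting can time out; one that is
 * already running is left alone, however long its job takes.
 */
static bool
worker_startup_timed_out(const WorkerSlot *slot, time_t now)
{
    if (slot->state != WORKER_STARTING)
        return false;

    return difftime(now, slot->requested_at) > WORKER_STARTUP_TIMEOUT_SECS;
}
```

With a reusable backend the slot would already be in the running state,
so the timeout branch would never be taken and the launcher would not
have to guess what went wrong during startup.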
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support