On Mon, 2006-05-15 at 09:28 -0700, Jeff Frost wrote:
> I've run into a problem with a PITR setup at a client. The problem is that
> whenever the CIFS NAS device that we're mounting at /mnt/pgbackup has
> problems
What kind of problems?
> , it seems that the current client connection gets blocked and this
> eventually builds up to a "sorry, too many clients already" error.
This sounds like the archiver keeps waking up and trying the command,
but it fails, yet each request is causing a resource leak on the NAS.
Eventually, the archiver's retry of the command fails. Or am I
misunderstanding your issue?
> I'm
> wondering if this is expected behavior with the archive command and if I
> should build in some more smarts to my archive script. Maybe I should fork
> and waitpid such that I can use a manual timeout shorter than whatever the
> CIFS timeout is so that I can return an error in a reasonable amount of time?
The archiver is designed around the thought that *attempting* to archive
is a task that it can do indefinitely without a problem; it's up to you
to spot that the link is down.
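
A minimal sketch of the timeout wrapper you describe, in Python rather
than a hand-rolled fork/waitpid. The names run_with_timeout and archive,
the ".tmp" suffix and the 60 second limit are assumptions made up for
illustration, not anything PostgreSQL provides. The only contract relied
on is that archive_command must return zero on success and nonzero on
failure.

```python
import subprocess

TIMEOUT_S = 60  # assumed limit; pick something shorter than the CIFS timeout


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return 0 on success, 1 on failure or if it overruns timeout_s."""
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: we do not wait for the child after killing it,
        # so a child stuck on a hung mount cannot block the wrapper.
        proc.kill()
        return 1
    return 0 if rc == 0 else 1


def archive(src, dest, timeout_s=TIMEOUT_S):
    """Copy src to a temporary name, then rename it to dest."""
    tmp = dest + ".tmp"  # hypothetical suffix; keeps partial copies out of dest
    if run_with_timeout(["cp", src, tmp], timeout_s) != 0:
        return 1
    return run_with_timeout(["mv", tmp, dest], timeout_s)
```

A script built on this would end with sys.exit(archive(sys.argv[1],
sys.argv[2])) and could be wired in with something like
archive_command = 'python /path/to/archive_wal.py %p /mnt/pgbackup/%f'
(the script path is hypothetical). Note that kill() may not actually
remove a child blocked in uninterruptible I/O on a hung mount, although
the wrapper itself still returns a failure promptly.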
We can put something in to lengthen the retry period, but you'd
need to make a reasonable case for how that would increase robustness.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com