Re: SIGUSR1 pingpong between master na autovacum launcher causes crash - Mailing list pgsql-hackers

From Zdenek Kotala
Subject Re: SIGUSR1 pingpong between master na autovacum launcher causes crash
Date
Msg-id 1250884883.1320.66.camel@localhost
In response to Re: SIGUSR1 pingpong between master na autovacum launcher causes crash  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Alvaro Herrera wrote on Fri, 21 Aug 2009 at 15:40 -0400:
> Zdenek Kotala wrote:
> 
> > The problem I see here is that StartAutovacuumWorker() fails and
> > sends SIGUSR1 to the postmaster, but it sends it too quickly, while the
> > signal handler is still active. When the signal mask is unblocked in
> > sigusr1_handler(), the signal handler is run again...
> > 
> > The reason why StartAutovacuumWorker() fails is interesting. The log says:
> > 
> > LOG:  could not fork autovacuum worker process: Not enough space
> 
> Does this mean that the machine is out of swap space?

It is an ENOMEM error, but that is strange. The machine has 4GB of RAM
and PG84 was freshly installed, without any data and with the default
configuration. It was not under load. I did not find any clue about what
happened to the memory on this system. The question is whether running
out of memory was the cause or the result of the ping-pong.

> 
> > It is strange and I don't understand it. Maybe too many nested signal
> > handler calls could cause it.
> > 
> > It is also strange that 100ms is not enough to protect against this
> > situation, but I think that the sleep could be interrupted by a signal.
> > 
> > My suggestion is to set, for example, gotUSR1=true in sigusr1_handler()
> > and to check in the server loop whether we got a USR1 signal. That avoids
> > any problems with the signal handler, which is not currently POSIX
> > compliant anyway.
> 
> What 100ms?  The pg_usleep call you see in ServerLoop is only there
> during shutdown; normally it would be the select() call that would be
> blocking the process.

I mean the 100ms sleep in AutoVacLauncherMain(), line 656:
http://doxygen.postgresql.org/autovacuum_8c.html#19ef1013e6110a4536ed92a454aba8c9
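
To illustrate how a sleep can be cut short by a signal, here is a
minimal sketch using plain POSIX calls. It is not PostgreSQL code:
it uses nanosleep() as a stand-in for whatever the launcher really
calls, and wakeup_handler, arm_wakeup_ms and sleep_full_ms are
hypothetical names made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: its only job is to make the signal interrupt the sleep. */
static void
wakeup_handler(int signo)
{
    (void) signo;
}

/* Hypothetical helper: deliver one SIGALRM to this process after delay_ms. */
static int
arm_wakeup_ms(long delay_ms)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&tv, 0, sizeof(tv));         /* it_interval = 0: one shot only */
    tv.it_value.tv_sec = delay_ms / 1000;
    tv.it_value.tv_usec = (delay_ms % 1000) * 1000;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/*
 * Sleep for ms milliseconds, resuming after every signal.
 * Returns how many times the sleep was interrupted, or -1 on error.
 */
static int
sleep_full_ms(long ms)
{
    struct timespec req;
    struct timespec rem;
    int         interrupts = 0;

    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    while (nanosleep(&req, &rem) != 0)
    {
        if (errno != EINTR)
            return -1;
        interrupts++;
        req = rem;              /* continue with the time that was left */
    }
    return interrupts;
}
```

Without the retry loop, a single nanosleep() call returns as soon as
the handler has run, so a sleep that is meant to throttle retries gives
little protection if signals keep arriving.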

> If sigusr1_handler needs rewriting, don't all the other sighandlers as
> well?  Note that the process is supposed to be running with signals
> blocked all the time except during those sleep/select calls, which is
> what (according to comments) let the sighandlers do nontrivial tasks.

The comments say that it is OK. POSIX says that it is not OK, and my
instinct says to trust the POSIX standard. In particular, I do not see
any reason why we need to do this work in the signal handler.
avl_sigterm_handler and the like are good examples of how it should be
implemented in the postmaster as well.
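
A minimal sketch of the flag approach (set a flag in the handler, act
on it in the main loop), for illustration only. This is not taken from
the PostgreSQL source: got_usr1, usr1_flag_handler,
install_usr1_handler and consume_usr1 are hypothetical names, and the
real work that would follow is left as a comment.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_usr1 = 0;

/* The handler only records that the signal arrived, nothing else. */
static void
usr1_flag_handler(int signo)
{
    (void) signo;
    got_usr1 = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int
install_usr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_flag_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Called from the main loop, outside signal context. Returns true if a
 * SIGUSR1 was seen since the last call, and clears the flag so that the
 * work is done once no matter how many signals arrived in between.
 */
static bool
consume_usr1(void)
{
    if (!got_usr1)
        return false;
    got_usr1 = 0;
    /* the non-trivial work (for example starting a worker) would go here */
    return true;
}
```

Because the handler does nothing but set the flag, it does not matter
if it is entered again while a previous SIGUSR1 is still being handled.
Several signals collapse into one pass through the main loop.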

The core dump shows that it is not a good idea to have a complicated
signal handler.

Zdenek




