Re: recovery is stuck when children are not processing SIGQUIT from previous crash - Mailing list pgsql-admin

From	Tom Lane
Subject	Re: recovery is stuck when children are not processing SIGQUIT from previous crash
Date	September 23, 2009 11:04:41
Msg-id	21890.1253714661@sss.pgh.pa.us
In response to	recovery is stuck when children are not processing SIGQUIT from previous crash (Peter Eisentraut <peter_e@gmx.net>)
Responses	Re: recovery is stuck when children are not processing SIGQUIT from previous crash
List	pgsql-admin

Peter Eisentraut <peter_e@gmx.net> writes:
> I have observed the following situation a few times now (weeks or months
> apart), most recently with 8.3.7.  Some postgres child process crashes.
> The postmaster notices and sends SIGQUIT to all other children.  Once
> all other children have exited, it would enter recovery.  But for some
> reason, some children are not processing the SIGQUIT signal and are
> basically just stuck.  That means the whole database system is then
> stuck and won't continue without manual intervention.  If I go in
> manually and SIGKILL the offending processes, everything proceeds
> normally, recovery finishes, and the system is up again.

We need some investigation into why that is happening.

> I haven't had the chance yet to analyze why the SIGQUIT signals are
> getting stuck.  Be that as it may, it appears there are no provisions
> for this case.  I couldn't find any documentation or previous reports on
> this sort of thing.  One might imagine a feature where the postmaster
> resorts to throwing SIGKILLs around after a while, similar to how init
> scripts are sometimes set up.
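
The escalation idea in the quoted paragraph (SIGQUIT first, SIGKILL after a grace period) can be sketched roughly as follows. This is only an illustration in Python, not postmaster code: `CHILD_SRC`, `stop_with_escalation`, `demo`, and the 0.5 second grace period are all hypothetical names and values chosen for the example, and the child is a stand-in that deliberately blocks SIGQUIT. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a stuck child: it blocks SIGQUIT, reports that it
# is ready, and then sleeps. This is an assumption for the demo only.
CHILD_SRC = (
    "import signal, time\n"
    "signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_escalation(proc, grace_seconds):
    """Send SIGQUIT; if the process is still alive after grace_seconds,
    send SIGKILL. Returns the name of the signal that ended the process,
    or None if it exited on its own."""
    proc.send_signal(signal.SIGQUIT)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX
        proc.wait()   # note: this can still block if the child cannot be killed yet
    if proc.returncode is not None and proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child has blocked SIGQUIT
    result = stop_with_escalation(proc, 0.5)
    proc.stdout.close()
    return result
```

Calling `demo()` starts the child, lets the SIGQUIT go unanswered, and then falls back to SIGKILL. The final `proc.wait()` has no timeout, so a child that cannot yet act on SIGKILL would leave the caller waiting as well.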

I'd prefer not to go there, at least not without a demonstration that
this will solve a bug that's unsolvable otherwise.  If a child is
really stuck in a state that doesn't accept SIGQUIT, it probably
won't accept SIGKILL either (e.g., uninterruptible disk wait).  Or maybe
we just have some errant code that is blocking SIGQUIT; but that's
a garden variety bug IMO, not something that needs major new postmaster
logic to work around.
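
To tell the two cases above apart on a Linux system, one possible starting point is the per-process status file under /proc, which reports the scheduler state and the blocked and pending signal masks as hexadecimal bitmaps (bit n-1 corresponds to signal n). The sketch below is an assumption-laden illustration, not a PostgreSQL tool: `diagnose` is a hypothetical helper, and `SAMPLE_STATUS` is made-up input shaped like the relevant lines of such a file.

```python
import signal


def diagnose(status_text, signum=signal.SIGQUIT):
    """Parse text in the format of Linux /proc/<pid>/status and report whether
    the process is in uninterruptible sleep and whether signum is blocked
    or pending."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    state = fields.get("State", "?")[:1]  # "D" means uninterruptible sleep
    blocked = int(fields.get("SigBlk", "0"), 16)
    # SigPnd is the per-thread pending mask, ShdPnd the process-wide one.
    pending = int(fields.get("SigPnd", "0"), 16) | int(fields.get("ShdPnd", "0"), 16)
    bit = 1 << (int(signum) - 1)

    return {
        "uninterruptible": state == "D",
        "blocked": bool(blocked & bit),
        "pending": bool(pending & bit),
    }


# Made-up sample input for illustration only: a sleeping process that has
# SIGQUIT (signal 3, mask bit 0x4) blocked and pending.
SAMPLE_STATUS = (
    "Name:\texample\n"
    "State:\tS (sleeping)\n"
    "SigPnd:\t0000000000000000\n"
    "ShdPnd:\t0000000000000004\n"
    "SigBlk:\t0000000000000004\n"
)
```

On a live system the same function could be fed the contents of `/proc/<pid>/status` for a stuck child. A result with `blocked` true points toward code that is masking SIGQUIT, while `uninterruptible` true points toward the disk-wait case, where SIGKILL would not help either.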

            regards, tom lane
