Thread: stuck spinlock. Aborting. What does this mean?

stuck spinlock. Aborting. What does this mean?

From
Markus Bertheau
Date:
Good day,

cenes=> select version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.0.2 on i686-pc-linux-gnu, compiled by gcc 2.95.2
(1 row)

(I know it is old; an update is in the queue.)

The following happened to our production system a few minutes ago.

Our web interface gave a standard error message saying the database doesn't work.
I looked and found several postgres processes with parent process init
and *no* postmaster process.

Please have a look at the log at

http://www.bluetwanger.de/postgres.fail.log

It is 13K.

Can someone tell me what happened and clarify the situation?

Thanks

Markus Bertheau

P.S. I will be reading email for about one more hour. After that comes
the weekend, and I don't have internet at home, so don't be surprised if
I don't answer.

Re: stuck spinlock. Aborting. What does this mean?

From
Tom Lane
Date:
Markus Bertheau <twanger@bluetwanger.de> writes:
> I looked and found several postgres processes with parent process init
> and *no* postmaster process.

It looks to me like your postmaster crashed because there were no free
file descriptors left in the system:

FATAL 1:  ReleaseLruFile: No open files available to be closed

I believe this would lead to the subsequent errors shown in the log,
because the postmaster would delete the semaphores it owns before
exiting.

The same hypothesis probably explains the peculiar backend-startup
failures earlier in your log:

FATAL 1:  File '/var/lib/pgsql/PG_VERSION' does not exist or no read permission.

Consider increasing your kernel file table size.  An update to PG 7.1
also seems in order...

            regards, tom lane
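
One rough way to see how close a Linux system is to exhausting its kernel file table is to read /proc/sys/fs/file-nr. The sketch below is illustrative only: it assumes a Linux kernel that exposes that file with three whitespace-separated fields (allocated handles, allocated-but-unused handles, maximum), and the function names are hypothetical, not part of PostgreSQL.

```python
from pathlib import Path


def parse_file_nr(text):
    """Parse the three integer fields of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def file_table_usage(text):
    """Return the fraction of the kernel file table currently in use."""
    allocated, unused, maximum = parse_file_nr(text)
    return (allocated - unused) / maximum


if __name__ == "__main__":
    # Linux-specific path; assumed to exist on the machine being checked.
    path = Path("/proc/sys/fs/file-nr")
    if path.exists():
        usage = file_table_usage(path.read_text())
        print(f"kernel file table is {usage:.1%} full")
    else:
        print("no /proc/sys/fs/file-nr here; this sketch only works on Linux")
```

A value that stays near 100% under normal load would be consistent with the failure described above and suggests raising the kernel limit.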

Re: stuck spinlock. Aborting. What does this mean?

From
Markus Bertheau
Date:
On Fri, 2001-12-14 at 23:31, Tom Lane wrote:
> It looks to me like your postmaster crashed because there were no free
> file descriptors left in the system:

Does 7.1 or 7.2 behave differently when confronted with such a
situation?

Markus Bertheau

Re: stuck spinlock. Aborting. What does this mean?

From
Tom Lane
Date:
Markus Bertheau <twanger@bluetwanger.de> writes:
> On Fri, 2001-12-14 at 23:31, Tom Lane wrote:
>> It looks to me like your postmaster crashed because there were no free
>> file descriptors left in the system:

> Does 7.1 or 7.2 behave differently when confronted with such a
> situation?

I believe we've fixed the particular case you exhibited as of 7.1.
Can't swear that there are no similar problems anywhere, however.

In practice, being out of file descriptors will take down most parts
of a Unix system, so whether the postmaster is bulletproof or not is
not all that interesting a question.  The only practical way to proceed
is to make sure that the situation never happens.  7.2 has a config
variable that can be used to limit Postgres' appetite for file
descriptors, even when the kernel lies about its ability to provide
lots of descriptors (unfortunately a common practice).

            regards, tom lane
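
The sizing argument above can be made concrete with a little arithmetic: worst-case descriptor demand is roughly the number of backends times the per-process cap, and that product has to fit inside the kernel file table with room left for everything else on the machine. The sketch below assumes the 7.2 setting referred to is the one documented as max_files_per_process (the thread does not name it), and all numbers and function names are illustrative.

```python
import resource


def worst_case_descriptors(max_backends, files_per_process):
    """Upper bound on descriptors the backends could hold open at once."""
    return max_backends * files_per_process


def fits_in_file_table(max_backends, files_per_process, kernel_file_max,
                       reserve=0.25):
    """Check the worst case against the kernel limit.

    `reserve` is the share of the file table left for other processes;
    0.25 is an arbitrary illustrative choice, not a recommendation.
    """
    budget = kernel_file_max * (1 - reserve)
    return worst_case_descriptors(max_backends, files_per_process) <= budget


if __name__ == "__main__":
    # The per-process limit the kernel advertises, which may be optimistic.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"advertised per-process limit: soft={soft}, hard={hard}")
    # Hypothetical figures: 32 backends, an 8192-entry kernel file table.
    for cap in (1000, 100):
        ok = fits_in_file_table(32, cap, 8192)
        print(f"cap={cap}: worst case {worst_case_descriptors(32, cap)}, fits={ok}")
```

With these made-up figures, a cap of 1000 files per backend could demand far more than the table holds, while a cap of 100 stays within budget.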