Thread: forced to restart postgresql service yesterday

From: "Merlin Moncure"
Yesterday one of our clients called up and complained about lousy
performance and being unable to log in to our postgresql 8.0 backed ERP
running on windows 2000 server.  The server had been running for several
months without being restarted or rebooted.

The login was hanging in a simple plpgsql login script which basically
did an insert/update on a small table.  It would hang when called from
within psql, and once hung the query would not respond to cancel
requests (such as they are implemented on win32).  Investigating
further, trying to select from this small table at all would also hang.
Unfortunately, this table tells me which pids are safe to kill and which
are not, so I had no choice but to do an emergency restart of the
postgresql service.  The restart went without complaint, everything
worked normally afterwards, and there was nothing extraordinary in the
server log.

I'm not ruling out an obscure win32 problem here, but this is fairly
untraceable.  This is just an FYI post.  This particular client, with
about 50 users, has been running on win32/pg for over a year, and this
is the first time I have had to restart the service :(.  I am really
pushing a move to linux, although there is no reason to believe that
will prevent this from happening again.

Merlin


Re: forced to restart postgresql service yesterday

From: Tom Lane
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> The login was hanging in a simple plpgsql login script which basically
> did an insert/update on a small table.  It would hang when called from
> within psql, and once hung the query would not respond to cancel
> requests (such as they are implemented on win32).  Investigating
> further, trying to select from this small table at all would also hang.
> Unfortunately, this table tells me which pids are safe to kill and which
> are not,

Did you look at pg_locks or pg_stat_activity?

There is pretty much nothing we can do with this report given the lack
of detail.
        regards, tom lane


Re: forced to restart postgresql service yesterday

From: "Merlin Moncure"
> Subject: Re: [HACKERS] forced to restart postgresql service yesterday
>
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> > The login was hanging in a simple plpgsql login script which basically
> > did an insert/update on a small table.  It would hang when called from
> > within psql, and once hung the query would not respond to cancel
> > requests (such as they are implemented on win32).  Investigating
> > further, trying to select from this small table at all would also hang.
> > Unfortunately, this table tells me which pids are safe to kill and which
> > are not,
>
> Did you look at pg_locks or pg_stat_activity?
>
> There is pretty much nothing we can do with this report given the lack
> of detail.
>
Understood; I was in a big hurry to get the server back up.

pg_stat_activity worked OK.  There were a lot of hung processes, and it's
possible pg was over the connection limit, although pretty much everything
logs in as superuser.  There was also a ton of stuff in pg_locks, but it
was hard to determine anything useful because my app makes heavy use of
userlocks and I don't get the benefit of the revamped 8.1 pg_locks view.
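For readers wondering what "looking at pg_locks" might involve, here is a minimal sketch, assuming the rows have already been fetched (for example with `SELECT relation, pid, mode, granted FROM pg_locks`) into Python dicts. The function name `find_blockers` and the sample OIDs and pids are hypothetical. It pairs each ungranted lock with the granted locks on the same relation. It does not check lock-mode compatibility, skips rows whose relation is NULL, and makes no attempt to interpret userlock entries, so it can over-report blockers and would not have untangled the situation described above.

```python
from collections import defaultdict


# Hypothetical helper: rows are dicts shaped like pg_locks output
# (relation, pid, mode, granted). Lock-mode compatibility is NOT checked,
# so every granted holder on the same relation is reported as a possible blocker.
def find_blockers(rows):
    holders = defaultdict(set)  # relation OID -> pids holding a granted lock
    waiters = defaultdict(set)  # relation OID -> pids waiting for a lock
    for row in rows:
        rel = row.get("relation")
        if rel is None:
            # e.g. transaction-ID locks, which carry no relation
            continue
        target = holders if row["granted"] else waiters
        target[rel].add(row["pid"])

    blocked = {}
    for rel, pids in waiters.items():
        for pid in pids:
            # A waiting pid is never reported as its own blocker.
            blocked.setdefault(pid, set()).update(holders.get(rel, set()) - {pid})
    return blocked


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"relation": 16384, "pid": 101, "mode": "AccessExclusiveLock", "granted": True},
        {"relation": 16384, "pid": 202, "mode": "AccessShareLock", "granted": False},
        {"relation": None, "pid": 101, "mode": "ExclusiveLock", "granted": True},
    ]
    print(find_blockers(sample))  # {202: {101}}
```

Grouping by relation keeps the sketch simple; a more faithful version would also consult the lock conflict table so that, for instance, two compatible share locks are not flagged.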

In any case, win32 pg will normally respond to a query cancel while
waiting on a lock, but the server would not respond while trying to do
anything with this particular table.  It was like a process-sucking
black hole.  Also, CPU load was 0, as was disk load.

Merlin