Thread: postgres process got stuck in "notify interrupt waiting" status

From
Aleksey Tsalolikhin
Date:
Hi.

We use LISTEN/NOTIFY quite a bit but today something unusual (bad) happened.

The number of processes waiting for a lock just kept going up and up.

I finally found that the locked object was pg_listener, which, as
RhodiumToad on IRC kindly explained, gets locked during LISTEN/NOTIFY.
The process holding the lock (granted = t in pg_locks) was shown by ps
with status "notify interrupt waiting" and had held the lock for over
half an hour.  (Usually these notifications are very quick.)
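Finding the lock holder in pg_locks can be scripted. The following is a minimal sketch that assumes you have already fetched the rows of the query in LOCKS_QUERY with whatever client you use, as dicts with pid, relation, mode and granted keys. LOCKS_QUERY and find_blockers are hypothetical names. The function ignores lock modes, so it reports suspects rather than proven blockers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# Hypothetical query string; run it yourself with any client.
# relation is NULL for locks that are not on a table (e.g. transaction ids).
LOCKS_QUERY = (
    "SELECT pid, relation::regclass::text AS relation, mode, granted "
    "FROM pg_locks"
)

def find_blockers(rows: Iterable[Mapping]) -> Dict[str, Tuple[List[int], List[int]]]:
    """For every relation that has waiters, return (holder pids, waiter pids).

    Simplification: lock modes are ignored, so every granted holder of a
    contended relation is reported as a suspect, not a proven blocker.
    """
    holders: Dict[str, Set[int]] = defaultdict(set)
    waiters: Dict[str, Set[int]] = defaultdict(set)
    for row in rows:
        rel = row["relation"]
        if rel is None:
            continue  # not a table-level lock
        target = holders if row["granted"] else waiters
        target[rel].add(row["pid"])
    # Only relations with at least one waiter are interesting here.
    return {rel: (sorted(holders[rel]), sorted(pids)) for rel, pids in waiters.items()}
```

In a situation like the one described, the pg_listener entry would list one granted pid and a growing list of waiting pids.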

The process would not respond to kill, so I used kill -9.

The only reference I could find to a similar problem was at
http://archives.postgresql.org/pgsql-performance/2008-02/msg00345.php
which seemed to indicate a process should not be in this state for
very long.

We are on postgres 8.4.12.

I'd like to figure out what happened.

There is a web server that talks to this database server (among
other clients). The stuck backend's client address and port mapped to
this web server, but no process on the web server was using that port
number.  That's when I decided to kill the postgres process.
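The cross-check between the database side and the web server can be sketched as follows. This assumes two inputs. The first is rows from the query in ACTIVITY_QUERY. In 8.4 the pid column of pg_stat_activity is procpid, and host() returns the bare address. The second is a set of local ports that still have connections to the database on the web server, which you would collect there yourself, for example from netstat -tn or ss -tn output. ACTIVITY_QUERY and orphaned_backends are hypothetical names.

```python
from typing import Iterable, List, Mapping, Set

# Hypothetical query string for 8.4 (procpid was later renamed pid).
ACTIVITY_QUERY = (
    "SELECT procpid, host(client_addr) AS client_addr, client_port "
    "FROM pg_stat_activity"
)

def orphaned_backends(rows: Iterable[Mapping], client_host: str,
                      live_ports: Set[int]) -> List[int]:
    """Pids of backends whose client is client_host but whose client port
    is no longer held by any socket on that host."""
    return sorted(
        row["procpid"]
        for row in rows
        if row["client_addr"] == client_host and row["client_port"] not in live_ports
    )
```

A pid returned here only means the client side has no matching socket. The backend itself may not have noticed yet, which is exactly the situation described above.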

Anything I should know or read up on?  Any suggestions?

I'd like the system to be able to recover, and for the process to
terminate if the client is no longer around.

Best,
Aleksey


Re: postgres process got stuck in "notify interrupt waiting" status

From
Aleksey Tsalolikhin
Date:
BTW, after I signalled TERM, the process status changed from

notify interrupt waiting

to

notify interrupt waiting waiting

which I thought looked kind of odd.

Then I signalled KILL.
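When checking for this kind of status across many backends, a small parser for ps titles can help. This sketch assumes two things. The first is the layout "postgres: USER DB HOST(PORT) ACTIVITY". The second is that each trailing " waiting" was appended by a lock wait, so it counts them. Connections over a Unix socket show "[local]" instead of HOST(PORT) and are not matched. parse_ps_title and BackendTitle are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed ps title layout for a TCP client backend.
PS_TITLE = re.compile(
    r"^postgres: (?P<user>\S+) (?P<db>\S+) "
    r"(?P<host>[^\s(]+)\((?P<port>\d+)\) (?P<activity>.*)$"
)

class BackendTitle(NamedTuple):
    user: str
    db: str
    host: str
    port: int
    activity: str      # activity with trailing " waiting" markers removed
    wait_markers: int  # number of " waiting" suffixes that were removed

def parse_ps_title(title: str) -> Optional[BackendTitle]:
    m = PS_TITLE.match(title.strip())
    if m is None:
        return None  # not a TCP client backend title
    activity = m["activity"]
    markers = 0
    # Peel off trailing " waiting" suffixes one at a time.
    while activity.endswith(" waiting"):
        activity = activity[: -len(" waiting")]
        markers += 1
    return BackendTitle(m["user"], m["db"], m["host"], int(m["port"]), activity, markers)
```

Under these assumptions, the status seen after SIGTERM would parse as activity "notify interrupt" with wait_markers == 2.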

Aleksey

On Tue, Sep 4, 2012 at 6:21 PM, Aleksey Tsalolikhin
<atsaloli.tech@gmail.com> wrote:
> Hi.
>
> We use LISTEN/NOTIFY quite a bit but today something unusual (bad) happened.
> [...]


Re: postgres process got stuck in "notify interrupt waiting" status

From
John R Pierce
Date:
On 09/04/12 7:09 PM, Aleksey Tsalolikhin wrote:
> BTW, after I signalled TERM, the process status changed from
>
> notify interrupt waiting
>
> to
>
> notify interrupt waiting waiting
>
> which I thought looked kind of odd.
>
> Then I signalled KILL.

Was this a client process or a postgres process?   kill -9 on postgres
processes can easily trigger data corruption.



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast



Re: postgres process got stuck in "notify interrupt waiting" status

From
Aleksey Tsalolikhin
Date:
On Tue, Sep 4, 2012 at 7:21 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 09/04/12 7:09 PM, Aleksey Tsalolikhin wrote:
>>
>> BTW, after I signalled TERM, the process status changed from
>>
>> notify interrupt waiting
>>
>> to
>>
>> notify interrupt waiting waiting
>>
>> which I thought looked kind of odd.
>>
>> Then I signalled KILL.
>
>
> was this a client process or a postgres process?   kill -9 on postgres
> processes can easily trigger data corruption.

This was a postgres process.  I certainly won't signal KILL anymore to
postgres processes; thanks for that warning, John.

Aleksey


Re: postgres process got stuck in "notify interrupt waiting" status

From
"Albe Laurenz"
Date:
John R Pierce wrote:
> was this a client process or a postgres process?   kill -9 on postgres
> processes can easily trigger data corruption.

It definitely shouldn't cause data corruption, otherwise
PostgreSQL would not be crash safe.

Yours,
Laurenz Albe


Re: postgres process got stuck in "notify interrupt waiting" status

From
Craig Ringer
Date:
On 09/05/2012 12:21 PM, John R Pierce wrote:
> was this a client process or a postgres process?   kill -9 on postgres
> processes can easily trigger data corruption.

It certainly shouldn't.

kill -9 of the postmaster, deletion of postmaster.pid, and restarting
PostgreSQL *might*, but AFAIK even then you'd have to bypass the shared
memory lockout (unless you're on Windows).

--
Craig Ringer



Re: postgres process got stuck in "notify interrupt waiting" status

From
Tom Lane
Date:
Craig Ringer <ringerc@ringerc.id.au> writes:
> On 09/05/2012 12:21 PM, John R Pierce wrote:
>> was this a client process or a postgres process?   kill -9 on postgres
>> processes can easily trigger data corruption.

> It certainly shouldn't.

> kill -9 of the postmaster, deletion of postmaster.pid, and re-starting
> postgresql *might* but AFAIK even then you'll have to bypass the shared
> memory lockout (unless you're on Windows).

Correction on that: manually deleting postmaster.pid *does* bypass the
shared memory lock.  If there are still any live backends from the old
postmaster, you can get corruption as a result of this, because the old
backends and the new ones will be modifying the database independently.

This is why we recommend that you never delete postmaster.pid manually,
and certainly not as part of an automatic startup script.

Having said that, a kill -9 on an individual backend (*not* the
postmaster) should be safe enough, if you don't mind the fact that
it'll kill all your other sessions too.

            regards, tom lane


Re: postgres process got stuck in "notify interrupt waiting" status

From
Aleksey Tsalolikhin
Date:
On Wed, Sep 5, 2012 at 7:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Having said that, a kill -9 on an individual backend (*not* the
> postmaster) should be safe enough, if you don't mind the fact that
> it'll kill all your other sessions too.
>

Got it, thanks.

Why will it kill all your other sessions too?  Isn't there a separate backend
process for each session?

Best,
-at


Re: postgres process got stuck in "notify interrupt waiting" status

From
"Kevin Grittner"
Date:
Aleksey Tsalolikhin <atsaloli.tech@gmail.com> wrote:

> Why will it kill all your other sessions too?  Isn't there a
> separate backend process for each session?

When stopped that abruptly, the process has no chance to clean up
its pending state in shared memory.  A fresh copy of shared memory
is needed, so it is necessary to effectively do an immediate restart
on the whole PostgreSQL instance.

-Kevin


Re: postgres process got stuck in "notify interrupt waiting" status

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Aleksey Tsalolikhin <atsaloli.tech@gmail.com> wrote:
>> Why will it kill all your other sessions too?  Isn't there a
>> separate backend process for each session?

> When stopped that abruptly, the process has no chance to clean up
> its pending state in shared memory.  A fresh copy of shared memory
> is needed, so it is necessary to effectively do an immediate restart
> on the whole PostgreSQL instance.

Right.  On seeing one child die unexpectedly, the postmaster forcibly
SIGQUITs all its other children and initiates a crash recovery sequence.
The reason for this is exactly that we can't trust the contents of
shared memory anymore.  An example is that the dying backend may have
held some critical lock, which there is no way to release, so that every
other session will shortly be stuck anyway.
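The reaction Tom describes can be modeled as a toy decision function. It is a sketch, not postmaster code. ChildExit and handle_child_exit are hypothetical names. The rule that exit code 0 (normal) or 1 (FATAL error) counts as harmless, while death by a signal or any other exit code counts as a crash, is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ChildExit:
    pid: int
    exit_code: Optional[int] = None  # set when the child called exit()
    signal: Optional[int] = None     # set when the child was killed by a signal

def is_crash(ev: ChildExit) -> bool:
    # Assumption of this sketch: exit codes 0 and 1 are harmless;
    # any signal (e.g. SIGKILL from kill -9) or other exit code is a crash.
    if ev.signal is not None:
        return True
    return ev.exit_code not in (0, 1)

def handle_child_exit(ev: ChildExit, children: List[int]) -> List[int]:
    """Return the pids that would be sent SIGQUIT before shared memory is
    reinitialized; an empty list means no crash recovery is needed."""
    if not is_crash(ev):
        return []
    # Shared memory can no longer be trusted, so every other child goes too.
    return [pid for pid in children if pid != ev.pid]
```

This is why a kill -9 on a single backend ends every other session: the model cannot know what the dead backend left half-done in shared memory, so it treats all of shared memory as suspect.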

            regards, tom lane


Re: postgres process got stuck in "notify interrupt waiting" status

From
Aleksey Tsalolikhin
Date:
Got it, thanks, Kevin, Tom.

So what about this process that was in "notify interrupt waiting
waiting" status after I SIGTERM'ed it?  Is the double "waiting"
expected?

Aleksey


Re: postgres process got stuck in "notify interrupt waiting" status

From
Tom Lane
Date:
Aleksey Tsalolikhin <atsaloli.tech@gmail.com> writes:
> So how about that this process that was in "notify interrupt waiting
> waiting" status after I SIGTERM'ed it.  Is the double "waiting"
> expected?

That sounded a bit fishy to me too.  But unless you can reproduce it in
something newer than 8.4.x, nobody's likely to take much of an interest.
The LISTEN/NOTIFY infrastructure got completely rewritten in 9.0, so
any bugs in the legacy version are probably just going to get benign
neglect at this point ... especially if we don't know how to reproduce
them.

            regards, tom lane


Re: postgres process got stuck in "notify interrupt waiting" status

From
Aleksey Tsalolikhin
Date:
On Wed, Sep 5, 2012 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> That sounded a bit fishy to me too.  But unless you can reproduce it in
> something newer than 8.4.x, nobody's likely to take much of an interest.
> The LISTEN/NOTIFY infrastructure got completely rewritten in 9.0, so
> any bugs in the legacy version are probably just going to get benign
> neglect at this point ... especially if we don't know how to reproduce
> them.

Got it, thanks, Tom!  Will urge our shop to upgrade to 9.1.

Best,
-at