Re: BUG #15036: Un-killable queries Hanging in BgWorkerShutdown - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #15036: Un-killable queries Hanging in BgWorkerShutdown
Msg-id CAEepm=0YQbc32PVbM8BxXDJhmK8+rUTzKhSVC1ujSQ7c1hy5Lw@mail.gmail.com
In response to BUG #15036: Un-killable queries Hanging in BgWorkerShutdown  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15036: Un-killable queries Hanging in BgWorkerShutdown
List pgsql-bugs
On Tue, Jan 30, 2018 at 5:48 AM, PG Bug reporting form
<noreply@postgresql.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      15036
> Logged by:          David Kohn
> Email address:      djk447@gmail.com
> PostgreSQL version: 10.1
> Operating system:   Ubuntu 16.04
> Description:
>
> I have been experiencing a consistent problem with queries that I cannot
> kill with pg_cancel_backend or pg_terminate_backend. In many cases they have
> been running for days and are in a transaction, so they eventually cause
> rather large bloat and related problems. All the backends have
> wait_event_type IPC. The backends appear to be either a main client
> backend, in which case the wait_event field in pg_stat_activity says
> BgWorkerShutdown, or background workers, for which I see two wait events
> (though I'm not sure that this is all of them): BtreePage and
> MessageQueuePutMessage. I'm quite sure the clients for these are dead: they
> had statement timeouts set to an hour at most, and they might have died
> sooner than that of other causes. I assume this is a bug and I should be
> reporting it here, but if I'm putting it on the wrong list let me know and
> I'll move it!

Hi David,

Thanks for the report!  Based on the mention of BtreePage, this sounds
like the following bug:

https://www.postgresql.org/message-id/flat/CAEepm%3D2xZUcOGP9V0O_G0%3D2P2wwXwPrkF%3DupWTCJSisUxMnuSg%40mail.gmail.com

The fix for that will be in 10.2 (current target date: February 8th).
The workaround in the meantime would be to disable parallelism, at
least for the queries doing parallel index scans if you can identify
them.
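
A minimal sketch of that workaround, assuming the standard
max_parallel_workers_per_gather setting is what enables parallel query
on the affected server. The role name app_user is hypothetical and only
illustrates scoping the change to the queries you suspect:

```sql
-- Session level: no parallel workers for queries run in this session.
SET max_parallel_workers_per_gather = 0;

-- Role level (app_user is a hypothetical role name): applies to new
-- sessions started by that role.
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;

-- Cluster-wide: written to postgresql.auto.conf, then picked up on reload.
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();
```

With the setting at 0 the planner stops choosing parallel plans, so
parallel index scans are not attempted. Once you are on a release with
the fix, RESET or ALTER SYSTEM RESET the setting to re-enable
parallelism.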

However, I'm not entirely sure why you're not able to cancel these
backends politely with pg_cancel_backend().  For example, the
BtreePage waiter should be in ConditionVariableSleep() and should be
interrupted by such a signal and error out in CHECK_FOR_INTERRUPTS().
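
To check which backends are affected and retry a polite cancel, something
like the following could be used. It is only a sketch: the wait event
names are the ones from the report, and the pid 12345 is a placeholder
to be replaced with a value returned by the first query.

```sql
-- List backends stuck in the wait events mentioned in the report.
SELECT pid, backend_type, wait_event_type, wait_event, state, xact_start
FROM pg_stat_activity
WHERE wait_event_type = 'IPC'
  AND wait_event IN ('BgWorkerShutdown', 'BtreePage', 'MessageQueuePutMessage')
ORDER BY xact_start;

-- Ask one backend to cancel its current query (sends SIGINT).
-- 12345 is a placeholder pid, not a value from the report.
SELECT pg_cancel_backend(12345);
```

pg_cancel_backend() returning true only means the signal was sent. The
backend still has to reach CHECK_FOR_INTERRUPTS() before it errors out,
so rerunning the first query afterwards shows whether the cancel actually
took effect.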

-- 
Thomas Munro
http://www.enterprisedb.com

