Thread: BUG #15290: Stuck Parallel Index Scan query

BUG #15290: Stuck Parallel Index Scan query

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15290
Logged by:          Victor Yegorov
Email address:      vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system:   Debian GNU/Linux 8.7 (jessie)
Description:

We've just encountered an issue on a streaming replica of a client.

In short, a query has been active for 8 hours and we're not able to terminate it:
all processes (leader and workers) are ignoring all signals.
Symptoms are similar to the ones described in #15036. As a result, we had to
perform an immediate shutdown of the instance.
After the restart we got another such stuck query within minutes, so it's not
a single occurrence.


Query in subject
----------------
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------
ts_age      | 08:32:17.286343
state       | active
query_age   | 08:32:17.286345
change_age  | 08:32:17.286344
datname     | coub
pid         | 2877
usename     | app
waiting     | f
client_addr |
client_port | -1
query       | select count(*) as value from coubs where type='Coub::Simple'
and is_done=false and in_process=false

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=666001.67..666001.72 rows=1 width=8)
   ->  Gather  (cost=666001.20..666001.65 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=665901.20..665901.25 rows=1 width=8)
               ->  Parallel Index Scan using coubs_type_is_done_partial_simple on coubs  (cost=0.56..663016.01 rows=1154077 width=0)
                     Index Cond: (is_done = false)
                     Filter: ((NOT is_done) AND (NOT in_process))
(7 rows)

In reality, this query ran with just 4 workers:
 2877 ?        ts     0:08  \_ postgres: 10/main: app coub [local] SELECT
 3416 ?        Ss     0:00  \_ postgres: 10/main: bgworker: parallel worker for PID 2877
 3417 ?        Ss     0:00  \_ postgres: 10/main: bgworker: parallel worker for PID 2877
 3418 ?        Ss     0:00  \_ postgres: 10/main: bgworker: parallel worker for PID 2877
 3419 ?        Ss     0:00  \_ postgres: 10/main: bgworker: parallel worker for PID 2877

We have backtraces of the main process and all workers, and also a
`pg_stat_activity` snapshot; I will attach them in a follow-up e-mail.


Re: BUG #15290: Stuck Parallel Index Scan query

From
Victor Yegorov
Date:
вс, 22 июл. 2018 г. в 21:31, PG Bug reporting form <noreply@postgresql.org>:
The following bug has been logged on the website:

Bug reference:      15290
Logged by:          Victor Yegorov
Email address:      vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system:   Debian GNU/Linux 8.7 (jessie)

And here are the backtraces and the pg_stat_activity snapshot.

--
Victor Yegorov
Attachment

Re: BUG #15290: Stuck Parallel Index Scan query

From
Victor Yegorov
Date:
вс, 22 июл. 2018 г. в 21:31, PG Bug reporting form <noreply@postgresql.org>:
The following bug has been logged on the website:

Bug reference:      15290
Logged by:          Victor Yegorov
Email address:      vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system:   Debian GNU/Linux 8.7 (jessie)

Some more details.
The index chosen for the Parallel Index Scan was a partial index, and it is quite bloated:

"coubs_type_is_done_partial_simple" btree (type, is_done) WHERE type::text = 'Coub::Simple'::text
6766MB

What we did was create another, ordinary index:

"index_coubs_on_is_done_and_in_process_and_type" btree (is_done, in_process, type)
2141MB

So now the planner prefers the new index and the issue is not reappearing. The query executes within 10-20 ms.
The old bloated index is still around for analysis; `amcheck` showed no issues with it, though.
 

--
Victor Yegorov

Re: BUG #15290: Stuck Parallel Index Scan query

From
Thomas Munro
Date:
On Mon, Jul 23, 2018 at 8:52 AM, Victor Yegorov <vyegorov@gmail.com> wrote:
> вс, 22 июл. 2018 г. в 21:31, PG Bug reporting form <noreply@postgresql.org>:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      15290
>> Logged by:          Victor Yegorov
>> Email address:      vyegorov@gmail.com
>> PostgreSQL version: 10.4
>> Operating system:   Debian GNU/Linux 8.7 (jessie)

Thanks for the detailed report!  To summarise your back traces, we have:

PID 2877 is the master process and has decided to abort and is waiting
for the workers to exit:

WaitLatch
WaitForBackgroundWorkerShutdown
WaitForParallelWorkersToExit
DestroyParallelContext
AtEOXact_Parallel
AbortTransaction
AbortCurrentTransaction
PostgresMain

PIDs 3416, 3417, 3418, 3419 meanwhile are waiting to seize the scan head:

WaitEventSetWaitBlock
ConditionVariableSleep
_bt_parallel_seize
_bt_readnextpage

Presumably 2877 has it (?), but aborted (do you have an error message
in the server log?), and the workers have somehow survived
TerminateBackgroundWorker() (called by DestroyParallelContext()).
Hmm.

TerminateBackgroundWorker() sets a shm flag and signals the
postmaster, the postmaster signals the worker with SIGTERM, the worker
handles SIGTERM in die() by setting ProcDiePending = true,
InterruptPending = true and then setting its own latch, and the
ConditionVariableSleep() loop should wake up, go around its loop again
and run CHECK_FOR_INTERRUPTS() which sees InterruptPending and runs
ProcessInterrupts(), which sees ProcDiePending and reports FATAL.  So
which domino fell short here?
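
That chain can be sketched as a toy Python model (the names mirror the real backend globals -- InterruptPending, ProcDiePending, the interrupt holdoff counter -- but this is a standalone illustration under those assumptions, not PostgreSQL code):

```python
# Toy model of the wake-up path described above: die() sets the flags and
# the latch, and the ConditionVariableSleep() loop is expected to notice
# them via CHECK_FOR_INTERRUPTS() on its next iteration.

class Worker:
    def __init__(self):
        self.interrupt_pending = False
        self.proc_die_pending = False
        self.interrupt_holdoff_count = 0  # > 0 while e.g. a buffer LWLock is held
        self.latch_set = False

    def die(self):
        """SIGTERM handler: set the flags and the latch, as die() does."""
        self.proc_die_pending = True
        self.interrupt_pending = True
        self.latch_set = True  # wakes WaitLatch()/ConditionVariableSleep()

    def check_for_interrupts(self):
        """CHECK_FOR_INTERRUPTS(): does nothing while interrupts are held off."""
        if self.interrupt_pending and self.interrupt_holdoff_count == 0:
            if self.proc_die_pending:
                raise SystemExit("FATAL: terminating connection")  # worker exits

    def cv_sleep_loop(self, max_iterations=10):
        """ConditionVariableSleep() loop: wake on latch, re-check interrupts."""
        for _ in range(max_iterations):
            if self.latch_set:
                self.latch_set = False
                self.check_for_interrupts()  # expected to notice the SIGTERM
        return "still sleeping"  # the stuck state


# Normal case: no lock held, the worker notices SIGTERM and exits.
w = Worker()
w.die()
try:
    w.cv_sleep_loop()
    outcome_unlocked = "stuck"
except SystemExit:
    outcome_unlocked = "exited"

# Failure mode: interrupts held off (e.g. an LWLock still held), so
# CHECK_FOR_INTERRUPTS() is a no-op and the worker sleeps despite the signal.
w2 = Worker()
w2.interrupt_holdoff_count = 1
w2.die()
outcome_locked = w2.cv_sleep_loop()
```

In this model the signal only fails to terminate the worker when the holdoff count is non-zero, which is exactly the "held interrupts" scenario raised below.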

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15290: Stuck Parallel Index Scan query

From
Andres Freund
Date:
On 2018-07-23 16:30:28 +1200, Thomas Munro wrote:
> TerminateBackgroundWorker() sets a shm flag and signals the
> postmaster, the postmaster signals the worker with SIGTERM, the worker
> handles SIGTERM in die() by setting ProcDiePending = true,
> InterruptPending= true and then setting its own latch, and the
> ConditionVarableSleep() loop should wake up, go around its loop again
> and run CHECK_FOR_INTERRUPTS() which sees InterruptPending and runs
> ProcessInterrupts(), which sees ProcDiePending and reports FATAL.  So
> which domino fell short here?

I've not looked at this, but is it possible that interrupts aren't
accepted because they're held (e.g. because of an lwlock)?

Greetings,

Andres Freund


Re: BUG #15290: Stuck Parallel Index Scan query

From
Victor Yegorov
Date:
пн, 23 июл. 2018 г. в 7:31, Thomas Munro <thomas.munro@enterprisedb.com>:
PID 2877 is the master process and has decided to abort and is waiting
for the workers to exit:

WaitLatch
WaitForBackgroundWorkerShutdown
WaitForParallelWorkersToExit
DestroyParallelContext
AtEOXact_Parallel
AbortTransaction
AbortCurrentTransaction
PostgresMain

PIDs 3416, 3417, 3418, 3419 meanwhile are waiting to seize the scan head:

WaitEventSetWaitBlock
ConditionVariableSleep
_bt_parallel_seize
_bt_readnextpage

Presumably 2877 has it (?), but aborted (do you have an error message
in the server log?), and the workers have somehow survived
TerminateBackgroundWorker() (called by DestroyParallelContext()).

The query was stuck for 8 hours when we tried to terminate it. That makes me think the master process was
still waiting for the bgworkers to finish, as a test run finished in 11 ms for me.
As I mentioned, we had this case re-appear while I was preparing
the report (had to restart the DB a second time). I think I can make it happen again, if necessary.

There is not much in the logs:
- a bunch of `FATAL:  connection to client lost`, but from another (web) user (a couple of errors per hour)
- `ERROR:  canceling statement due to conflict with recovery`, which happened right when our problematic query started, same user
- errors related to shutdown/startup of the DB.

 
--
Victor Yegorov

Re: BUG #15290: Stuck Parallel Index Scan query

From
Thomas Munro
Date:
On Mon, Jul 23, 2018 at 7:57 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
> - `ERROR:  canceling statement due to conflict with recovery`, happened
> right when our problematic query started, same user

Ok, so that would explain how the master was cancelled.  In 2877's
stack we see that it was aborting here:

#11 0x00007f539697ba5e in PostgresMain (argc=1,
argv=argv@entry=0x7f5398d1bbc8, dbname=0x7f5398d1bb98 "coub",
username=0x7f5398d1bbb0 "app") at
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/tcop/postgres.c:3879

That line calls AbortCurrentTransaction(), just after the call to
EmitErrorReport() that wrote something in your log.  Andres's theory
(interrupts 'held') seems promising... perhaps there could be a bug
where parallel index scans leak a share-locked page or something like
that.  I tried to reproduce this a bit, but no cigar so far.  I wonder
if there could be something about your bloated index that reaches
buggy behaviour...

If you happen to have a core file for a worker that is waiting in
ConditionVariableSleep(), or it happens again, you'd be able to see if
an LWLock is causing this by printing num_held_lwlocks.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15290: Stuck Parallel Index Scan query

From
Victor Yegorov
Date:
пн, 23 июл. 2018 г. в 11:47, Thomas Munro <thomas.munro@enterprisedb.com>:
On Mon, Jul 23, 2018 at 7:57 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
> - `ERROR:  canceling statement due to conflict with recovery`, happened
> right when our problematic query started, same user

Ok, so that would explain how the master was cancelled.  In 2877's
stack we see that it was aborting here:

Right:

ERROR:  canceling statement due to conflict with recovery
DETAIL:  User was holding shared buffer pin for too long.
 

#11 0x00007f539697ba5e in PostgresMain (argc=1,
argv=argv@entry=0x7f5398d1bbc8, dbname=0x7f5398d1bb98 "coub",
username=0x7f5398d1bbb0 "app") at
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/tcop/postgres.c:3879

That line calls AbortCurrentTransaction(), just after the call to
EmitErrorReport() that wrote something in your log.  Andres's theory
(interrupts 'held') seems promising... perhaps there could be a bug
where parallel index scans leak a share-locked page or something like
that.  I tried to reproduce this a bit, but no cigar so far.  I wonder
if there could be something about your bloated index that reaches
buggy behaviour...

If you happen to have a core file for a worker that is waiting in
ConditionVariableSleep(), or it happens again, you'd be able to see if
an LWLock is causing this by printing num_held_lwlocks.

No, we do not have core files around. And so far I have not been able to reproduce this situation.
I will keep monitoring. In case I hit it again, what else (besides num_held_lwlocks)
should I check for?

--
Victor Yegorov

Re: BUG #15290: Stuck Parallel Index Scan query

From
Thomas Munro
Date:
Ok, I see it:

                        /* check for interrupts while we're not holding any buffer lock */
                        CHECK_FOR_INTERRUPTS();
                        /* step right one page */
                        so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
                        ...
                        /* nope, keep going */
                        if (scan->parallel_scan != NULL)
                        {
                                status = _bt_parallel_seize(scan, &blkno);

That leads to a condition variable wait, while we still hold that
buffer lock.  That prevents interrupts.  Oops.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15290: Stuck Parallel Index Scan query

From
Andres Freund
Date:
On 2018-07-25 14:04:11 +1200, Thomas Munro wrote:
> Ok, I see it:
> 
>                         /* check for interrupts while we're not
> holding any buffer lock */
>                         CHECK_FOR_INTERRUPTS();
>                         /* step right one page */
>                         so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
>                         ...
>                         /* nope, keep going */
>                         if (scan->parallel_scan != NULL)
>                         {
>                                 status = _bt_parallel_seize(scan, &blkno);
> 
> That leads to a condition variable wait, while we still hold that
> buffer lock.  That prevents interrupts.  Oops.

Heh, guessed right.  I kinda wonder if we shouldn't add a
CHECK_FOR_INTERRUPTS_FOR_REALZ() that asserts that interrupts aren't
held.  There are plenty of places where we rely on that being the case,
for correctness.

Greetings,

Andres Freund


Re: BUG #15290: Stuck Parallel Index Scan query

From
Thomas Munro
Date:
On Wed, Jul 25, 2018 at 2:08 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-07-25 14:04:11 +1200, Thomas Munro wrote:
>> Ok, I see it:
>>
>>                         /* check for interrupts while we're not
>> holding any buffer lock */
>>                         CHECK_FOR_INTERRUPTS();
>>                         /* step right one page */
>>                         so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
>>                         ...
>>                         /* nope, keep going */
>>                         if (scan->parallel_scan != NULL)
>>                         {
>>                                 status = _bt_parallel_seize(scan, &blkno);
>>
>> That leads to a condition variable wait, while we still hold that
>> buffer lock.  That prevents interrupts.  Oops.
>
> Heh, guessed right.  I kinda wonder if we shouldn't add a
> CHECK_FOR_INTERRUPTS_FOR_REALZ() that asserts if interrupts aren't
> held. There are plenty places where we rely on that being the case, for
> correctness.

Yeah, I was wondering something similar: perhaps WaitEventSetWait()
should assert that interrupts are not held, unless you explicitly told
it somehow that it's OK?  You couldn't do it unconditionally, because
we sometimes do that on purpose:

    HOLD_INTERRUPTS();
    WaitForParallelWorkersToExit(pcxt);
    RESUME_INTERRUPTS();
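
The guard sketched above could look roughly like this, again as a toy Python model rather than real backend code (the `allow_held` opt-out parameter is an assumed name for the "explicitly told it somehow" escape hatch):

```python
# Toy sketch: WaitEventSetWait() asserting that interrupts are not held off,
# with an explicit opt-out for callers that hold interrupts on purpose,
# such as the HOLD_INTERRUPTS()/WaitForParallelWorkersToExit() pattern.

interrupt_holdoff_count = 0

def hold_interrupts():          # HOLD_INTERRUPTS()
    global interrupt_holdoff_count
    interrupt_holdoff_count += 1

def resume_interrupts():        # RESUME_INTERRUPTS()
    global interrupt_holdoff_count
    interrupt_holdoff_count -= 1

def wait_event_set_wait(allow_held=False):
    # Sleeping here with interrupts held off would make the process
    # effectively unkillable, so refuse unless the caller opted out.
    assert allow_held or interrupt_holdoff_count == 0, \
        "sleeping with interrupts held"
    return "waited"

# Legitimate, deliberate pattern: hold interrupts across the wait.
hold_interrupts()
ok = wait_event_set_wait(allow_held=True)
resume_interrupts()

# The bug pattern: interrupts held off (buffer lock still held) with no
# opt-out -- the assertion catches it instead of hanging at runtime.
hold_interrupts()
try:
    wait_event_set_wait()
    caught = False
except AssertionError:
    caught = True
finally:
    resume_interrupts()
```

In this sketch the stuck-worker situation would have tripped the assertion in a debug build instead of hanging silently.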

Here's a reproducer (adjust timeout to suit your machine):

drop table if exists t;
create table t as select generate_series(1, 1000000)::int i;
create index on t(i);
alter table t set (autovacuum_enabled = false);
delete from t; -- 100% bloat please

set max_parallel_workers_per_gather = 2;
set min_parallel_index_scan_size = 0;
set enable_seqscan = false;
set enable_bitmapscan = false;
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;

set statement_timeout = '100ms'; -- enough to fork, not enough to complete

explain analyze select count(*) from t;

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15290: Stuck Parallel Index Scan query

From
Amit Kapila
Date:
On Wed, Jul 25, 2018 at 8:29 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jul 25, 2018 at 2:08 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-07-25 14:04:11 +1200, Thomas Munro wrote:
>>> Ok, I see it:
>>>
>>>                         /* check for interrupts while we're not
>>> holding any buffer lock */
>>>                         CHECK_FOR_INTERRUPTS();
>>>                         /* step right one page */
>>>                         so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
>>>                         ...
>>>                         /* nope, keep going */
>>>                         if (scan->parallel_scan != NULL)
>>>                         {
>>>                                 status = _bt_parallel_seize(scan, &blkno);
>>>
>>> That leads to a condition variable wait, while we still hold that
>>> buffer lock.  That prevents interrupts.  Oops.
>>

Well spotted.  I think here we can release the current page lock
before calling _bt_parallel_seize as we don't need it to get the next
page.  See the backward scan case; in particular, I am referring to
the code below:

_bt_readnextpage()
{
..
/*
* For parallel scans, get the last page scanned as it is quite
* possible that by the time we try to seize the scan, some other
* worker has already advanced the scan to a different page.  We
* must continue based on the latest page scanned by any worker.
*/
if (scan->parallel_scan != NULL)
{
_bt_relbuf(rel, so->currPos.buf);
status = _bt_parallel_seize(scan, &blkno);
..
}

This needs some more analysis.  I will continue the analysis and
share my findings.
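
The reordering being proposed can be sketched in toy Python (not the real nbtree code; `held_locks` stands in for the backend's held buffer LWLocks, and the string results are illustrative only):

```python
# Toy sketch of the two orderings discussed above. While `held_locks` is
# non-empty, CHECK_FOR_INTERRUPTS() inside the condition-variable sleep
# cannot fire, so sleeping there with a lock held means an unkillable wait.

def readnextpage_buggy(held_locks, seize_blocks):
    """Forward-scan order before the fix: seize while still holding the page."""
    held_locks.append("currPos.buf")        # _bt_getbuf(rel, blkno, BT_READ)
    if seize_blocks:
        # _bt_parallel_seize() -> ConditionVariableSleep() with the lock held:
        # interrupts stay held off for as long as we sleep.
        return "sleeping with locks held: " + ", ".join(held_locks)
    held_locks.remove("currPos.buf")
    return "advanced"

def readnextpage_fixed(held_locks, seize_blocks):
    """Order after the fix (mirroring the backward-scan case): release first."""
    held_locks.append("currPos.buf")
    held_locks.remove("currPos.buf")        # _bt_relbuf(rel, so->currPos.buf)
    if seize_blocks:
        # The CV sleep now happens with no buffer lock held, so a SIGTERM's
        # CHECK_FOR_INTERRUPTS() can actually terminate the worker.
        return "sleeping with no locks held"
    return "advanced"

buggy = readnextpage_buggy([], seize_blocks=True)
fixed = readnextpage_fixed([], seize_blocks=True)
```

The key point is only the ordering: releasing the buffer before seizing means any blocking wait inside the seize happens with no buffer lock held.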

Thanks, Thomas for pinging me offlist and including me here.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: BUG #15290: Stuck Parallel Index Scan query

From
Amit Kapila
Date:
On Wed, Jul 25, 2018 at 8:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 25, 2018 at 8:29 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Wed, Jul 25, 2018 at 2:08 PM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2018-07-25 14:04:11 +1200, Thomas Munro wrote:
>>>> Ok, I see it:
>>>>
>>>>                         /* check for interrupts while we're not
>>>> holding any buffer lock */
>>>>                         CHECK_FOR_INTERRUPTS();
>>>>                         /* step right one page */
>>>>                         so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
>>>>                         ...
>>>>                         /* nope, keep going */
>>>>                         if (scan->parallel_scan != NULL)
>>>>                         {
>>>>                                 status = _bt_parallel_seize(scan, &blkno);
>>>>
>>>> That leads to a condition variable wait, while we still hold that
>>>> buffer lock.  That prevents interrupts.  Oops.
>>>
>
> Well spotted.  I think here we can release the current page lock
> before calling _bt_parallel_seize as we don't need it to get the next
> page.
>

I have written a patch on the above lines and manually verified (by
reproducing the issue via debugger) that it fixes the issue.  Thomas,
Victor, is it possible for you guys to see if the attached fixes the
issue for you?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: BUG #15290: Stuck Parallel Index Scan query

From
Thomas Munro
Date:
On Wed, Jul 25, 2018 at 5:46 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think here we can release the current page lock
>> before calling _bt_parallel_seize as we don't need it to get the next
>> page.
>
> I have written a patch on the above lines and manually verified (by
> reproducing the issue via debugger) that it fixes the issue.  Thomas,
> Victor, is it possible for you guys to see if the attached fixes the
> issue for you?

The patch looks good to me.  It fixes the problem for my reproducer
(BTW that works better with an extra zero on the generate_series
number).

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15290: Stuck Parallel Index Scan query

From
Victor Yegorov
Date:
ср, 25 июл. 2018 г. в 8:46, Amit Kapila <amit.kapila16@gmail.com>:
On Wed, Jul 25, 2018 at 8:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Well spotted.  I think here we can release the current page lock
> before calling _bt_parallel_seize as we don't need it to get the next
> page.
>

I have written a patch on the above lines and manually verified (by
reproducing the issue via debugger) that it fixes the issue.  Thomas,
Victor, is it possible for you guys to see if the attached fixes the
issue for you?

First, I failed to reproduce Thomas's case on a VM with just 1 core.
But it reproduces just fine on an 8-core machine; make sure to warm up the index first, though.

The patch fixes the issue for me here.

--
Victor Yegorov

Re: BUG #15290: Stuck Parallel Index Scan query

From
Amit Kapila
Date:
On Thu, Jul 26, 2018 at 3:53 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
> ср, 25 июл. 2018 г. в 8:46, Amit Kapila <amit.kapila16@gmail.com>:
>>
>> On Wed, Jul 25, 2018 at 8:43 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > Well spotted.  I think here we can release the current page lock
>> > before calling _bt_parallel_seize as we don't need it to get the next
>> > page.
>> >
>>
>> I have written a patch on the above lines and manually verified (by
>> reproducing the issue via debugger) that it fixes the issue.  Thomas,
>> Victor, is it possible for you guys to see if the attached fixes the
>> issue for you?
>
>
> First, I failed to reproduce Thomas' case on the VM with just 1 core.
> But it is reproduced just fine on an 8-core machine, make sure to warm up
> the index first though.
>
> Patch fixes the issue for me here.
>

Thanks, pushed the fix.  I have not included Thomas's test as it is
timing-dependent, and it seems risky to include such a test.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com