Re: BUG #15290: Stuck Parallel Index Scan query - Mailing list pgsql-bugs

From Victor Yegorov
Subject Re: BUG #15290: Stuck Parallel Index Scan query
Date
Msg-id CAGnEboh-xhhxoVvFE2hpkera4UZUgDcN2P+yncsZWiFWZ+88TQ@mail.gmail.com
In response to Re: BUG #15290: Stuck Parallel Index Scan query  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG #15290: Stuck Parallel Index Scan query
List pgsql-bugs
Mon, 23 Jul 2018 at 7:31, Thomas Munro <thomas.munro@enterprisedb.com>:
> PID 2877 is the master process and has decided to abort and is waiting
> for the workers to exit:
>
> WaitLatch
> WaitForBackgroundWorkerShutdown
> WaitForParallelWorkersToExit
> DestroyParallelContext
> AtEOXact_Parallel
> AbortTransaction
> AbortCurrentTransaction
> PostgresMain
>
> PIDs 3416, 3417, 3418, 3419 meanwhile are waiting to seize the scan head:
>
> WaitEventSetWaitBlock
> ConditionVariableSleep
> _bt_parallel_seize
> _bt_readnextpage
>
> Presumably 2877 has it (?), but aborted (do you have an error message
> in the server log?), and the workers have somehow survived
> TerminateBackgroundWorker() (called by DestroyParallelContext()).

The query had been stuck for 8 hours when we tried to terminate it. That makes me think the master process was
still waiting for the bgworkers to finish, since a test run finished in 11 ms for me.
As I mentioned, this case reappeared while I was preparing
the report (we had to restart the DB a second time). I think I can make it happen again, if necessary.

There is not much in the logs:
- a bunch of `FATAL:  connection to client lost`, but from another (web) user (a couple of errors per hour)
- `ERROR:  canceling statement due to conflict with recovery`, which happened right when our problematic query started, same user
- errors related to shutdown/startup of the DB.
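
In case it helps anyone triaging similar logs, here is a minimal sketch of how the two message types above could be tallied from a server log. It is only an illustration: the function name `tally_log_messages` and the category labels are hypothetical, and it assumes each log entry sits on one line with the severity and message text exactly as quoted above (the line prefix depends on `log_line_prefix` and is ignored).

```python
import re

# Message texts are taken from the log excerpts quoted above; the category
# labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "client_lost": re.compile(r"FATAL:\s+connection to client lost"),
    "recovery_conflict": re.compile(
        r"ERROR:\s+canceling statement due to conflict with recovery"
    ),
}


def tally_log_messages(lines):
    """Count how many lines match each known message pattern.

    `lines` is any iterable of strings, for example an open log file.
    Lines that match no pattern are skipped.
    """
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open log file to `tally_log_messages` returns a dictionary with one count per category, which makes it easy to check whether the recovery-conflict error lines up with the start of the stuck query.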

 
--
Victor Yegorov
