Thread: Parallel index creation does not properly cleanup after error

Parallel index creation does not properly cleanup after error

From

David Rowley

Date:

11 March 2018, 13:22:54

Hi,

I've just stumbled on a bug in the parallel reindexing code.

Example:

-- force parallel index creation
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
set min_parallel_table_scan_size = '0MB';
set min_parallel_index_scan_size = '0kB';

-- example (from the regression tests)
CREATE TABLE mvtest_foo(a, b) AS VALUES(1, 10);
CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
CREATE UNIQUE INDEX ON mvtest_mv(a);
INSERT INTO mvtest_foo SELECT * FROM mvtest_foo;
REFRESH MATERIALIZED VIEW mvtest_mv;
ERROR:  cannot modify reindex state during a parallel operation

Due to the failure during the index build, it appears that the
PG_TRY/PG_CATCH block in reindex_relation() causes the reindex_index()
to abort and jump out to the catch block. Here there's a call to
ResetReindexPending(), which complains as we're still left in parallel
mode from the aborted _bt_begin_parallel() call which has called
EnterParallelMode(), but not managed to make it all the way to
_bt_end_parallel() (called from btbuild()), where ExitParallelMode()
is normally called.

Subsequent attempts to refresh the materialized view result in an
Assert failure in list_member_oid()

REFRESH MATERIALIZED VIEW mvtest_mv;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I've not debugged that, but I assume it's because
pendingReindexedIndexes is left as a non-empty list but has had its
memory context obliterated due to the previous query having ended.

The comment in the following fragment is not well honored by the
ResetReindexPending() since it does not clear the list if there's an
error.

PG_CATCH();
{
    /* Make sure list gets cleared on error exit */
    ResetReindexPending();
    PG_RE_THROW();
}

static void
ResetReindexPending(void)
{
    if (IsInParallelMode())
        elog(ERROR, "cannot modify reindex state during a parallel operation");
    pendingReindexedIndexes = NIL;
}

A perhaps simple fix would be just to have ResetReindexPending() only
reset the list to NIL again and not try to raise any error.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel index creation does not properly cleanup after error

From

Peter Geoghegan

Date:

11 March 2018, 22:41:40

On Sun, Mar 11, 2018 at 3:22 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Due to the failure during the index build, it appears that the
> PG_TRY/PG_CATCH block in reindex_relation() causes the reindex_index()
> to abort and jump out to the catch block. Here there's a call to
> ResetReindexPending(), which complains as we're still left in parallel
> mode from the aborted _bt_begin_parallel() call which has called
> EnterParallelMode(), but not managed to make it all the way to
> _bt_end_parallel() (called from btbuild()), where ExitParallelMode()
> is normally called.
>
> Subsequent attempts to refresh the materialized view result in an
> Assert failure in list_member_oid()

Thanks for the report.

> I've not debugged that, but I assume it's because
> pendingReindexedIndexes is left as a non-empty list but has had its
> memory context obliterated due to the previous query having ended.

It's not really related to memory lifetime, so much as a corruption of
the state that tracks reindexed indexes within a backend. This is of
course due to that "cannot modify reindex state during a parallel
operation" error you saw.

> The comment in the following fragment is not well honored by the
> ResetReindexPending() since it does not clear the list if there's an
> error.

> A perhaps simple fix would be just to have ResetReindexPending() only
> reset the list to NIL again and not try to raise any error.

I noticed a very similar bug in ResetReindexProcessing() just before
parallel CREATE INDEX was committed. The fix there was simply not
throwing a "can't happen" error. I agree that the same fix should be
used here. It's not worth enforcing !IsInParallelMode() in the reset
functions; just enforcing !IsInParallelMode() in the set functions is
sufficient. Attached patch does this.

-- 
Peter Geoghegan

Attachment

0001-Fix-corruption-of-backend-REINDEX-processing-state.patch

Re: Parallel index creation does not properly cleanup after error

From

David Rowley

Date:

14 March 2018, 15:16:47

On 12 March 2018 at 08:41, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sun, Mar 11, 2018 at 3:22 AM, David Rowley
>> A perhaps simple fix would be just to have ResetReindexPending() only
>> reset the list to NIL again and not try to raise any error.
>
> I noticed a very similar bug in ResetReindexProcessing() just before
> parallel CREATE INDEX was committed. The fix there was simply not
> throwing a "can't happen" error. I agree that the same fix should be
> used here. It's not worth enforcing !IsInParallelMode() in the reset
> functions; just enforcing !IsInParallelMode() in the set functions is
> sufficient. Attached patch does this.

Thanks for putting the patch together. Although in regards to your
proposed commit message in the patch, I'd disagree with me having just
reported it. I also investigated it and suggested a fix, which happens
to be exactly the fix you've used in the patch.

I've added this to the open items list for PG11.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel index creation does not properly cleanup after error

From

Peter Geoghegan

Date:

14 March 2018, 21:14:05

On Wed, Mar 14, 2018 at 5:16 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Thanks for putting the patch together. Although in regards to your
> proposed commit message in the patch, I'd disagree with me having just
> reported it. I also investigated it and suggested a fix, which happens
> to be exactly the fix you've used in the patch.

I didn't mean to detract from that, but I can see how my wording had
that effect. I apologize.


-- 
Peter Geoghegan

Re: Parallel index creation does not properly cleanup after error

From

Robert Haas

Date:

07 April 2018, 02:39:10

On Wed, Mar 14, 2018 at 2:14 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Mar 14, 2018 at 5:16 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> Thanks for putting the patch together. Although in regards to your
>> proposed commit message in the patch, I'd disagree with me having just
>> reported it. I also investigated it and suggested a fix, which happens
>> to be exactly the fix you've used in the patch.
>
> I didn't mean to detract from that, but I can see how my wording had
> that effect. I apologize.

Committed.  Thanks to David for the report and analysis and to Peter
for the patch and study.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel index creation does not properly cleanup after error

From

Peter Geoghegan

Date:

07 April 2018, 02:40:06

On Fri, Apr 6, 2018 at 4:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Committed.  Thanks to David for the report and analysis and to Peter
> for the patch and study.

Thanks!

-- 
Peter Geoghegan

Re: Parallel index creation does not properly cleanup after error

From

David Rowley

Date:

07 April 2018, 02:41:26

On 7 April 2018 at 11:39, Robert Haas <robertmhaas@gmail.com> wrote:
> Committed.  Thanks to David for the report and analysis and to Peter
> for the patch and study.

Thanks for pushing!


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services