Re: BUG #17268: Possible corruption in toast index after reindex index concurrently - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
Msg-id YZI+aNEnnpBASxNU@paquier.xyz
In response to Re: BUG #17268: Possible corruption in toast index after reindex index concurrently  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
List pgsql-bugs
On Thu, Nov 11, 2021 at 06:09:49PM +0900, Michael Paquier wrote:
> To be clear on this point, users cannot reindex concurrently catalog
> indexes and toast indexes associated to catalog tables, just toast
> indexes of normal tables.  I don't know if any of you have been
> working on a patch, but I was cooking something.  It would be worth
> checking if an isolation test could be written.

So, I have worked on this one.  Attached is a patch that implements
the two approaches suggested by Andres, which are able to fix the
issues discussed:
1) Switch to a session-level lock on the parent relation if doing a
reindex concurrently on a toast table or on one of its indexes.  This
requires looking back at pg_class.reltoastrelid to find the correct
parent.  This stresses me quite a bit, and to be honest I am not sure
that I like it, because we don't do anything like that in the rest of
the tree.  I am also getting the feeling that this is an open door
for more issues.
2) Don't release locks when a new toast value is saved until the end
of its transaction.
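
For illustration only, the lookup that approach 1) depends on can be
written as a catalog query.  This is a sketch and not the code of the
patch, which would do the lookup in C; the toast relation names below
are made up and need to be replaced with real ones.

```sql
-- Hypothetical toast relation names; substitute real ones.
-- From a toast table, find the relation that owns it.
SELECT c.oid::regclass AS parent_relation
  FROM pg_class c
 WHERE c.reltoastrelid = 'pg_toast.pg_toast_16384'::regclass;

-- From a toast index, go through pg_index to the toast table first.
SELECT c.oid::regclass AS parent_relation
  FROM pg_index x
  JOIN pg_class c ON c.reltoastrelid = x.indrelid
 WHERE x.indexrelid = 'pg_toast.pg_toast_16384_index'::regclass;
```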

After more testing, I have been able to extract and write an isolation
test that is able to reproduce the failure.  It relies on a trick as
the toast relation names are not deterministic, and we cannot use
REINDEX CONCURRENTLY in a function context.  So I have instead used
ALTER TABLE/INDEX RENAME in a DO block, with allow_system_table_mods
enabled, to give the toast relations fixed names.  There is no need
for amcheck with this method, either.
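
The renaming trick could look roughly like the following.  This is a
minimal sketch and not necessarily what the attached test does:
wide_tab, toast_fixed and toast_fixed_idx are hypothetical names, and
it assumes a superuser session so that allow_system_table_mods can be
set.

```sql
-- Sketch only: wide_tab, toast_fixed and toast_fixed_idx are
-- hypothetical names.  Requires superuser.
CREATE TABLE wide_tab (id int PRIMARY KEY, data text);

SET allow_system_table_mods TO true;
DO $$
DECLARE
  toast_rel regclass;
  toast_idx regclass;
BEGIN
  -- The toast table is found through pg_class.reltoastrelid.
  SELECT reltoastrelid INTO toast_rel
    FROM pg_class WHERE oid = 'wide_tab'::regclass;
  -- Look up the index of that toast table.
  SELECT indexrelid INTO toast_idx
    FROM pg_index WHERE indrelid = toast_rel;
  -- regclass values print schema-qualified here (pg_toast.pg_toast_NNN).
  EXECUTE format('ALTER TABLE %s RENAME TO toast_fixed', toast_rel);
  EXECUTE format('ALTER INDEX %s RENAME TO toast_fixed_idx', toast_idx);
END
$$;
RESET allow_system_table_mods;

-- The index now has a fixed name and can be targeted directly,
-- outside of any function or DO block.
REINDEX INDEX CONCURRENTLY pg_toast.toast_fixed_idx;
```

With fixed names, the steps of a test can refer to the toast index
directly instead of having to discover its name at run time.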

2) is enough to fix the problem, and I'd like to think that we had
better stick with only this method for simplicity's sake.

Comments?
--
Michael
