Thread: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
From
Justin Pryzby
Date:
On a badly-overloaded VM, we hit the previously-reported segfault in progress reporting. This left around some *ccold indices. I tried to drop them but:

sentinel=# DROP INDEX child.alarms_null_alarm_id_idx1_ccold; -- child.alarms_null_alarm_time_idx_ccold; -- alarms_null_alarm_id_idx_ccold;
ERROR: could not find tuple for parent of relation 41351896

Those are children of a relkind=I index on a relkind=p table.

postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
postgres=# CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (1)TO(100);
postgres=# INSERT INTO t1 SELECT 1 FROM generate_series(1,99999);
postgres=# CREATE INDEX ON t(i);

postgres=# begin; SELECT * FROM t; -- DO THIS IN ANOTHER SESSION

postgres=# REINDEX INDEX CONCURRENTLY t1_i_idx; -- cancel this one
^CCancel request sent
ERROR: canceling statement due to user request

postgres=# \d t1
...
    "t1_i_idx" btree (i)
    "t1_i_idx_ccold" btree (i) INVALID

postgres=# SELECT inhrelid::regclass FROM pg_inherits WHERE inhparent='t_i_idx'::regclass;
 inhrelid
----------
 t1_i_idx
(1 row)

Not only can't I DROP the _ccold indexes, but dropping the table doesn't drop them either, and afterwards I can't even \d them anymore:

jtp=# DROP INDEX t1_i_idx_ccold;
ERROR: could not find tuple for parent of relation 290818869
jtp=# DROP TABLE t; -- does not fail, but ..
jtp=# \d t1_i_idx_ccold
ERROR: cache lookup failed for relation 290818865
jtp=# SELECT indrelid::regclass, * FROM pg_index WHERE indexrelid='t1_i_idx_ccold'::regclass;
indrelid   | 290818865
indexrelid | 290818869
indrelid   | 290818865
[...]

Justin
Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
From
Michael Paquier
Date:
On Tue, Oct 15, 2019 at 11:40:47AM -0500, Justin Pryzby wrote:
> Not only can't I DROP the _ccold indexes, but also dropping the table doesn't
> cause them to be dropped, and then I can't even slash dee them anymore:

Yes, I can confirm the report. In this scenario, the reindex is waiting for the first transaction to finish before step 5, and the cancellation prevents the follow-up steps (set_dead and the ones after it) from running. So at this stage the swap has actually happened.

I am still analyzing the report in depth, but you don't have any problems with a plain index when interrupting at this stage: the old index can be cleanly dropped with the new one present. So my first thought is that we are just missing some dependency cleanup at the swap phase when dealing with a partition index.
--
Michael
Attachment
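A toy Python model (not PostgreSQL code) may help picture why a cancel at this point leaves a *_ccold index behind. It assumes, in a simplified way, that each step of REINDEX CONCURRENTLY commits on its own, so a cancel keeps whatever earlier steps did. The phase labels, the Index class, and QueryCanceled are hypothetical; only "set_dead" echoes the step named above, and the valid/ready flags only loosely mirror pg_index.indisvalid and indisready.

```python
from __future__ import annotations

from dataclasses import dataclass


class QueryCanceled(Exception):
    """Stands in for the user's ^C arriving while the command is waiting."""


@dataclass
class Index:
    name: str
    valid: bool = False  # loosely mirrors pg_index.indisvalid
    ready: bool = True   # loosely mirrors pg_index.indisready


# Illustrative labels for the steps, not PostgreSQL's internal function names.
PHASES = ("build", "validate", "swap", "set_dead", "drop")


def reindex_concurrently(indexes: dict[str, Index], name: str,
                         cancel_before: str | None = None) -> None:
    """Toy model: each phase commits on its own, so a cancel keeps earlier phases."""
    old = indexes[name]
    new = Index(name + "_ccnew")
    for phase in PHASES:
        if phase == cancel_before:
            raise QueryCanceled("canceling statement due to user request")
        if phase == "build":
            indexes[new.name] = new  # exists, but is not valid yet
        elif phase == "swap":
            # The new index takes over the original name; the old one is
            # renamed to *_ccold and marked invalid.
            del indexes[old.name], indexes[new.name]
            new.name, old.name = name, name + "_ccold"
            new.valid, old.valid = True, False
            indexes[new.name] = new
            indexes[old.name] = old
        elif phase == "set_dead":
            old.ready = False  # writers stop maintaining the old index
        elif phase == "drop":
            del indexes[old.name]
        # "validate" changes nothing that this model tracks.


if __name__ == "__main__":
    idx = {"t1_i_idx": Index("t1_i_idx", valid=True)}
    try:
        reindex_concurrently(idx, "t1_i_idx", cancel_before="set_dead")
    except QueryCanceled as exc:
        print("ERROR:", exc)
    for ix in idx.values():
        print(ix.name, "valid" if ix.valid else "INVALID")
```

Cancelling just before set_dead leaves both t1_i_idx (valid) and t1_i_idx_ccold (INVALID), which matches the \d t1 output in the original report; an uninterrupted run ends with only t1_i_idx.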
Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
From
Michael Paquier
Date:
On Thu, Oct 24, 2019 at 01:59:29PM +0900, Michael Paquier wrote:
> Yes, I can confirm the report. In the case of this scenario the
> reindex is waiting for the first transaction to finish before step 5,
> the cancellation causing the follow-up process to not be done
> (set_dead & the next ones). So at this stage the swap has actually
> happened. I am still analyzing the report in depths, but you don't
> have any problems with a plain index when interrupting at this stage,
> and the old index can be cleanly dropped with the new one present, so
> my first thoughts are that we are just missing some more dependency
> cleanup at the swap phase when dealing with a partition index.

Okay, I have found this one. The issue is that at the swap phase, pg_class.relispartition of the new index is updated to use the value of the old index (true for a partition index). However, relispartition also needs to be updated for the old index; otherwise any attempt to interact with it fails, because the old index is no longer part of any inheritance tree. We could just set it to false for the old index, since the index created concurrently is not attached to the partition tree (its inheritance links are not updated) until the swap phase, but it feels more natural to simply swap relispartition between the old and the new index, as per the attached.

This also means that you could just update pg_class to fix things if you have a broken cluster.

In short, the attached fixes the issue for me, and that's the last bug I know of in what has been reported.
--
Michael
Attachment
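To make the diagnosis concrete, here is a toy Python model (not PostgreSQL code, and not the attached patch) of the swap step. PgClassRow, swap_indexes, orphaned_partitions, demo, and the OIDs are hypothetical; the inherits dict plays the role of pg_inherits (child OID to parent OID). A row flagged relispartition with no parent entry roughly corresponds to the state in which the parent lookup fails with "could not find tuple for parent of relation". The model assumes the new index starts with relispartition false, since it is not attached to the partition tree until the swap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PgClassRow:
    """Toy stand-in for the pg_class columns that matter here."""
    oid: int
    relname: str
    relispartition: bool


def swap_indexes(old: PgClassRow, new: PgClassRow, ccold_name: str,
                 inherits: dict[int, int], fixed: bool) -> None:
    """Toy swap phase. `inherits` maps child OID -> parent OID (like pg_inherits)."""
    new.relname, old.relname = old.relname, ccold_name
    # The inheritance link moves from the old index to the new one.
    if old.oid in inherits:
        inherits[new.oid] = inherits.pop(old.oid)
    if fixed:
        # Approach described in the thread: exchange the two flags.
        old.relispartition, new.relispartition = new.relispartition, old.relispartition
    else:
        # Behaviour being fixed: only the new index picks up the old flag.
        new.relispartition = old.relispartition


def orphaned_partitions(rows: list[PgClassRow], inherits: dict[int, int]) -> list[str]:
    """Names of rows that claim to be partitions but have no parent entry."""
    return [r.relname for r in rows if r.relispartition and r.oid not in inherits]


def demo(fixed: bool) -> list[str]:
    parent_oid = 1000  # hypothetical OID of t_i_idx
    old = PgClassRow(1001, "t1_i_idx", relispartition=True)
    new = PgClassRow(1002, "t1_i_idx_ccnew", relispartition=False)
    inherits = {old.oid: parent_oid}
    swap_indexes(old, new, "t1_i_idx_ccold", inherits, fixed)
    return orphaned_partitions([old, new], inherits)


print(demo(fixed=False))  # ['t1_i_idx_ccold']
print(demo(fixed=True))   # []
```

In this model, the manual pg_class repair mentioned in the message corresponds to setting relispartition to false on the leftover _ccold row.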
Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
From
Michael Paquier
Date:
On Mon, Oct 28, 2019 at 04:14:41PM +0900, Michael Paquier wrote:
> This brings also the point that you could just update pg_class to fix
> things if you have a broken cluster.
>
> In short, the attached fixes the issue for me, and that's the last bug
> I know of in what has been reported..

This one is now done. Justin has also confirmed to me offline that it fixed his problems.
--
Michael