On Wed, Jul 12, 2023 at 1:56 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 12, 2023 at 11:38:05AM +0530, Dilip Kumar wrote: > > On Wed, Jul 12, 2023 at 11:12 AM Michael Paquier <michael@paquier.xyz> wrote: > >> > >> On Wed, Jul 12, 2023 at 09:38:41AM +0900, Michael Paquier wrote: > >> > While working recently on what has led to cfc43ae and fc55c7f, I > >> > really got the feeling that there could be some command sequences that > >> > lacked some CCIs (or CommandCounterIncrement calls) to make sure that > >> > the catalog updates are visible in any follow-up steps in the same > >> > transaction. > >> > >> Wait a minute. The validation of a partitioned index uses a copy of > >> the pg_index tuple from the relcache, which be out of date: > >> newtup = heap_copytuple(partedIdx->rd_indextuple); > >> ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = true; > > > > But why the recache entry is outdated, does that mean that in the > > previous command, we missed registering the invalidation for the > > recache entry? > > Yes, something's still a bit off here, even if switching a partitioned > index to become valid should use a fresh tuple copy from the syscache. > > Taking the test case of upthread, from what I can see, the ALTER TABLE > .. REPLICA IDENTITY registers two relcache invalidations for pk_foo > (via RegisterRelcacheInvalidation), which is the relcache entry whose > stuff is messed up. I would have expected a refresh of the cache of > pk_foo to happen when doing the ALTER INDEX .. ATTACH PARTITION, but > for some reason it does not happen when running the whole in a > transaction block.
I think there is something to do with this code here[1], basically, we are in a transaction block so while processing the invalidation we have first cleared the entry for the pk_foo but then we have partially recreated it using 'RelationReloadIndexInfo', in this function we haven't build complete relation descriptor but marked 'relation->rd_isvalid' as true and due to that next relation_open in (ALTER INDEX .. ATTACH PARTITION) will reuse this half backed entry. I am still not sure what is the purpose of just reloading the index and marking the entry as valid which is not completely valid.
RelationClearRelation() { .. /* * Even non-system indexes should not be blown away if they are open and * have valid index support information. This avoids problems with active * use of the index support information. As with nailed indexes, we * re-read the pg_class row to handle possible physical relocation of the * index, and we check for pg_index updates too. */ if ((relation->rd_rel->relkind == RELKIND_INDEX || relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) && relation->rd_refcnt > 0 && relation->rd_indexcxt != NULL) { if (IsTransactionState()) RelationReloadIndexInfo(relation); return; }
I reviewed the function RelationReloadIndexInfo() and observed that the 'indisreplident' field and the SelfItemPointer 't_self' are not refreshed to the pg_index tuple of the index.