I wrote:
> The problem here is that RelationSetNewRelfilenode is aggressively
> changing the index's relcache entry before it's written out the
> updated tuple, so that the tuple update tries to make an index
> entry in the new storage which isn't filled yet. I think we can
> fix it by *not* doing that, but leaving it to the relcache inval
> during the CommandCounterIncrement call to update the relcache
> entry. However, it looks like that will take some API refactoring,
> because the storage-creation functions expect to get the new
> relfilenode out of the relcache entry, and they'll have to be
> changed to not do it that way.

So looking at that, it seems like the table_relation_set_new_filenode
API is pretty darn ill-designed. It assumes that it's passed an
already-entirely-valid relcache entry, but it also supposes that
it can pass back information that needs to go into the relation's
pg_class entry. One or the other side of that has to give, unless
you want to doom everything to updating pg_class twice.

I'm not really sure what's the point of giving the tableam control
of relfrozenxid+relminmxid at all, and I notice that index_create
for one is just Asserting that constant values are returned.

I think we need to do one or possibly both of these things:

* split table_relation_set_new_filenode into two functions,
  one that doesn't take a relcache entry at all and returns
  appropriate relfrozenxid+relminmxid for a new rel, and then
  one that just creates storage without dealing with the xid
  values;

* change table_relation_set_new_filenode so that it is told
  the relfilenode etc to use without assuming that it has a
  valid relcache entry to work with.
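
To make the two options a bit more concrete, here is a rough standalone
sketch of what the combined shape could look like. It is not PostgreSQL
code: every Sketch*/sketch_* name, the simplified typedefs, the counters,
and the tablespace/database numbers are hypothetical stand-ins, and the
choice of invalid xid values for indexes is an assumption of the sketch.
The point is only that the xid values are computed without a relcache
entry, storage creation is handed the new relfilenode explicitly, and
pg_class gets updated once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types (illustration only). */
typedef uint32_t Oid;
typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

#define InvalidTransactionId ((TransactionId) 0)
#define InvalidMultiXactId   ((MultiXactId) 0)

/* Hypothetical physical-storage identifier. */
typedef struct SketchRelFileNode
{
    Oid spcNode;
    Oid dbNode;
    Oid relNode;
} SketchRelFileNode;

/* Everything the caller must write into pg_class, gathered in one place. */
typedef struct SketchPgClassUpdate
{
    Oid           relfilenode;
    TransactionId relfrozenxid;
    MultiXactId   relminmxid;
} SketchPgClassUpdate;

/* Counters so the behavior can be observed from outside. */
static int sketch_storage_creations = 0;
static int sketch_pg_class_updates = 0;

/*
 * First half of the split: report xid values for a new rel.
 * Takes no relcache entry. Assumption: indexes get invalid values.
 */
static void
sketch_new_rel_xids(bool is_index,
                    TransactionId cur_xmin, MultiXactId cur_multi,
                    TransactionId *freezeXid, MultiXactId *minmulti)
{
    *freezeXid = is_index ? InvalidTransactionId : cur_xmin;
    *minmulti = is_index ? InvalidMultiXactId : cur_multi;
}

/*
 * Second half: create storage for an explicitly supplied relfilenode,
 * without consulting any relcache entry and without touching xids.
 */
static void
sketch_create_storage(const SketchRelFileNode *newrnode)
{
    assert(newrnode->relNode != 0);
    sketch_storage_creations++;     /* stands in for real file creation */
}

/* Caller: gather all new values first, then update pg_class once. */
static SketchPgClassUpdate
sketch_set_new_relfilenode(Oid new_relnode, bool is_index,
                           TransactionId cur_xmin, MultiXactId cur_multi)
{
    SketchRelFileNode   newrnode = {1, 1, new_relnode}; /* made-up spc/db */
    SketchPgClassUpdate upd;

    sketch_new_rel_xids(is_index, cur_xmin, cur_multi,
                        &upd.relfrozenxid, &upd.relminmxid);
    sketch_create_storage(&newrnode);
    upd.relfilenode = new_relnode;

    sketch_pg_class_updates++;      /* the single pg_class tuple update */
    return upd;
}
```

In this arrangement nothing modifies a relcache entry ahead of the catalog
update; the caller would rely on the relcache inval at the next
CommandCounterIncrement to pick up the new relfilenode.
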
Thoughts?

			regards, tom lane