Andres Freund <andres@anarazel.de> writes:
> On 2019-04-25 14:50:09 -0400, Tom Lane wrote:
>> As far as I can see, there is no API restriction on what parts of the
>> relcache entry it may presume are valid. It *certainly* thinks that
>> rd_rel is valid, which is rather at odds with the fact that this has
>> to be called before the pg_class entry exists all (for the creation
>> case) or has been updated (for the set-new-relfilenode case).
> Well, that's just what we did before. And given the way the code is
> structured, I am not sure I see a decent alternative that's not a
> disproportionate amount of work. I mean, heap.c's heap_create() and
> heap_create_with_catalog() basically work off the Relation after the
> RelationBuildLocalRelation() call, a good bit before the underlying
> storage is valid.
You could imagine restructuring that ... but I agree it'd be a lot
of work.
> Currently the only thing that table_relation_set_new_filenode() accesses
> that already is updated is the RelFileNode. I wonder if we shouldn't
> change the API so that table_relation_set_new_filenode() will get a
> relcache entry *without* any updates passed in, then internally does
> GetNewRelFileNode() (if so desired by the AM), and returns the new rnode
> via a new out parameter.
That could work. The important API spec is then that the relcache entry
reflects the *previous* state of the relation, and is not to be modified
by the tableam call. After the call, we perform the pg_class update and
do CCI, and the relcache inval seen by the CCI causes the relcache entry
to be brought into sync with the new reality. So relcache entries change
at CCI boundaries, not in between.
In the creation case, it works more or less the same, with the
understanding that the "previous state" is some possibly-partly-dummy
state set up by RelationBuildLocalRelation. But we might need to add
a CCI call that wasn't there before; not sure.
> We don't currently allow that, but as far as I can see the current
> coding of ATExecSetTableSpace() also has bad problems with system
> catalog updates. It copies the data and *then* does
> CatalogTupleUpdate(), but *witout* updating the reclache - which ijust
> would cause the update to be lost.
Well, I imagine it's expecting the subsequent CCI to update the relcache
entry, which I think is correct behavior in this worldview. We're
basically trying to make the relcache state follow transaction/command
boundary semantics.
> I could come up with a patch for that if you want me to.
I'm happy to let you take a whack at it if you want.
regards, tom lane