Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
Date
Msg-id 201206142251.10078.andres@2ndquadrant.com
Whole thread Raw
In response to Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
List pgsql-hackers
On Thursday, June 14, 2012 08:50:51 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund <andres@2ndquadrant.com> 
wrote:
> > This patch is problematic because formally indexes used by syscaches
> > needs to be unique, this one is not though because of 0/InvalidOids
> > entries for nailed/shared catalog entries. Those values aren't allowed
> > to be queried though.
> That's not the only reason it's not unique.  Take a look at
> GetRelFileNode().  We really only guarantee that <database OID,
> tablespace OID, relfilenode, backend-ID>, taken as a four-tuple, is
> unique.  You could have the same relfilenode in different tablespaces,
> or even within the same tablespace with different backend-IDs.  The
> latter might not matter for you because you're presumably disregarding
> temp tables, but the former probably does.  It's an uncommon scenario
> because we normally set relid = relfilenode, and of course relid is
> unique across the database, but the table gets rewritten then you end
> up with relid != relfilenode, and I don't think there's anything at
> that point that will prevent the new relfilenode from being chosen as
> some other relations relfilenode, as long as it's in a different
> tablespace.
> 
> I think the solution may be to create a specialized cache for this,
> rather than relying on the general syscache infrastructure.  You might
> look at, e.g., attoptcache.c for an example.  That would allow you to
> build a cache that is aware of things like the relmapper
> infrastructure, and the fact that temp tables are ignorable for your
> purposes.  But I think you will need to include at least the
> tablespace OID in the key along with the relfilenode to make it
> bullet-proof.
Yes, the tablespace OID should definitely be in there. Need to read up on the 
details of an own cache. Once more I didn't want to put in more work before 
discussing it here.

> I haven't read through the patch series far enough to know what this
> is being used for yet, but my fear is that you're using it to handle
> mapping a relflenode extracted from the WAL stream back to a relation
> OID.  The problem with that is that relfilenode assignments obey
> transaction semantics.  So, if someone begins a transaction, truncates
> a table, inserts a tuple, and commits, the heap_insert record is going
> to refer to a relfilenode that, according to the system catalogs,
> doesn't exist.  This is similar to one of the worries in my other
> email, so I won't belabor the point too much more here...
Well, yes. We *need* to do the mapping back from the relfilenode to a table. 
The idea is that by the fact that the receiving side, be it a full cluster or 
just a catalog one, has a fully synchronized catalog in which DDL gets applied 
correctly, inside the transaction just as in the sending side, it should never 
be wrong to do that mapping.
It probably is necessary to make the syscache lookup/infrastructure to use an 
MVCCish snapshot though. No idea how hard that would be yet. Might be a good 
argument for your argument of using a specialized cache.

Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I 
am aware of the issues with rollbacks, truncate et al...

Thanks,

Andres

-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture
Next
From: Robert Haas
Date:
Subject: Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode