Thread: non-transactional pg_class

non-transactional pg_class

From: Alvaro Herrera
Date: 29 May 2006, 04:53:09

Hi,

I've been taking a look at what's needed for the non-transactional part
of pg_class.  If I've understood this correctly, we need a separate
catalog, which I've dubbed pg_ntclass (better ideas welcome), and a new
pointer in RelationData to hold a pointer to this new catalog for each
relation.  Also a new syscache needs to be created (say, NTRELOID).

Must every relation have a tuple in this catalog?  Currently it is
useful only for RELATION, INDEX and TOASTVALUE relkinds, so maybe we can
get away with not requiring it for other relkinds.

On the other hand, must this new catalog be bootstrapped?  We could
initially create RelationDescs with a NULL relation->rd_ntrel, and then
get the tuple from the syscache when somebody tries to read the fields.

I'm envisioning this new catalog having only reltuples and relpages for
now.  (I'll add relvacuumxid and relminxid on the relminxid patch, but
they won't be there on the first pass.)

Obviously the idea is that we would never heap_update tuples there; only
heap_inplace_update (and heap_insert when a new relation is created.)

So there would be three patches:

1. to replace all uses of relation->rd_rel->reltuples and ->relpages
with macros RelationGetReltuples/Relpages.

2. to add the new catalog and syscache, and have the macros get the
tuple from pg_ntclass when first requested.  (Also, of course, mods to
the functions that update pg_class.reltuples, etc, so that they also
update pg_ntclass).

3. the relminxid patch
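
A minimal sketch of how patches 1 and 2 might fit together, assuming a
lazily filled rd_ntrel pointer.  Only RelationGetReltuples,
RelationGetRelpages and rd_ntrel come from the proposal; RelationSketch,
NtClassTuple, toy_catalog and toy_fetch_ntclass are hypothetical
stand-ins, not PostgreSQL structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pg_ntclass row; not the real catalog layout. */
typedef struct NtClassTuple
{
    float reltuples;
    int   relpages;
} NtClassTuple;

/* Hypothetical, heavily simplified stand-in for RelationData. */
typedef struct RelationSketch
{
    unsigned int  relid;
    NtClassTuple *rd_ntrel;     /* NULL until somebody reads a field */
} RelationSketch;

/* Toy catalog indexed by relid; stands in for the real tuple lookup. */
static NtClassTuple toy_catalog[] = { {1000.0f, 10}, {25.0f, 1} };
static int toy_lookups = 0;     /* counts how often the catalog is consulted */

static NtClassTuple *
toy_fetch_ntclass(unsigned int relid)
{
    assert(relid < sizeof(toy_catalog) / sizeof(toy_catalog[0]));
    toy_lookups++;
    return &toy_catalog[relid];
}

/* Fetch the tuple on first access, then reuse the stored pointer. */
static NtClassTuple *
relation_get_ntrel(RelationSketch *rel)
{
    if (rel->rd_ntrel == NULL)
        rel->rd_ntrel = toy_fetch_ntclass(rel->relid);
    return rel->rd_ntrel;
}

/* Callers use these instead of reading rd_rel->reltuples directly. */
#define RelationGetReltuples(rel) (relation_get_ntrel(rel)->reltuples)
#define RelationGetRelpages(rel)  (relation_get_ntrel(rel)->relpages)
```

With the accessors in place, callers never see whether the tuple was
already loaded, so the lookup mechanism behind relation_get_ntrel can
change without touching them.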

Have I gotten it right?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: non-transactional pg_class

From: Tom Lane
Date: 29 May 2006, 14:47:48

Alvaro Herrera <alvherre@commandprompt.com> writes:
> I've been taking a look at what's needed for the non-transactional part
> of pg_class.  If I've understood this correctly, we need a separate
> catalog, which I've dubbed pg_ntclass (better ideas welcome), and a new
> pointer in RelationData to hold a pointer to this new catalog for each
> relation.  Also a new syscache needs to be created (say, NTRELOID).

Do you really need both a relcache slot and a syscache?  Seems
redundant.  For that matter, do you need either?  Both the relcache and
syscache operate on the assumption of transactional updates, so I think
that you're going to have semantic problems using the caches to hold
these tuples.  For instance we don't broadcast any sinval update
messages from a rolled-back transaction.
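
A toy model of the concern above, not PostgreSQL code: an in-place write
survives an abort, but invalidations are only sent at commit, so another
backend's cached copy can stay stale.  All names here (SharedRow,
CachedCopy, cache_read, inplace_write, end_transaction) are hypothetical,
and the real sinval machinery is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* The row as stored in the shared catalog page. */
typedef struct SharedRow
{
    int relpages;
} SharedRow;

/* Another backend's private cached copy of that row. */
typedef struct CachedCopy
{
    int relpages;
    int valid;                  /* 0 means "reload on next read" */
} CachedCopy;

/* Read through the cache, reloading only if it was invalidated. */
static int
cache_read(CachedCopy *cache, const SharedRow *shared)
{
    assert(cache != NULL && shared != NULL);
    if (!cache->valid)
    {
        cache->relpages = shared->relpages;
        cache->valid = 1;
    }
    return cache->relpages;
}

/* Non-transactional write: takes effect at once, never undone by abort. */
static void
inplace_write(SharedRow *shared, int relpages)
{
    shared->relpages = relpages;
}

/* Invalidations reach other backends only when the transaction commits. */
static void
end_transaction(int committed, CachedCopy *caches, size_t ncaches)
{
    if (!committed)
        return;                 /* rolled back: nothing is broadcast */
    for (size_t i = 0; i < ncaches; i++)
        caches[i].valid = 0;
}
```

In this model, a write followed by end_transaction(0, ...) leaves the
shared row changed while every cached copy keeps returning the old
value.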

> On the other hand, must this new catalog be bootstrapped?

If relation creation or row insertion is going to try to write into it,
then yes.  You could get away with not writing a row initially as long
as the rows only hold reltuples/relpages, but I think that would stop
working as soon as you put the "unfreeze" code in.

> Obviously the idea is that we would never heap_update tuples there; only
> heap_inplace_update (and heap_insert when a new relation is created.)

Initial insertion (table CREATE) and deletion (table DROP) would both
have to be transactional operations.  This may be safe because we'd hold
exclusive lock on the table and so no one else would be touching the
table's row, but it bears thinking about, because after all the whole
point of the exercise is to keep transactional and nontransactional
updates separate.

What happens if someone tries to do a manual UPDATE in this catalog?
Maybe this can be in the category of "superusers should know enough not
to do that", but I'd like to be clear on exactly what the consequences
might be.  Perhaps "nontransactional catalogs" should be a new relkind
that we disallow normal updates on.

If we do disallow normal updates (and VACUUM FULL too, probably) then
it'd be possible to say that a given entry has a fixed TID for its
entire lifespan.  Then we could store the TID in the table's regular
pg_class entry and dispense with any indexes.  This would be
advantageous if we end up concluding that we can't use the syscache
mechanism (as I suspect that we can't), because we're going to be making
quite a lot of fetches from this catalog.  A direct fetch by TID would
be a lot cheaper than an index search.
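
A rough sketch of the fixed-TID idea, assuming rows never move once
inserted.  The types and functions below (Tid, NtRow, Page, fetch_by_tid,
inplace_update) are hypothetical simplifications; in particular, real
PostgreSQL item offsets are 1-based and pages are not plain arrays.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS_PER_PAGE 4

/* Simplified row of the nontransactional catalog. */
typedef struct NtRow
{
    unsigned relid;
    float    reltuples;
    int      relpages;
    int      used;              /* 0 if the slot is empty */
} NtRow;

/* Simplified tuple identifier: page number plus slot within the page. */
typedef struct Tid
{
    unsigned block;
    unsigned offset;            /* 0-based here, for simplicity */
} Tid;

typedef struct Page
{
    NtRow rows[ROWS_PER_PAGE];
} Page;

/* Direct fetch: one page access, no index search and no cache. */
static NtRow *
fetch_by_tid(Page *pages, size_t npages, Tid tid)
{
    assert(pages != NULL);
    if (tid.block >= npages || tid.offset >= ROWS_PER_PAGE)
        return NULL;
    NtRow *row = &pages[tid.block].rows[tid.offset];
    return row->used ? row : NULL;
}

/* Overwrite the row where it sits, so its TID stays valid. */
static int
inplace_update(Page *pages, size_t npages, Tid tid,
               float reltuples, int relpages)
{
    NtRow *row = fetch_by_tid(pages, npages, tid);
    if (row == NULL)
        return 0;
    row->reltuples = reltuples;
    row->relpages = relpages;
    return 1;
}
```

Because updates never relocate the row, a TID stored alongside the
table's regular pg_class entry would remain usable for the row's whole
lifespan in this model.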

        regards, tom lane