Thread: non-transactional pg_class
Hi,

I've been taking a look at what's needed for the non-transactional part of pg_class. If I've understood this correctly, we need a separate catalog, which I've dubbed pg_ntclass (better ideas welcome), and a new pointer in RelationData to hold a pointer to this new catalog for each relation. Also a new syscache needs to be created (say, NTRELOID).

Must every relation have a tuple in this catalog? Currently it is useful only for RELATION, INDEX and TOASTVALUE relkinds, so maybe we can get away with not requiring it for other relkinds.

On the other hand, must this new catalog be bootstrapped? We could initially create RelationDescs with a NULL relation->rd_ntrel, and then get the tuple from the syscache when somebody tries to read the fields.

I'm envisioning this new catalog having only reltuples and relpages for now. (I'll add relvacuumxid and relminxid on the relminxid patch, but they won't be there on the first pass.)

Obviously the idea is that we would never heap_update tuples there; only heap_inplace_update (and heap_insert when a new relation is created.)

So there would be three patches:

1. to replace all uses of relation->rd_rel->reltuples and ->relpages with macros RelationGetReltuples/Relpages.

2. to add the new catalog and syscache, and have the macros get the tuple from pg_ntclass when first requested. (Also, of course, mods to the functions that update pg_class.reltuples, etc, so that they also update pg_ntclass).

3. the relminxid patch

Have I gotten it right?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
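For readers following along, the lazy lookup described for patches 1 and 2 could look roughly like the sketch below. It is only an illustration in standalone C, not PostgreSQL code: NtClassTuple, the trimmed RelationData, the toy_catalog array and fetch_ntclass_tuple() are all hypothetical stand-ins (the real lookup would presumably go through the proposed NTRELOID syscache), and the sketch ignores cache invalidation entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pg_ntclass tuple: only the two fields
 * proposed for the first pass. */
typedef struct NtClassTuple
{
    float reltuples;            /* estimated number of rows */
    int   relpages;             /* estimated number of pages */
} NtClassTuple;

/* Hypothetical, heavily trimmed stand-in for RelationData. */
typedef struct RelationData
{
    unsigned int  rd_id;        /* relation identifier (toy index here) */
    NtClassTuple *rd_ntrel;     /* NULL until somebody reads the fields */
} RelationData;

typedef RelationData *Relation;

/* Toy catalog contents, standing in for the real catalog lookup. */
static NtClassTuple toy_catalog[] = {
    {1000.0f, 10},
    {42.0f, 1}
};

/* Counts how often the "catalog" is actually consulted. */
static int ntclass_fetch_count = 0;

/* Hypothetical fetch routine: one catalog lookup per call. */
static NtClassTuple *
fetch_ntclass_tuple(unsigned int relid)
{
    assert(relid < sizeof(toy_catalog) / sizeof(toy_catalog[0]));
    ntclass_fetch_count++;
    return &toy_catalog[relid];
}

/* Fill rd_ntrel on first use, then keep returning the same pointer. */
static NtClassTuple *
relation_get_ntrel(Relation rel)
{
    assert(rel != NULL);
    if (rel->rd_ntrel == NULL)
        rel->rd_ntrel = fetch_ntclass_tuple(rel->rd_id);
    return rel->rd_ntrel;
}

/* Accessor macros replacing direct rd_rel->reltuples / ->relpages use. */
#define RelationGetReltuples(rel) (relation_get_ntrel(rel)->reltuples)
#define RelationGetRelpages(rel)  (relation_get_ntrel(rel)->relpages)
```

The point of routing every read through the macros is that callers no longer care where the values live, so the storage can move out of pg_class in a later patch without touching them again.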
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I've been taking a look at what's needed for the non-transactional part
> of pg_class. If I've understood this correctly, we need a separate
> catalog, which I've dubbed pg_ntclass (better ideas welcome), and a new
> pointer in RelationData to hold a pointer to this new catalog for each
> relation. Also a new syscache needs to be created (say, NTRELOID).

Do you really need both a relcache slot and a syscache? Seems redundant. For that matter, do you need either? Both the relcache and syscache operate on the assumption of transactional updates, so I think that you're going to have semantic problems using the caches to hold these tuples. For instance we don't broadcast any sinval update messages from a rolled-back transaction.

> On the other hand, must this new catalog be bootstrapped?

If relation creation or row insertion is going to try to write into it, then yes. You could get away with not writing a row initially as long as the rows only hold reltuples/relpages, but I think that would stop working as soon as you put the "unfreeze" code in.

> Obviously the idea is that we would never heap_update tuples there; only
> heap_inplace_update (and heap_insert when a new relation is created.)

Initial insertion (table CREATE) and deletion (table DROP) would both have to be transactional operations. This may be safe because we'd hold exclusive lock on the table and so no one else would be touching the table's row, but it bears thinking about, because after all the whole point of the exercise is to keep transactional and nontransactional updates separate.

What happens if someone tries to do a manual UPDATE in this catalog? Maybe this can be in the category of "superusers should know enough not to do that", but I'd like to be clear on exactly what the consequences might be. Perhaps "nontransactional catalogs" should be a new relkind that we disallow normal updates on.

If we do disallow normal updates (and VACUUM FULL too, probably) then it'd be possible to say that a given entry has a fixed TID for its entire lifespan. Then we could store the TID in the table's regular pg_class entry and dispense with any indexes. This would be advantageous if we end up concluding that we can't use the syscache mechanism (as I suspect that we can't), because we're going to be making quite a lot of fetches from this catalog. A direct fetch by TID would be a lot cheaper than an index search.

			regards, tom lane
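To make the fixed-TID idea concrete, here is a rough standalone C sketch of why a fetch by TID needs no index. It is a toy model under stated assumptions, not PostgreSQL code: ToyTid, NtEntry, toy_heap and the nt_* functions are all hypothetical, the "heap" is a plain in-memory array, and offsets are 0-based here for simplicity. It relies on the property discussed above, namely that an entry never moves once it has been inserted.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PAGES      2
#define SLOTS_PER_PAGE 4

/* Hypothetical catalog entry holding the non-transactional fields. */
typedef struct NtEntry
{
    unsigned int relid;         /* owning relation */
    float        reltuples;
    int          relpages;
    int          in_use;        /* 0 = free slot */
} NtEntry;

/* Toy tuple identifier: (block, offset) addresses one slot directly. */
typedef struct ToyTid
{
    unsigned int block;
    unsigned int offset;
} ToyTid;

/* Toy heap; static storage starts out zeroed, so every slot is free. */
static NtEntry toy_heap[NUM_PAGES][SLOTS_PER_PAGE];

/*
 * Insert an entry into the first free slot and report its TID, which the
 * caller would remember (the thread suggests the table's pg_class entry).
 * Returns 1 on success, 0 if the toy heap is full.
 */
static int
nt_insert(unsigned int relid, ToyTid *tid_out)
{
    for (unsigned int b = 0; b < NUM_PAGES; b++)
    {
        for (unsigned int o = 0; o < SLOTS_PER_PAGE; o++)
        {
            NtEntry *e = &toy_heap[b][o];

            if (!e->in_use)
            {
                e->relid = relid;
                e->reltuples = 0.0f;
                e->relpages = 0;
                e->in_use = 1;
                tid_out->block = b;
                tid_out->offset = o;
                return 1;
            }
        }
    }
    return 0;
}

/* Direct fetch: no search at all, just address arithmetic on the TID. */
static NtEntry *
nt_fetch_by_tid(ToyTid tid)
{
    assert(tid.block < NUM_PAGES && tid.offset < SLOTS_PER_PAGE);
    NtEntry *e = &toy_heap[tid.block][tid.offset];

    return e->in_use ? e : NULL;
}

/* Overwrite the fields where they sit; the entry keeps its TID. */
static void
nt_update_inplace(ToyTid tid, float reltuples, int relpages)
{
    NtEntry *e = nt_fetch_by_tid(tid);

    assert(e != NULL);
    e->reltuples = reltuples;
    e->relpages = relpages;
}
```

Because updates overwrite the slot rather than creating a new version elsewhere, the TID saved at insertion time stays valid for the entry's whole lifespan, which is what would allow dropping the index.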