Re: Hot Standby and VACUUM FULL - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Hot Standby and VACUUM FULL |
Date | |
Msg-id | 16001.1265055639@sss.pgh.pa.us |
In response to | Re: Hot Standby and VACUUM FULL (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Hot Standby and VACUUM FULL |
List | pgsql-hackers |
I wrote:
> The design I sketched doesn't require such an assumption anyway. Once
> the map file is written, the relocation is effective, commit or no.
> As long as we restrict relocations to maintenance operations such as
> VACUUM FULL, which have no transactionally significant results, this
> doesn't seem like a problem. What we do need is that after a CLUSTER
> or V.F., which is going to relocate not only the rel but its indexes,
> the relocations of the rel and its indexes have to all "commit"
> atomically. But saving up the transaction's map changes and applying
> them in one write takes care of that.

BTW, I noticed a couple of other issues that need to be dealt with to
make that safe. During CLUSTER/V.F. we typically try to update the
relation's relfilenode, relpages, reltuples, relfrozenxid (in
setNewRelfilenode) as well as its toastrelid (in swap_relation_files).
These are regular transactional updates to the pg_class tuple that will
fail to commit if the outer transaction rolls back. However:

* For a mapped relation, both the old and new relfilenode will be zero,
  so it doesn't matter.

* Possibly losing the updates of relpages and reltuples is not critical.

* For relfrozenxid, we can simply force the new and old values to be the
  same rather than hoping to advance the value, if we're dealing with a
  mapped relation. Or just let it be; I think that losing an advance
  of relfrozenxid wouldn't be critical either.

* We cannot change the toast rel OID of a shared catalog -- there's no
  way to propagate that into the other copies of pg_class. So we need
  to rejigger the logic for heap rewriting a little bit. Toast rel
  swapping has to be handled by swapping their relfilenodes not their
  OIDs.
This is no big deal as far as cluster.c itself is concerned, but the
tricky part is that when we write new toasted values into the new toast
rel, the TOAST pointers going into the new heap have to be written with
the original toast-table OID value not the one that the transient
target toast rel has got. This is doable but it would uglify the TOAST
API a bit I think. Another possibility is to treat the toast rel OID of
a catalog as something that can be supplied by the map file. Not sure
which way to jump.

			regards, tom lane