Tom Lane wrote:
> It isn't 100% MVCC, I agree. But it works because system catalog
> lookups are SnapshotNow, and so when another session comes and wants to
> look at the table it will see the committed new version of the pg_class
> row pointing at the new relfilenode file.
If by "works", you mean "provides correct transactional semantics", then
that simply isn't true. Not making CLUSTER and similar DDL commands MVCC
compliant isn't the end of the world, I agree, but that doesn't make it
correct, either.
> If you want to complain about MVCC violations in CLUSTER, think about
> the fact that it scans the table with SnapshotNow, and therefore loses
> rows that are committed-dead but might still be visible to somebody.
This seems like another facet of the same problem (a serializable
transaction's snapshot effectively includes the relfilenodes that were
visible when the snapshot was taken, and swapping in another relfilenode
under its nose is asking for trouble).
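To make that concrete, here is a toy Python model of the relfilenode swap. It is only a sketch, not PostgreSQL code: the names (lookup_relfilenode, scan, the tuple layouts) are made up for illustration, a "snapshot" is reduced to the set of xids that had committed when it was taken, and I am assuming the old file is still around to be read.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical layouts, for illustration only.
CatalogRow = Tuple[int, int, Optional[int]]   # (relfilenode, xmin, xmax)
HeapRow = Tuple[str, int]                     # (value, xmin)


def lookup_relfilenode(pg_class: List[CatalogRow], committed: FrozenSet[int]) -> int:
    """Return the relfilenode of the catalog row visible to `committed`."""
    for node, xmin, xmax in pg_class:
        if xmin in committed and (xmax is None or xmax not in committed):
            return node
    raise LookupError("no visible pg_class row")


def scan(pg_class: List[CatalogRow], storage: Dict[int, List[HeapRow]],
         catalog_view: FrozenSet[int], data_snapshot: FrozenSet[int]) -> List[str]:
    """Find the file using `catalog_view`, then read it using `data_snapshot`."""
    node = lookup_relfilenode(pg_class, catalog_view)
    return [value for value, xmin in storage[node] if xmin in data_snapshot]


# xid 1 created the table and inserted two rows into file 100.
storage: Dict[int, List[HeapRow]] = {100: [("a", 1), ("b", 1)]}
pg_class: List[CatalogRow] = [(100, 1, None)]

# A serializable transaction takes its snapshot now: only xid 1 is committed.
old_snapshot = frozenset({1})
before = scan(pg_class, storage, old_snapshot, old_snapshot)

# xid 2 runs TRUNCATE: a new empty file 101, and a new pg_class row version.
storage[101] = []
pg_class = [(100, 1, 2), (101, 2, None)]
committed_now = frozenset({1, 2})

# SnapshotNow-style catalog lookup: the old transaction is sent to file 101.
after_snapshot_now = scan(pg_class, storage, committed_now, old_snapshot)

# Catalog lookup with the transaction's own snapshot: it still finds file 100.
after_mvcc_catalog = scan(pg_class, storage, old_snapshot, old_snapshot)

print(before, after_snapshot_now, after_mvcc_catalog)
```

In this model the same snapshot reads two rows before the TRUNCATE commits and none afterwards, which is the behaviour I am objecting to. The last lookup only gives the right answer because the toy keeps file 100 around.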
We could fix the CLUSTER bug, although not the TRUNCATE bug, by scanning
the old relation with SnapshotAny (or ideally, "the snapshot such that
we can see all tuples visible to any currently running transaction", if
we can produce such a snapshot easily). Not sure if that's worth doing;
it would be nice to solve the root problem (scanning system catalogs
with SnapshotNow, per the discussion elsewhere in this thread).
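As a rough illustration of the CLUSTER case, here is a toy Python model of the difference between copying with SnapshotNow and copying with SnapshotAny. Again, this is a sketch under stated assumptions rather than anything resembling the real code: TupleVersion, visible and cluster_copy are hypothetical names, a snapshot is just the set of committed xids, and I am assuming the copy preserves each tuple's xmin and xmax.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class TupleVersion:
    value: str
    xmin: int                    # xid that inserted this version
    xmax: Optional[int] = None   # xid that deleted it, if any


def visible(t: TupleVersion, committed: FrozenSet[int]) -> bool:
    """Visible if the insert is committed and the delete (if any) is not."""
    inserted = t.xmin in committed
    deleted = t.xmax is not None and t.xmax in committed
    return inserted and not deleted


def cluster_copy(heap: List[TupleVersion], committed_now: FrozenSet[int],
                 use_snapshot_any: bool) -> List[TupleVersion]:
    """Copy the heap into a new file, keeping xmin/xmax (an assumption)."""
    if use_snapshot_any:
        return list(heap)        # SnapshotAny: copy every tuple version
    # SnapshotNow: copy only what looks live at this instant
    return [t for t in heap if visible(t, committed_now)]


def read(heap: List[TupleVersion], snapshot: FrozenSet[int]) -> List[str]:
    return sorted(t.value for t in heap if visible(t, snapshot))


# xid 1 inserted "a" and "b"; xid 2 later deleted "b" and committed.
heap = [TupleVersion("a", xmin=1), TupleVersion("b", xmin=1, xmax=2)]
old_snapshot = frozenset({1})       # taken before xid 2 committed
committed_now = frozenset({1, 2})   # state when CLUSTER runs

now_copy = cluster_copy(heap, committed_now, use_snapshot_any=False)
any_copy = cluster_copy(heap, committed_now, use_snapshot_any=True)

print(read(heap, old_snapshot))      # ['a', 'b']
print(read(now_copy, old_snapshot))  # ['a']  (committed-dead "b" was lost)
print(read(any_copy, old_snapshot))  # ['a', 'b']
```

The SnapshotAny copy is larger than it needs to be, since it also carries tuples nobody can see any more. That is why the "visible to any currently running transaction" snapshot would be the better choice if it is cheap to construct.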
-Neil