Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
> > It isn't 100% MVCC, I agree. But it works because system catalog
> > lookups are SnapshotNow, and so when another session comes and wants to
> > look at the table it will see the committed new version of the pg_class
> > row pointing at the new relfilenode file.
>
> If by "works", you mean "provides correct transactional semantics", then that
> simply isn't true. Not making CLUSTER and similar DDL commands MVCC compliant
> isn't the end of the world, I agree, but that doesn't make it correct, either.
I think he means it works because it doesn't matter whether the serializable
transaction sees the old table or the new one. As soon as the CLUSTER commits,
the serializable transaction can start using the new one, since it's
functionally identical to the old one (at least it's supposed to be; as Tom
points out, it isn't).
> > If you want to complain about MVCC violations in CLUSTER, think about
> > the fact that it scans the table with SnapshotNow, and therefore loses
> > rows that are committed-dead but might still be visible to somebody.
Ouch. That's, er, a problem. I guess currently it's fine for any transaction
using READ COMMITTED, but it's already wrong for serializable transactions. And
it'll be wrong for READ COMMITTED too if CLUSTER is changed not to take an
exclusive lock.
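
To make the lost-row case concrete, here is a toy model of what I understand
Tom to be describing. It is not PostgreSQL code: Row, visible() and
cluster_rewrite() are names I made up, and "visibility" is reduced to checking
xmin/xmax against a set of committed xids. A serializable transaction takes its
snapshot, another transaction then deletes row "b" and commits, and finally a
CLUSTER-like rewrite copies only the rows that are live right now.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Row:
    value: str
    xmin: int             # xid that inserted this row version
    xmax: Optional[int]   # xid that deleted it, or None if never deleted


def visible(row: Row, committed: FrozenSet[int]) -> bool:
    """Simplified visibility: inserter committed, deleter (if any) not committed."""
    inserted = row.xmin in committed
    deleted = row.xmax is not None and row.xmax in committed
    return inserted and not deleted


def cluster_rewrite(heap: List[Row], committed_now: FrozenSet[int]) -> List[Row]:
    """Hypothetical rewrite that keeps only rows live 'right now' (SnapshotNow-like)."""
    return [r for r in heap if visible(r, committed_now)]


# xid 1 inserted both rows; xid 2 later deleted "b".
old_heap = [Row("a", xmin=1, xmax=None), Row("b", xmin=1, xmax=2)]

# The serializable transaction took its snapshot before xid 2 committed.
snapshot = frozenset({1})

# By the time the rewrite runs, xid 2 has committed, so "b" is committed-dead.
committed_now = frozenset({1, 2})
new_heap = cluster_rewrite(old_heap, committed_now)

old_view = [r.value for r in old_heap if visible(r, snapshot)]
new_view = [r.value for r in new_heap if visible(r, snapshot)]

print(old_view)  # ['a', 'b']
print(new_view)  # ['a']
```

The same snapshot gives two different answers depending on which copy of the
table it reads, which is exactly why the old and new files aren't functionally
identical for a serializable transaction.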
--
greg