On Thu, Feb 09, 2012 at 11:11:16PM +0200, Marti Raudsepp wrote:
> I've always been a little wary of using the TRUNCATE command due to
> the warning in the documentation about it not being "MVCC-safe":
> queries may silently give wrong results and it's hard to tell when
> they are affected.
>
> That got me thinking, why can't we handle this like a standby server
> does -- if some query has data removed from underneath it, it aborts
> with a serialization failure.
>
> Does this solution sound like a good idea?
>
> The attached patch is a lame attempt at implementing this. I added a
> new pg_class.relvalidxmin attribute which tracks the Xid of the last
> TRUNCATE (maybe it should be called reltruncatexid?). Whenever
> starting a relation scan with a snapshot older than relvalidxmin, an
> error is thrown. This seems to work out well since TRUNCATE updates
> pg_class anyway, and anyone who sees the new relfilenode automatically
> knows when it was truncated.
I like the design you have chosen. It would find applications beyond
TRUNCATE, so your use of non-specific naming is sound. For example, older
snapshots will see an empty table "t" after "CREATE TABLE t AS SELECT 1"
commits; that's a comparable MVCC anomaly. Some of our non-MVCC-safe commands
should perhaps just become MVCC-safe, but there will always be use cases for
operations that shortcut MVCC. When one truly does want that, your proposal
for keeping behavior consistent makes plenty of sense.
> Should I also add another counter to pg_stat_database_conflicts?
> Currently this table is only used on standby servers.
> ERROR: canceling statement due to conflict with TRUNCATE TABLE on foo
> DETAIL: Rows visible to this transaction have been removed.
My initial reaction is not to portray this like a recovery conflict, since
several aspects distinguish it from all recovery conflict types.
(I have not read your actual patch.)
Thanks,
nm