Re: locks in CREATE TRIGGER, ADD FK - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: locks in CREATE TRIGGER, ADD FK |
| Date | |
| Msg-id | 6934.1111593437@sss.pgh.pa.us |
| In response to | Re: locks in CREATE TRIGGER, ADD FK (Neil Conway <neilc@samurai.com>) |
| Responses | Re: locks in CREATE TRIGGER, ADD FK |
| List | pgsql-hackers |
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> It isn't 100% MVCC, I agree. But it works because system catalog
>> lookups are SnapshotNow, and so when another session comes and wants to
>> look at the table it will see the committed new version of the pg_class
>> row pointing at the new relfilenode file.

> If by "works", you mean "provides correct transactional semantics", then
> that simply isn't true. Not making CLUSTER and similar DDL commands MVCC
> compliant isn't the end of the world, I agree, but that doesn't make it
> correct, either.

I agree that we aren't MVCC with respect to DDL operations (and for this
purpose CLUSTER is DDL). Trying to become so would open a can of worms
far larger than it's worth, though, IMHO. Let me give you a couple of
illustrations of why things are the way they are. Consider

    Transaction 1                   Transaction 2

    BEGIN;
                                    BEGIN;
                                    INSERT INTO a;
    CLUSTER a;

Currently, because T2 continues to hold a write lock on A until it
commits, T1's CLUSTER will block until T2 commits; therefore CLUSTER's
use of SnapshotNow is sufficient to not lose any live tuples. Were T1 to
use a transaction-aware snapshot to scan A, this would not be so.
(SnapshotAny would work, but CLUSTER would have to become an order of
magnitude harder than it is now, because it would have to preserve
UPDATE chain link relationships ... compare VACUUM FULL's treatment of
tuple chains.)

    Transaction 1                   Transaction 2

    BEGIN;
                                    CLUSTER a;
    INSERT INTO a;

Were T1 using a transaction-aware snapshot to read pg_class, it would
insert its new tuple into the wrong relfilenode for A, causing either
immediate failure or eventual loss of a live tuple.

    Transaction 1                   Transaction 2

    BEGIN;
                                    CREATE TRIGGER ... ON INSERT TO a ...
    INSERT INTO a;

Were T1 using a transaction-aware snapshot to read pg_trigger, it would
fail to fire the already-committed insert trigger.
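The visibility distinction driving these examples can be sketched in a toy model. This is illustrative Python, not PostgreSQL code; `Tuple`, `visible_snapshot_now`, and `visible_mvcc` are invented names. SnapshotNow asks "is the creating transaction committed *right now*?", while a transaction-aware snapshot asks "had it committed when my snapshot was taken?" -- which is why a T1 holding a pre-CLUSTER snapshot would keep using the stale pg_class row:

```python
# Toy model contrasting SnapshotNow with a transaction-aware MVCC
# snapshot for catalog lookups. All names here are invented for
# illustration; the real logic lives in PostgreSQL's C code.

COMMITTED = "committed"
IN_PROGRESS = "in progress"

class Tuple:
    def __init__(self, xmin, relfilenode):
        self.xmin = xmin                # transaction that created this row version
        self.relfilenode = relfilenode  # which physical file the row points at

def visible_snapshot_now(tup, xact_state):
    # SnapshotNow: the creating transaction must be committed at the
    # instant of the check.
    return xact_state[tup.xmin] == COMMITTED

def visible_mvcc(tup, snapshot_in_progress, xact_state):
    # MVCC snapshot: the creating transaction must have committed, and
    # must not have been in progress when the snapshot was taken.
    return (xact_state[tup.xmin] == COMMITTED
            and tup.xmin not in snapshot_in_progress)

# T2 runs CLUSTER a: it writes a new pg_class row version for "a"
# pointing at a new relfilenode, then commits.
xact_state = {"T2": IN_PROGRESS}
new_row = Tuple(xmin="T2", relfilenode="newfile")

# T1 took its snapshot while T2 was still in progress...
t1_snapshot = {"T2"}
# ...then blocked on T2's lock; now T2 commits and T1 proceeds.
xact_state["T2"] = COMMITTED

print(visible_snapshot_now(new_row, xact_state))        # sees the new relfilenode
print(visible_mvcc(new_row, t1_snapshot, xact_state))   # still sees the stale row
```

Under SnapshotNow the lookup returns the committed new row; under T1's own snapshot it does not, which in the CLUSTER example would send T1's insert into the wrong relfilenode.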
(Variants of this include failing to insert index entries into a new
index, failing to apply a new constraint, etc.)

You don't have to assume that T1 is in Serializable mode, either. It
might be using Read Committed, but the INSERT starts and sets its
snapshot while T2 is still in progress; then of course blocks until T2
commits; then does the wrong thing because it is still paying attention
to pre-T2 catalog entries. This is why LockRelation accepts SI inval
messages immediately after each lock acquisition; it's to ensure that we
do see the effects of T2.

    Transaction 1                   Transaction 2

    BEGIN;
    SELECT FROM a;
                                    CREATE TRIGGER ... ON INSERT TO a ...
    INSERT INTO a;

Ordinarily, once T1 has touched relation A, it can be sure that A's
schema will not change until end of transaction. The change you
committed last night removes that guarantee, at least for the limited
case of triggers, and makes the above interleaving possible. While I
haven't come up with a clear failure case after a few minutes' thought,
I'm not convinced that there isn't one.

regards, tom lane
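The "LockRelation accepts SI inval messages immediately after each lock acquisition" pattern the email relies on can be sketched with a toy model. This is not the real C implementation; `Backend`, `si_queue`, `relcache`, and the catalog layout are all invented here to show why processing pending invalidations right after the lock is granted guarantees a backend that blocked behind DDL rereads current catalog state before using the relation:

```python
# Toy sketch of lock-then-accept-invalidations. A shared queue stands in
# for PostgreSQL's SI message buffer; a dict stands in for the relcache.
# All names are invented for illustration.

class Backend:
    def __init__(self, si_queue):
        self.si_queue = si_queue    # shared list of invalidation messages
        self.consumed = 0           # how far this backend has read the queue
        self.relcache = {}          # relation name -> cached schema info

    def accept_invalidations(self):
        # Drop cached entries named in any SI message not yet seen.
        while self.consumed < len(self.si_queue):
            rel = self.si_queue[self.consumed]
            self.relcache.pop(rel, None)
            self.consumed += 1

    def lock_relation(self, rel, catalogs):
        # (a real backend might block here until the DDL transaction commits)
        self.accept_invalidations()             # the crucial step
        if rel not in self.relcache:
            self.relcache[rel] = catalogs[rel]  # reread current catalog state
        return self.relcache[rel]

si_queue = []
catalogs = {"a": {"triggers": []}}

t1 = Backend(si_queue)
t1.lock_relation("a", catalogs)     # T1 caches "a" with no triggers

# T2: CREATE TRIGGER on a, commit -> new catalog state plus an SI message.
catalogs["a"] = {"triggers": ["ins_trig"]}
si_queue.append("a")

# T1's next lock acquisition consumes the message and sees the trigger.
info = t1.lock_relation("a", catalogs)
print(info["triggers"])             # ['ins_trig']
```

Without the `accept_invalidations()` call inside `lock_relation`, T1 would keep its stale cached entry and, in the email's terms, fail to fire the already-committed trigger.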