Thread: Is the "ACCESS EXCLUSIVE" lock for TRUNCATE really necessary?
Hi,

I know that TRUNCATE needs to acquire an ACCESS EXCLUSIVE lock, because it removes the datafile on commit and must ensure that no one is still using it by then. For a lot of applications (at least mine) this imposes problems. I'd like to use TRUNCATE in a few places, not only to "clean out" the table, but also to prevent others from inserting while I'm deleting everything. (Think of a cache table: if I clear the cache, I don't want concurrent transactions to be able to insert. Readers are not a problem, though - they still see the database in its "old state", so seeing the cache in the "old state" is correct.)

Now, I was wondering whether TRUNCATE couldn't just let relfilenode in pg_class point to a new datafile, and leave the old one in place. TRUNCATE would still need a lock that prevents any write access to the table, but it wouldn't need to lock out readers too. VACUUM could then remove datafiles when it purges a record from pg_class.

I'm asking mainly out of curiosity - I've thought about this for a while now, and couldn't come up with a reason why it wouldn't be possible.

greetings, Florian Pflug
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Now, I was thinking if TRUNCATE couldn't just let relfilenode in
>> pg_class point to a new datafile, and leave the old one in place.
>
> Until when? How would you synchronize the switchover?

Every snapshot would contain either the old or the new version of the corresponding pg_class tuple. The transactions using the old version couldn't possibly be writers, only readers (TRUNCATE would still need to acquire a lock that ensures that). New transactions started after the commit of the TRUNCATE would see the new version and use the new datafile. Read-committed transactions started before the TRUNCATE committed would be able to read the old file, and would block when trying to write to it until the TRUNCATE committed; upon that commit, they'd need to reread pg_class and use the new datafile. A serializable transaction would always read the old datafile, and would generate a serialization error when trying to write to the table, I believe.

The serializable case shows what I perceive to be the difference between DELETE and TRUNCATE from my point of view: DELETE removes all records visible to the deleting transaction, while TRUNCATE removes all records, period. For me at least, this fits perfectly with the serialization error, because you cannot insert into a table that another transaction wants to clear completely.

>> VACUUM could then remove datafiles when it purges a record from pg_class.
>
> And how would you prevent VACUUM from doing so incorrectly? The fact
> that pg_class contains a dead record is not ordinarily an indication
> that the relfilenode mentioned in the record is not needed anymore.

But if a file is not referenced by the relfilenode of any live tuple in pg_class (live meaning not yet removable by VACUUM), how can any transaction still be accessing it? It couldn't have learned about the existence of the file in the first place, I'd believe.

Does it make more sense to you now?

Greetings, Florian Pflug
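The switchover being proposed amounts to a pointer swap under snapshot isolation. The following is a toy Python sketch of that idea, not PostgreSQL internals - all names (`Catalog`, `truncate_swap`, `vacuum`, and the version-pinning "snapshot") are invented for illustration. Each snapshot pins the catalog version it saw, so a reader that started before the truncate keeps resolving the table to the old datafile, while new snapshots see the fresh, empty one, and `vacuum` reclaims files no live snapshot can still reach:

```python
# Toy model of the proposed TRUNCATE-as-relfilenode-swap scheme.
# All names here are illustrative inventions, not PostgreSQL internals.
import itertools

class Catalog:
    def __init__(self):
        self._version = 0
        self._relfilenode = {}           # table name -> datafile id
        self._history = {}               # catalog version -> {table: datafile}
        self._files = {}                 # datafile id -> list of rows
        self._next_file = itertools.count(1)

    def _commit(self):
        self._version += 1
        self._history[self._version] = dict(self._relfilenode)

    def create_table(self, name):
        node = next(self._next_file)
        self._files[node] = []
        self._relfilenode[name] = node
        self._commit()

    def snapshot(self):
        # A "snapshot" here just pins the catalog version it was taken at.
        return self._version

    def insert(self, name, row):
        self._files[self._relfilenode[name]].append(row)

    def read(self, snap, name):
        # Readers resolve the datafile through the catalog as of their snapshot.
        return list(self._files[self._history[snap][name]])

    def truncate_swap(self, name):
        # Point relfilenode at a brand-new empty file; leave the old one behind.
        node = next(self._next_file)
        self._files[node] = []
        self._relfilenode[name] = node
        self._commit()

    def vacuum(self, oldest_snap):
        # Reclaim datafiles that no catalog version >= the oldest live
        # snapshot references - the cleanup rule suggested in the mail.
        live = set()
        for version, mapping in self._history.items():
            if version >= oldest_snap:
                live.update(mapping.values())
        for node in list(self._files):
            if node not in live:
                del self._files[node]

cat = Catalog()
cat.create_table("cache")
cat.insert("cache", "entry-1")
snap = cat.snapshot()                     # reader starts before the truncate
cat.truncate_swap("cache")                # swap in an empty datafile
print(cat.read(snap, "cache"))            # old snapshot still sees the entry
print(cat.read(cat.snapshot(), "cache"))  # new snapshot sees an empty table
```

As the rest of the thread shows, this model assumes MVCC reads of the catalog itself, which is exactly the assumption that does not hold in PostgreSQL.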
"Florian G. Pflug" <fgp@phlo.org> writes: > Now, I was thinking if TRUNCATE couldn't just let relfilenode in > pg_class point to a new datafile, and leave the old one in place. Until when? How would you synchronize the switchover? > VACUUM could then remove datafiles when it purges a record from pg_class. And how would you prevent VACUUM from doing so incorrectly? The fact that pg_class contains a dead record is not ordinarily an indication that the relfilenode mentioned in the record is not needed anymore. regards, tom lane
"Florian G. Pflug" <fgp@phlo.org> writes: > Tom Lane wrote: >> Until when? How would you synchronize the switchover? > Every snapshot would either contain the old, or the new version of > the corresponding pg_class tuple. The ones using the old version > couldn't possible be writer, only reader (TRUNCATE would still need > to acquire a lock that ensures that). New transactions started after > the commit of the truncate would see the new version, and use > the new datafile. Wrong. *All* transactions read the system catalogs with SnapshotNow. There's been some discussion of using MVCC semantics on catalog accesses, but we are very far from being ready to do that; the implications will reach into every part of the backend. To give just one example of why this isn't necessarily a hot idea, suppose that transaction A adds a constraint to a table T and commits, and then transaction B (which started before A and hasn't previously touched T) comes along and inserts a tuple into T that violates the constraint. If B uses an old snapshot for looking up T's catalog info then it will not see the constraint that it must satisfy. Locking does not help since A released its locks on T before B would have tried to take any lock. regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> New transactions started after the commit of the TRUNCATE would
>> see the new version and use the new datafile.
>
> Wrong. *All* transactions read the system catalogs with SnapshotNow.

Ah, well, that clearly kills my idea... Too bad. I was fooled by the fact that most DDL statements can be rolled back, and assumed that this follows from using "normal" MVCC semantics when reading the catalog tables.

Thanks for your explanations!

greetings, Florian Pflug