Thread: Is the "ACCESS EXCLUSIVE" lock for TRUNCATE really necessary?
Hi,

I know that TRUNCATE needs to acquire an ACCESS EXCLUSIVE lock, because it removes the datafile on commit and must ensure that no one is still using it by then. For a lot of applications (at least mine) this imposes problems. I'd like to use TRUNCATE in a few places, not only to "clean out" the table, but also to prevent others from inserting while I'm deleting everything. (Think of a cache table: if I clear the cache, I don't want concurrent transactions to be able to insert. Readers are not a problem, though - they still see the database in its "old state", so seeing the cache in the "old state" is correct.)

Now, I was wondering whether TRUNCATE couldn't just let relfilenode in pg_class point to a new datafile, and leave the old one in place. TRUNCATE would still need a lock that prevents any write access to the table, but it wouldn't need to lock out readers too. VACUUM could then remove datafiles when it purges a record from pg_class.

I'm asking mainly out of curiosity - I've thought about this for a while now, and couldn't come up with a reason why it wouldn't be possible.

greetings, Florian Pflug
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Now, I was thinking if TRUNCATE couldn't just let relfilenode in
>> pg_class point to a new datafile, and leave the old one in place.
>
> Until when? How would you synchronize the switchover?

Every snapshot would contain either the old or the new version of the corresponding pg_class tuple. The transactions using the old version couldn't possibly be writers, only readers (TRUNCATE would still need to acquire a lock that ensures that). New transactions started after the commit of the TRUNCATE would see the new version and use the new datafile. Read-committed transactions started before the TRUNCATE committed would be able to read the old file, and would block when trying to write to it until the TRUNCATE committed; upon that commit, they'd need to reread pg_class and use the new datafile. A serializable transaction would always read the old datafile, and would generate a serialization error when trying to write to the table, I believe.

The serializable case shows what I perceive to be the difference between DELETE and TRUNCATE from my point of view: DELETE removes all records visible to the deleting transaction, while TRUNCATE removes all records, period. For me at least, this fits perfectly with the serialization error, because you cannot insert into a table that another transaction wants to clear completely.

>> VACUUM could then remove datafiles when it purges a record from pg_class.
>
> And how would you prevent VACUUM from doing so incorrectly? The fact
> that pg_class contains a dead record is not ordinarily an indication
> that the relfilenode mentioned in the record is not needed anymore.

But if a file is not referenced by the relfilenode of any live tuple in pg_class (live meaning not yet removable by VACUUM), how can any transaction still be accessing it? It couldn't have learned about the existence of the file in the first place, I'd believe.

Does it make more sense to you now?

Greetings, Florian Pflug
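The switchover being proposed amounts to a pointer swap under snapshot isolation. The following is a toy Python sketch of that idea, not PostgreSQL internals - all names (`Catalog`, `truncate_swap`, `vacuum`, and the version-pinning "snapshot") are invented for illustration. Each snapshot pins the catalog version it saw, so a reader that started before the truncate keeps resolving the table to the old datafile, while new snapshots see the fresh, empty one, and `vacuum` reclaims files no live snapshot can still reach:

```python
# Toy model of the proposed TRUNCATE-as-relfilenode-swap scheme.
# All names here are illustrative inventions, not PostgreSQL internals.
import itertools

class Catalog:
    def __init__(self):
        self._version = 0
        self._relfilenode = {}           # table name -> datafile id
        self._history = {}               # catalog version -> {table: datafile}
        self._files = {}                 # datafile id -> list of rows
        self._next_file = itertools.count(1)

    def _commit(self):
        self._version += 1
        self._history[self._version] = dict(self._relfilenode)

    def create_table(self, name):
        node = next(self._next_file)
        self._files[node] = []
        self._relfilenode[name] = node
        self._commit()

    def snapshot(self):
        # A "snapshot" here just pins the catalog version it was taken at.
        return self._version

    def insert(self, name, row):
        self._files[self._relfilenode[name]].append(row)

    def read(self, snap, name):
        # Readers resolve the datafile through the catalog as of their snapshot.
        return list(self._files[self._history[snap][name]])

    def truncate_swap(self, name):
        # Point relfilenode at a brand-new empty file; leave the old one behind.
        node = next(self._next_file)
        self._files[node] = []
        self._relfilenode[name] = node
        self._commit()

    def vacuum(self, oldest_snap):
        # Reclaim datafiles that no catalog version >= the oldest live
        # snapshot references - the cleanup rule suggested in the mail.
        live = set()
        for version, mapping in self._history.items():
            if version >= oldest_snap:
                live.update(mapping.values())
        for node in list(self._files):
            if node not in live:
                del self._files[node]

cat = Catalog()
cat.create_table("cache")
cat.insert("cache", "entry-1")
snap = cat.snapshot()                     # reader starts before the truncate
cat.truncate_swap("cache")                # swap in an empty datafile
print(cat.read(snap, "cache"))            # old snapshot still sees the entry
print(cat.read(cat.snapshot(), "cache"))  # new snapshot sees an empty table
```

As the rest of the thread shows, this model assumes MVCC reads of the catalog itself, which is exactly the assumption that does not hold in PostgreSQL.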
"Florian G. Pflug" <fgp@phlo.org> writes: > Now, I was thinking if TRUNCATE couldn't just let relfilenode in > pg_class point to a new datafile, and leave the old one in place. Until when? How would you synchronize the switchover? > VACUUM could then remove datafiles when it purges a record from pg_class. And how would you prevent VACUUM from doing so incorrectly? The fact that pg_class contains a dead record is not ordinarily an indication that the relfilenode mentioned in the record is not needed anymore. regards, tom lane
"Florian G. Pflug" <fgp@phlo.org> writes: > Tom Lane wrote: >> Until when? How would you synchronize the switchover? > Every snapshot would either contain the old, or the new version of > the corresponding pg_class tuple. The ones using the old version > couldn't possible be writer, only reader (TRUNCATE would still need > to acquire a lock that ensures that). New transactions started after > the commit of the truncate would see the new version, and use > the new datafile. Wrong. *All* transactions read the system catalogs with SnapshotNow. There's been some discussion of using MVCC semantics on catalog accesses, but we are very far from being ready to do that; the implications will reach into every part of the backend. To give just one example of why this isn't necessarily a hot idea, suppose that transaction A adds a constraint to a table T and commits, and then transaction B (which started before A and hasn't previously touched T) comes along and inserts a tuple into T that violates the constraint. If B uses an old snapshot for looking up T's catalog info then it will not see the constraint that it must satisfy. Locking does not help since A released its locks on T before B would have tried to take any lock. regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> New transactions started after the commit of the TRUNCATE would
>> see the new version and use the new datafile.
>
> Wrong. *All* transactions read the system catalogs with SnapshotNow.

Ah, well, that clearly kills my idea... Too bad. I was fooled by the fact that most DDL statements can be rolled back, and assumed that this follows from using "normal" MVCC semantics when reading the catalog tables.

Thanks for your explanations!

greetings, Florian Pflug