Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>>Now, I was thinking if TRUNCATE couldn't just let relfilenode in
>>pg_class point to a new datafile, and leave the old one in place.
>
> Until when? How would you synchronize the switchover?
Every snapshot would either contain the old, or the new version of
the corresponding pg_class tuple. The ones using the old version
couldn't possible be writer, only reader (TRUNCATE would still need
to acquire a lock that ensures that). New transactions started after
the commit of the truncate would see the new version, and use
the new datafile.
Read-Committed transactions started before the truncate commited would
be able to read the file, and block when trying to write it until the truncate
is committed. Upon commit, they'd need reread pg_class, and use the new
datafile. A serializable transaction would always read the old datafile,
and would generate a serialization error when trying to write to the table,
I'd believe.
The serializable case shows what I percieve to be the difference between
delete and truncate from my point of view. Delete deletes all records
visible to the deleting transaction, while truncate deletes all records. For
me at least, this fits perfectly with the serialization error, because
you cannot insert into a table that another transaction wants to clear completly.
>>VACUUM could then remove datafiles when it purges a record from pg_class.
>
> And how would you prevent VACUUM from doing so incorrectly? The fact
> that pg_class contains a dead record is not ordinarily an indication
> that the relfilenode mentioned in the record is not needed anymore.
But if a file is not referenced by relfilenode of any live tuple in pg_class,
(live meaning not to be vacuumed)
how can there be any transactions still accessing it? It wouldn't have known
about the existance of the file in the first place, I'd believe
Does it make more sense to you now?
Greetings, Florian Pflug