Thread: Yet another "drop table vs delete" question
I was wondering if dropping a table is more efficient in PostgreSQL 8.x
compared to deleting its content. To put you in the picture: I want to
schedule some daily data materialization via a cron job. With the
resources I have, there are only these two ways:

1) DROP statement:
   a) DROP TABLE tmp;
   b) CREATE TABLE tmp AS SELECT * FROM ...;

2) DELETE statement:
   a) DELETE FROM tmp;
   b) INSERT INTO tmp SELECT * FROM ...;

The tmp table doesn't have any indexes or constraints. The thing is,
PostgreSQL may leave some invalid content behind in both situations. The
real question is: which of these two options leaves less garbage to be
vacuumed? At this point I don't really care about cost-based efficiency,
but about cutting down PostgreSQL's background work.
On Tuesday 21 April 2009 20:26:24 marek.patrzek@gmail.com wrote:
> I was wondering if dropping a table is more efficient in PostgreSQL
> 8.x in comparison to deleting it's content ?

Yes, but if you are asking that question, you probably really want to use
TRUNCATE.
On Apr 21, 2009, at 12:28 PM, Peter Eisentraut wrote:
> Yes, but if you are asking that question, you probably really want
> to use TRUNCATE.

The advantage of DROP TABLE being, of course, that DROP TABLE is
transactionally-safe, while TRUNCATE is not.
marek.patrzek@gmail.com escribió:
> I was wondering if dropping a table is more efficient in PostgreSQL
> 8.x in comparison to deleting it's content ?

"8.x" is a meaningless version number in Postgres. Major versions (with
new features, etc.) are labeled by the first two elements, so 8.0, 8.1,
and so on.

To actually answer your question:

> The thing is, postgresql may leave some invalid content behind in both
> situations. The real question is - which of those two options leaves
> less garbage to be vaccumed ? At this point I don't relay care about
> cost based efficiency but cutting down pg background work.

Try TRUNCATE. That leaves the least garbage behind and takes the least
time.

-- 
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
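For the original poster's cron job, the suggested approach might look
like this (a sketch only; the tmp table and source query are the ones
from the original post, with the SELECT list elided there as well):

```sql
-- Daily refresh sketch using TRUNCATE instead of DROP/CREATE or DELETE.
-- TRUNCATE is transactional, so the old contents reappear if the
-- transaction rolls back partway through.
BEGIN;
TRUNCATE tmp;
INSERT INTO tmp SELECT * FROM ...;  -- source query elided in the original
COMMIT;
```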
Christophe <xof@thebuild.com> writes:
> On Apr 21, 2009, at 12:28 PM, Peter Eisentraut wrote:
>> Yes, but if you are asking that question, you probably really want
>> to use TRUNCATE.

> The advantage of DROP TABLE being, of course, that DROP TABLE is
> transactionally-safe, while TRUNCATE is not.

The above is complete nonsense. They're both going to drop data that
might conceivably be visible in the snapshot of some concurrent
transaction that hasn't yet touched the table (else it would have a
lock) but might wish to do so later. If you use DELETE, you'll be
transactionally safe, but the cost is concomitantly a lot higher than
either DROP or TRUNCATE.

			regards, tom lane
On Apr 21, 2009, at 1:20 PM, Tom Lane wrote:
> They're both going to drop data that
> might conceivably be visible in the snapshot of some concurrent
> transaction that hasn't yet touched the table (else it would have
> lock) but might wish to do so later.

Unless I'm deeply misunderstanding something (always a possibility),
DROP TABLE and TRUNCATE are not symmetrical in this regard. Once a
transaction has issued a DROP TABLE, all other transactions that attempt
to modify the table are going to block on the first transaction's ACCESS
EXCLUSIVE lock until it commits or aborts. In the case of TRUNCATE, the
other transactions will see the table as being empty from the moment the
first transaction issues the TRUNCATE, and will see the rows reappear if
the first transaction rolls back. Yes?
Christophe <xof@thebuild.com> writes:
> On Apr 21, 2009, at 1:20 PM, Tom Lane wrote:
>> They're both going to drop data that
>> might conceivably be visible in the snapshot of some concurrent
>> transaction that hasn't yet touched the table (else it would have
>> lock) but might wish to do so later.

> Unless I'm deeply misunderstanding something (always a possibility),
> DROP TABLE and TRUNCATE are not symmetrical in this regard. Once a
> transaction has issued a DROP TABLE, all other transactions that
> attempt to modify it are going to block on the first transaction's
> ACCESS EXCLUSIVE lock until it commits or aborts. In the case of
> TRUNCATE, the other transactions will see the table as being empty
> from the moment in the first transaction issues the TRUNCATE, and will
> see the rows reappear if the first transaction rolls back. Yes?

No. They both take exclusive lock.

Oh, and they are both safe in the sense that the data is still there if
the calling transaction fails or rolls back. Perhaps that was what you
meant by "transactional safety"? I was thinking of MVCC semantics, which
is a different issue.

			regards, tom lane
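A quick way to see the point that both commands take the same lock (a
sketch; assumes a scratch table t and two concurrent psql sessions):

```sql
-- Session A:
BEGIN;
TRUNCATE t;          -- takes an ACCESS EXCLUSIVE lock on t
-- (transaction deliberately left open)

-- Session B:
SELECT * FROM t;     -- blocks until session A commits or rolls back;
                     -- the behavior is the same if session A had run
                     -- DROP TABLE t instead
```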
xof@thebuild.com (Christophe) writes:
> On Apr 21, 2009, at 12:28 PM, Peter Eisentraut wrote:
>> Yes, but if you are asking that question, you probably really want
>> to use TRUNCATE.
>
> The advantage of DROP TABLE being, of course, that DROP TABLE is
> transactionally-safe, while TRUNCATE is not.

TRUNCATE mayn't have been transactionally safe back in PostgreSQL 8.0,
but it is, now.

http://www.postgresql.org/docs/8.3/static/sql-truncate.html
-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/linuxdistributions.html
Rules of the Evil Overlord #151. "I will not set myself up as a god.
That perilous position is reserved for my trusted lieutenant."
<http://www.eviloverlord.com/>
On Apr 21, 2009, at 1:36 PM, Tom Lane wrote:
> I was thinking of MVCC semantics, which is a different issue.

Indeed so, my error.

This is a bit of a drift off-topic, but rereading the docs, I'm now
having trouble visualizing the real-world effect of the
non-MVCC-safeness of TRUNCATE. A transaction that queries the table
before the TRUNCATE is going to pick up an ACCESS SHARE lock, which will
prevent the TRUNCATE from running until it's released. The TRUNCATE will
pick up an ACCESS EXCLUSIVE lock that will block any subsequent queries
until the transaction doing the TRUNCATE commits. I'm sure there is a
scenario under which a separate transaction could see non-MVCC behavior
from TRUNCATE, but I'm having trouble seeing what it is.
On Tue, 2009-04-21 at 13:59 -0700, Christophe wrote:
> I'm sure there is a scenario under which a separate
> transaction could see non-MVCC behavior from TRUNCATE, but I'm
> having trouble see what it is.

Session1:
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM foo;

Session2:
  BEGIN;
  TRUNCATE bar;
  COMMIT;

Session1:
  SELECT * FROM bar;
  COMMIT;

In Session1, the serializable transaction sees an empty version of bar,
even though it had tuples in it at the time Session1 got its
serializable snapshot.

If Session2 does a DROP TABLE instead of TRUNCATE, Session1 will get an
error when it tries to read "bar".

Regards,
	Jeff Davis
On Apr 21, 2009, at 2:15 PM, Jeff Davis wrote:
> In Session1, the serializable transaction sees an empty version of
> bar, even though it had tuples in at the time Session1 got its
> serializable snapshot.

Indeed so, and I understand that part. But since Session1 didn't try to
access 'bar', it can't distinguish that sequence from:

Session2:
  BEGIN;
  TRUNCATE bar;
  COMMIT;

Session1:
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM foo;
  SELECT * FROM bar;
  COMMIT;

I've been trying to come up with a scenario in which a TRUNCATE violates
concurrency expectations; I'm sure one exists, but my brain isn't
wrapping around it.
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-04-21 at 13:59 -0700, Christophe wrote:
>> I'm sure there is a scenario under which a separate
>> transaction could see non-MVCC behavior from TRUNCATE, but I'm
>> having trouble see what it is.

> Session1:
>   BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>   SELECT * FROM foo;

> Session2:
>   BEGIN;
>   TRUNCATE bar;
>   COMMIT;

> Session1:
>   SELECT * FROM bar;
>   COMMIT;

> In Session1, the serializable transaction sees an empty version of
> bar, even though it had tuples in at the time Session1 got its
> serializable snapshot.

Exactly.

> If Session2 does a DROP TABLE instead of TRUNCATE, Session1 will get
> an error when it tries to read "bar".

Actually, the scenario that I suppose the OP had in mind was to drop and
immediately recreate "bar" (probably in the same transaction). If you do
that, then session 1 will actually see the new version of "bar" when it
eventually gets around to examining the table --- this is because system
catalog accesses always follow SnapshotNow rules. So there is really
darn little difference between TRUNCATE and drop/recreate. The advantage
of TRUNCATE is you don't have to run around and manually re-establish
indexes, foreign keys, etc. It's probably also a tad faster because of
less catalog churn.

			regards, tom lane
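The drop-and-recreate pattern described here might be sketched like this
(the source query is hypothetical, standing in for whatever populates
the table):

```sql
-- Drop and immediately recreate in one transaction.  A concurrent
-- serializable transaction that later reads bar will see this NEW
-- version of the table, because system catalog lookups follow
-- SnapshotNow rules rather than the transaction's snapshot.
BEGIN;
DROP TABLE bar;
CREATE TABLE bar AS SELECT * FROM ...;  -- query elided
COMMIT;
```

Note that, unlike TRUNCATE, this path forces you to re-create any
indexes and constraints on bar by hand afterwards.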
On 21 Kwi, 21:30, alvhe...@commandprompt.com (Alvaro Herrera) wrote:
> Try TRUNCATE. That leaves the less garbage behind and takes the less
> time.

Thank you for all your help.
Christophe <xof@thebuild.com> writes:
> On Apr 21, 2009, at 2:15 PM, Jeff Davis wrote:
>> In Session1, the serializable transaction sees an empty version of
>> bar, even though it had tuples in at the time Session1 got its
>> serializable snapshot.

> Indeed so, and I understand that part. But since Session1 didn't try
> to access 'bar', it can't distinguish that sequence from:

Well, it could if the application has expectations about consistency
between the contents of 'foo' and 'bar'. I think we have enough
interlocks to ensure that it's not possible for an explicit foreign-key
constraint to be violated that way, but the application could have
expectations that aren't formalized as an FK. In general the point of a
serializable snapshot is to ensure that you see logically simultaneous
contents of all the tables you look at, and that will definitely not be
the case in this type of scenario.

			regards, tom lane
On Tue, 2009-04-21 at 14:30 -0700, Christophe wrote:
> Indeed so, and I understand that part. But since Session1 didn't try
> to access 'bar', it can't distinguish that sequence from:
>
> Session2:
>   BEGIN;
>   TRUNCATE bar;
>   COMMIT;
>
> Session1:
>   BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>   SELECT * FROM foo;
>   SELECT * FROM bar;
>   COMMIT;

Add something else into the mix, like having the transaction in Session2
update "foo", and I think it will cause the MVCC violation you're
looking for.

Session0:
  INSERT INTO foo VALUES(1);
  INSERT INTO bar VALUES(2);

Session1:
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM foo;

Session2:
  BEGIN;
  INSERT INTO foo VALUES(3);
  TRUNCATE bar;
  COMMIT;

Session1:
  SELECT * FROM bar;
  COMMIT;

Atomicity says that Session1 should either see 1 and 3 in foo and
nothing in bar (if it happens after Session2), or 1 in foo and 2 in bar
(if it happens first). So the rule that a SERIALIZABLE transaction
should get one consistent snapshot for its duration is broken in this
case. I don't think it's an issue if only using READ COMMITTED (but I've
been wrong on similar issues in the past).

Regards,
	Jeff Davis
Alvaro Herrera wrote:
> Try TRUNCATE. That leaves the less garbage behind and takes the less
> time.

A follow-up question: as far as I understand it, DELETE removes entries
in the FSM, so vacuum has to clean it all up when performing a delete.
Is this approximately correct? What happens with TRUNCATE? Does it
remove everything so that vacuum has almost no work to do, or is it
approximately as much work either way?

regards
thomas
On Apr 22, 2009, at 8:04 AM, Thomas Finneid wrote:
> Alvaro Herrera wrote:
>> Try TRUNCATE. That leaves the less garbage behind and takes the less
>> time.
>
> A follow up question, as far as I understand it, delete removes
> entries in the fsm, so vacuum has to clean it all up when performing
> a delete, is this approximately correct? what happens with truncate?
> does it remove everything so that vacuum has almost no work to do or
> is it approximately as much work either way?

No, DELETE doesn't do anything with the FSM. It simply marks the tuple
in the page as deleted (not visible) to transactions with transaction
ids larger than the DELETE transaction's commit id (assuming it
commits). So, once all transactions with transaction ids lower than that
DELETE transaction's commit id have finished, the tuple can be
considered dead since nothing can see it anymore. VACUUM looks for those
dead tuples and adds their space to the FSM. INSERTs and UPDATEs (and
COPYs) then look to the free space map first when space is needed for a
new tuple, in hopes of reusing a dead tuple's space rather than
allocating new space.

You're pretty much on with regards to TRUNCATE, as it deletes the pages
that are allocated to the table (I'm not sure whether TRUNCATE itself
handles clearing out the FSM entries for that table or whether VACUUM
does when the table is next vacuumed).

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
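One way to watch the difference described above (a sketch; assumes a
scratch table tmp with some rows already in it, and the exact VACUUM
output varies by version):

```sql
DELETE FROM tmp;       -- rows are only marked dead; pages stay allocated
VACUUM VERBOSE tmp;    -- reports the dead row versions it removed and
                       -- records the freed space in the FSM

TRUNCATE tmp;          -- by contrast, immediately cuts the relation down
                       -- to zero pages, leaving almost nothing for
                       -- vacuum to do
```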