Thread: Temporary table retains old contents on update eventually causing slow temp file usage.
Temporary table retains old contents on update eventually causing slow temp file usage.
From
Rusty Conover
Date:
Hi, It would seem that doing any changes on a temp table forces a copy of the entire contents of the table to be retained in memory/disk. Is this happening due to MVCC? Is there a way to change this behavior? It could be very useful when you have really huge temp tables that need to be updated a few times before they can be dropped. Below is an example of the problem. I'll create a temp table, insert 600 rows (just a bunch of urls, you can use anything really), then update the table a few times without actually changing anything. Of course this test case really doesn't show the extent of the problem, because its such a small amount of data involved. When I have a temp table of about 150 megs and do more then a few updates on it, it forces postgresql to use the disk making things really slow. Originally the entire temp table fit into RAM. I tried using savepoints and releasing them to see if it would make any difference and it did not, which isn't unexpected. Could pg_relation_size() be incorrect in this case? Cheers, Rusty -- Rusty Conover InfoGears Inc. http://www.infogears.com test=# begin; BEGIN test=# create temp table test_urls (u text); CREATE TABLE test=# insert into test_urls (u) select url from url limit 600; INSERT 0 600 test=# select pg_relation_size('test_urls'); pg_relation_size ------------------ 73728 (1 row) test=# update test_urls set u = u; UPDATE 600 test=# select pg_relation_size('test_urls'); pg_relation_size ------------------ 147456 (1 row) test=# update test_urls set u = u; UPDATE 600 test=# select pg_relation_size('test_urls'); pg_relation_size ------------------ 212992 (1 row) test=# update test_urls set u = u; UPDATE 600 test=# select pg_relation_size('test_urls'); pg_relation_size ------------------ 286720 (1 row) test=# update test_urls set u = u; UPDATE 600 test=# select pg_relation_size('test_urls'); pg_relation_size ------------------ 352256 (1 row) test=# update test_urls set u = u; UPDATE 600 test=# select pg_relation_size('test_urls'); pg_relation_size ------------------ 425984 (1 row)
On Tue, 18 Jul 2006, Rusty Conover wrote: > Hi, > > It would seem that doing any changes on a temp table forces a copy of > the entire contents of the table to be retained in memory/disk. Is > this happening due to MVCC? Is there a way to change this behavior? > It could be very useful when you have really huge temp tables that > need to be updated a few times before they can be dropped. This is caused by our MVCC implementation. It cannot be easily changed. We rely on MVCC for two things: concurrency and rolling back of aborted commands. Without the latter, we couldn't support the following trivially: template1=# create temp table bar (i int); CREATE TABLE template1=# begin; BEGIN template1=# insert into bar values(1); INSERT 0 1 template1=# abort; ROLLBACK template1=# select * from bar; i --- (0 rows) It would be nice if we could special case temp tables because of the fact that concurrency does not come into the equation but I cannot see it happening without a generalised overwriting MVCC system. The only alternative in the mean time is to vacuum your temporary table(s) as part of your interaction with them. Thanks, Gavin
Re: Temporary table retains old contents on update eventually causing slow temp file usage.
From
Rusty Conover
Date:
On Jul 18, 2006, at 6:22 AM, Gavin Sherry wrote: > On Tue, 18 Jul 2006, Rusty Conover wrote: > >> Hi, >> >> It would seem that doing any changes on a temp table forces a copy of >> the entire contents of the table to be retained in memory/disk. Is >> this happening due to MVCC? Is there a way to change this behavior? >> It could be very useful when you have really huge temp tables that >> need to be updated a few times before they can be dropped. > > This is caused by our MVCC implementation. It cannot be easily > changed. We > rely on MVCC for two things: concurrency and rolling back of aborted > commands. Without the latter, we couldn't support the following > trivially: > > template1=# create temp table bar (i int); > CREATE TABLE > template1=# begin; > BEGIN > template1=# insert into bar values(1); > INSERT 0 1 > template1=# abort; > ROLLBACK > template1=# select * from bar; > i > --- > (0 rows) > > It would be nice if we could special case temp tables because of > the fact > that concurrency does not come into the equation but I cannot see it > happening without a generalised overwriting MVCC system. > > The only alternative in the mean time is to vacuum your temporary > table(s) > as part of your interaction with them. I forgot to add in my original post that the temporary tables I'm dealing with have the "on commit drop" flag, so really persisting beyond the transaction isn't needed. But I don't think that makes any difference, because of savepoints' required functionality. The problem with vacuuming is that you can't do it by default right now inside of a transaction. Reading vacuum.c though, it leaves the door open: /* * We cannot run VACUUM inside a user transaction block; if we were inside * a transaction, then our commit- and start-transaction-command calls * would not have the intended effect! Furthermore, the forced commit that * occurs before truncating the relation's file would have the effect of * committing the rest of the user's transaction too, which would * certainly not be the desired behavior. (This only applies to VACUUM * FULL, though. We could in theory run lazy VACUUM inside a transaction * block, but we choose to disallow that case because we'd rather commit * as soon as possible after finishing the vacuum. This is mainly so that * we can let go the AccessExclusiveLock that we may be holding.) * * ANALYZE (without VACUUM) can run either way. */ Since we're dealing with a temporary table we shouldn't have any problems with the AccessExclusiveLock. Would lazy vacuuming mark the pages as free? I assume it wouldn't release them or shrink the size of the relation, but could they be reused on later updates in that same transaction? Cheers, Rusty -- Rusty Conover InfoGears Inc. Web: http://www.infogears.com