Thread: Temporary table retains old contents on update eventually causing slow temp file usage.

Hi,

It would seem that doing any changes on a temp table forces a copy of
the entire contents of the table to be retained in memory/disk. Is
this  happening due to MVCC?  Is there a way to change this behavior?
It could be very useful when you have really huge temp tables that
need to be updated a few times before they can be dropped.

Below is an example of the problem.  I'll create a temp table, insert
600 rows (just a bunch of urls, you can use anything really), then
update the table a few times without actually changing anything.  Of
course this test case really doesn't show the extent of the problem,
because its such a small amount of data involved.  When I have a temp
table of about 150 megs and do more then a few updates on it, it
forces postgresql to use the disk making things really slow.
Originally the entire temp table fit into RAM.

I tried using savepoints and releasing them to see if it would make
any difference and it did not, which isn't unexpected.  Could
pg_relation_size() be incorrect in this case?

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com



test=# begin;
BEGIN
test=# create temp table test_urls (u text);
CREATE TABLE
test=# insert into test_urls (u) select url from url limit 600;
INSERT 0 600
test=# select pg_relation_size('test_urls');
pg_relation_size
------------------
             73728
(1 row)

test=# update test_urls set u = u;
UPDATE 600
test=# select pg_relation_size('test_urls');
pg_relation_size
------------------
            147456
(1 row)

test=# update test_urls set u = u;
UPDATE 600
test=# select pg_relation_size('test_urls');
pg_relation_size
------------------
            212992
(1 row)

test=# update test_urls set u = u;
UPDATE 600
test=# select pg_relation_size('test_urls');
pg_relation_size
------------------
            286720
(1 row)

test=# update test_urls set u = u;
UPDATE 600
test=# select pg_relation_size('test_urls');
pg_relation_size
------------------
            352256
(1 row)

test=# update test_urls set u = u;
UPDATE 600
test=# select pg_relation_size('test_urls');
pg_relation_size
------------------
            425984
(1 row)



Re: Temporary table retains old contents on update eventually

From
Gavin Sherry
Date:
On Tue, 18 Jul 2006, Rusty Conover wrote:

> Hi,
>
> It would seem that doing any changes on a temp table forces a copy of
> the entire contents of the table to be retained in memory/disk. Is
> this  happening due to MVCC?  Is there a way to change this behavior?
> It could be very useful when you have really huge temp tables that
> need to be updated a few times before they can be dropped.

This is caused by our MVCC implementation. It cannot be easily changed. We
rely on MVCC for two things: concurrency and rolling back of aborted
commands. Without the latter, we couldn't support the following trivially:

template1=# create temp table bar (i int);
CREATE TABLE
template1=# begin;
BEGIN
template1=# insert into bar values(1);
INSERT 0 1
template1=# abort;
ROLLBACK
template1=# select * from bar;
 i
---
(0 rows)

It would be nice if we could special case temp tables because of the fact
that concurrency does not come into the equation but I cannot see it
happening without a generalised overwriting MVCC system.

The only alternative in the mean time is to vacuum your temporary table(s)
as part of your interaction with them.

Thanks,

Gavin

On Jul 18, 2006, at 6:22 AM, Gavin Sherry wrote:

> On Tue, 18 Jul 2006, Rusty Conover wrote:
>
>> Hi,
>>
>> It would seem that doing any changes on a temp table forces a copy of
>> the entire contents of the table to be retained in memory/disk. Is
>> this  happening due to MVCC?  Is there a way to change this behavior?
>> It could be very useful when you have really huge temp tables that
>> need to be updated a few times before they can be dropped.
>
> This is caused by our MVCC implementation. It cannot be easily
> changed. We
> rely on MVCC for two things: concurrency and rolling back of aborted
> commands. Without the latter, we couldn't support the following
> trivially:
>
> template1=# create temp table bar (i int);
> CREATE TABLE
> template1=# begin;
> BEGIN
> template1=# insert into bar values(1);
> INSERT 0 1
> template1=# abort;
> ROLLBACK
> template1=# select * from bar;
>  i
> ---
> (0 rows)
>
> It would be nice if we could special case temp tables because of
> the fact
> that concurrency does not come into the equation but I cannot see it
> happening without a generalised overwriting MVCC system.
>
> The only alternative in the mean time is to vacuum your temporary
> table(s)
> as part of your interaction with them.

I forgot to add in my original post that the temporary tables I'm
dealing with have the "on commit drop" flag, so really persisting
beyond the transaction isn't needed.   But I don't think that makes
any difference, because of savepoints' required functionality.

The problem with vacuuming is that you can't do it by default right
now inside of a transaction.

Reading vacuum.c though, it leaves the door open:

/*
  * We cannot run VACUUM inside a user transaction block; if we were
inside
  * a transaction, then our commit- and start-transaction-command calls
  * would not have the intended effect! Furthermore, the forced
commit that
  * occurs before truncating the relation's file would have the
effect of
  * committing the rest of the user's transaction too, which would
  * certainly not be the desired behavior.  (This only applies to VACUUM
  * FULL, though.  We could in theory run lazy VACUUM inside a
transaction
  * block, but we choose to disallow that case because we'd rather
commit
  * as soon as possible after finishing the vacuum.      This is
mainly so that
  * we can let go the AccessExclusiveLock that we may be holding.)
  *
  * ANALYZE (without VACUUM) can run either way.
  */

Since we're dealing with a temporary table we shouldn't have any
problems with the AccessExclusiveLock. Would lazy vacuuming mark the
pages as free? I assume it wouldn't release them or shrink the size
of the relation, but could they be reused on later updates in that
same transaction?

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com