Thread: Same-page UPDATEs in bloated tables

Same-page UPDATEs in bloated tables

From
Ian Dowse
Date:
Hi,

I've been seeing an issue with 8.1.4 that seems to be caused by the
way UPDATE operations prefer to place the new row version in the
same page as the original row. The issue is specific to UPDATEs;
it does not occur when each UPDATE is replaced by a DELETE/INSERT
pair. The problem can prevent a temporarily bloated table from ever
returning to its normal size even though all rows are frequently
changing and regular vacuuming is taking place.

A simple way to demonstrate the issue is to insert 10001 rows into
an empty table and delete the first 10000 rows. Now, repeatedly
performing (lazy) vacuums and UPDATEs will never result in the table
size shrinking:

    x=# SELECT * FROM foo;
      id   | val
    -------+------
     10001 | foo2
    (1 row)

    x=# VACUUM ANALYSE foo;
    VACUUM
    x=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'foo';
     relpages | reltuples
    ----------+-----------
          527 |         1
    (1 row)

    x=# UPDATE foo SET val = 'foo3' WHERE id = 10001;
    UPDATE 1
    x=# VACUUM ANALYSE foo;
    VACUUM
    x=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'foo';
     relpages | reltuples
    ----------+-----------
          527 |         1
    (1 row)

However, using DELETE/INSERT instead causes the vacuum to immediately
shrink the table:

    x=# DELETE from foo WHERE id = 10001;
    DELETE 1
    x=# INSERT INTO foo(val) VALUES('foo4');
    INSERT 0 1
    x=# VACUUM ANALYSE foo;
    VACUUM
    x=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'foo';
     relpages | reltuples
    ----------+-----------
        1 |         1
    (1 row)

(Note that the above is just intended as a simple example to
demonstrate the effect; the actual case where this behaviour was
observed involved a high-churn table that never emptied but where
every row was updated at least a few times per day and (lazy)
vacuuming was performed approx every 10 minutes. The table would
occasionally become bloated, e.g. while performing a slow backup
of the database, and it would never recover its original size if
the rows were updated with UPDATE operations, but would quickly
return to its original size if DELETE/INSERT pairs were used instead).

This behaviour seemed fairly surprising given that UPDATEs are
supposed to be quite similar to INSERT/DELETE pairs in PostgreSQL.
Would it be possible to come up with some logic (in heap_update()?)
that could avoid using the same page if the page offset is way
beyond the nominal size of the table? Maybe some appropriate
statistics could be recorded by vacuum/analyse?

Ian

Re: Same-page UPDATEs in bloated tables

From
Alvaro Herrera
Date:
Ian Dowse wrote:

> I've been seeing an issue with 8.1.4 that seems to be caused by the
> way UPDATE operations prefer to place the new row version in the
> same page as the original row. The issue is specific to UPDATEs;
> it does not occur when each UPDATE is replaced by a DELETE/INSERT
> pair. The problem can prevent a temporarily bloated table from ever
> returning to its normal size even though all rows are frequently
> changing and regular vacuuming is taking place.
>
> A simple way to demonstrate the issue is to insert 10001 rows into
> an empty table and delete the first 10000 rows. Now, repeatedly
> performing (lazy) vacuums and UPDATEs will never result in the table
> size shrinking:

Yeah.  This scenario is one of those for which "popular knowledge"
("common wisdom"? "Postgres folklore"?) tells you to do a VACUUM FULL or
equivalent (e.g. CLUSTER).

Using the same page for an updated tuple is generally a useful
optimization, so I don't think you have much hopes for having it
disabled.  The INSERT+DELETE equivalent doesn't have the opportunity to
use that optimization though, which is why it has to go to the FSM and
thus get a different page to do the INSERT on.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Same-page UPDATEs in bloated tables

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Using the same page for an updated tuple is generally a useful
> optimization, so I don't think you have much hopes for having it
> disabled.

Especially not since there's no very reasonable way for anything as
low-level as heap_update to know that "the table is way beyond its
nominal size".  What's nominal size anyway?

Actually, the recent thinking in this area has been to try to *increase*
the usage of same-page UPDATE, so as to prevent table bloat in the first
place ...

            regards, tom lane

Re: Same-page UPDATEs in bloated tables

From
Ian Dowse
Date:
In message <24849.1160940026@sss.pgh.pa.us>, Tom Lane writes:
>Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Using the same page for an updated tuple is generally a useful
>> optimization, so I don't think you have much hopes for having it
>> disabled.
>
>Especially not since there's no very reasonable way for anything as
>low-level as heap_update to know that "the table is way beyond its
>nominal size".  What's nominal size anyway?

Thanks - yes, I don't know enough about PostgreSQL internals to
suggest a good approach, but thought there might be a small change
possible that would cause the table size to eventually recover
itself without manual intervention. For example even if the same-page
optimisation was only disabled on the very last page and for 1 in
10 updates then the table size would start to shrink. Even better
would be a way for this to happen more aggressively when the table
is very sparsely populated relative to the target fill factor.

Just to explain a bit more about the original access pattern, we
were only performing UPDATEs (no INSERT/DELETEs), so the reason
that the table gets bloated to begin with is that vacuuming is
ineffective during long transactions such as backups. In our
particular case, full vacuuming is not an acceptable option due to
the exclusive locking, so to handle the rare table bloat problems
we currently have to just switch all our UPDATEs to use slower
INSERT/DELETE operations instead.

>Actually, the recent thinking in this area has been to try to *increase*
>the usage of same-page UPDATE, so as to prevent table bloat in the first
>place ...

If you mean some kind of in-place update, then that would solve the
problem we were seeing too if it can prevent table bloat of high
UPDATE churn tables during long trasactions such as backups.

Thanks,

Ian