Re: Why frequently updated tables are an issue

I'm thinking that this subject has some crossover with another topic
mentioned recently: improving VACUUM performance.

Just to recap: frequently updated tables are a problem because they grow
significantly, accumulating many dead row versions. The usual solution
is to run frequent VACUUMs on such a table, but even that isn't enough
at some update rates, because VACUUM must scan the whole table rather
than just the hot blocks.

The ideas thrown around earlier this month were to create a "Cleaner"
process that performed individual block-level VACUUMs on dirty blocks
in shared_buffers, before they are written out by the bgwriter or at
checkpoint. (It wasn't called that at the time, IIRC, but was described
more in terms of the CTID list that would be required to manage this.)
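
To make that concrete, here's a rough standalone sketch of what such a
list might look like. I've modelled it as a list of block numbers
rather than CTIDs, since the unit of cleaning is a block; every name
here (CleanList, cleanlist_add, CLEANLIST_SIZE) is made up for
illustration and none of it is actual PostgreSQL code:

    /* Standalone model, not real PostgreSQL code: a bounded list of
     * block numbers awaiting a block-level vacuum. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    #define CLEANLIST_SIZE 64

    typedef struct CleanList
    {
        BlockNumber blocks[CLEANLIST_SIZE];
        int         nblocks;
    } CleanList;

    /*
     * Queue a block for cleaning.  Duplicates collapse to one entry;
     * on overflow we simply drop the request and let the next full
     * VACUUM pick the block up instead, so losing an entry is safe.
     */
    static bool
    cleanlist_add(CleanList *cl, BlockNumber blkno)
    {
        for (int i = 0; i < cl->nblocks; i++)
            if (cl->blocks[i] == blkno)
                return true;    /* already queued */
        if (cl->nblocks >= CLEANLIST_SIZE)
            return false;       /* overflow: defer to full VACUUM */
        cl->blocks[cl->nblocks++] = blkno;
        return true;
    }

The important property is that the list is advisory: dropping entries
costs nothing but a delay until the next full VACUUM.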

Such a Cleaner process could solve the problem of frequently updated
tables by keeping the number of row versions reasonably low. There
wasn't much excitement, mine included, when this was discussed purely
as a way to speed up VACUUM. Thinking about the frequent-update problem
makes me want to revisit the idea now, since it has a much clearer role
in solving that problem. Looking at it now, it feels very much like the
freelist problem did a while back, with a similar-ish solution...

Frequent updates will happen on blocks within the T2 cache, since those
blocks are heavily accessed. The more frequently a block is accessed,
the more it needs cleaning. This is the opposite end of the cache from
the one the bgwriter works: the bgwriter wants to write out blocks that
aren't frequently used, to avoid continually rewriting the same blocks.
As a result, I'm thinking we wouldn't want a single queue to manage
both the blocks that need Cleaning and the blocks that need writing.
Discuss...
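
A toy illustration of the tension, with usage_count standing in for
"how hot" rather than being the real buffer machinery: given the same
set of dirty buffers, the Cleaner wants the hottest ones and the
bgwriter the coldest, so they would pull from opposite ends of any
single ordering. These functions are hypothetical, not proposed code:

    /* Toy model: the two consumers select from opposite ends of the
     * heat spectrum, so one shared queue can't satisfy both. */
    #include <stdbool.h>

    typedef struct BufDesc
    {
        int  blkno;
        int  usage_count;   /* stand-in for frequency of access */
        bool dirty;
    } BufDesc;

    /* Cleaner candidate: the hottest dirty buffer (T2-ish end). */
    static int
    pick_for_cleaner(const BufDesc *buf, int n)
    {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (buf[i].dirty &&
                (best < 0 || buf[i].usage_count > buf[best].usage_count))
                best = i;
        return best;
    }

    /* bgwriter candidate: the coldest dirty buffer (LRU end). */
    static int
    pick_for_bgwriter(const BufDesc *buf, int n)
    {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (buf[i].dirty &&
                (best < 0 || buf[i].usage_count < buf[best].usage_count))
                best = i;
        return best;
    }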

Another way to do this is to find a way of identifying the specific
blocks that require Cleaning. We could use an event to fire off the
cleaning action, such as when an update causes a block split. In that
case, the older block could be added to the Cleaner's cleanlist.
The Cleaner then works through the cleanlist, performing a block-level
vacuum on each entry.
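
As for what "a block-level vacuum" means here, a deliberately
simplified model of just the heap side: drop the row versions on one
page that no running transaction can still see, and compact the rest.
The xmax/oldest_xmin comparison below is a crude stand-in for the real
visibility checks, and block_vacuum is an invented name:

    /* Simplified block-level vacuum: reclaim row versions dead to
     * every running transaction, compacting the page in place. */
    #include <stdbool.h>

    typedef struct TupleVersion
    {
        int xmax;           /* deleting transaction id, 0 if live */
    } TupleVersion;

    typedef struct Page
    {
        TupleVersion tuples[32];
        int          ntuples;
    } Page;

    static int
    block_vacuum(Page *page, int oldest_xmin)
    {
        int kept = 0;

        for (int i = 0; i < page->ntuples; i++)
        {
            /* Dead only if deleted, and deleted before the oldest
             * transaction anyone can still see. */
            bool dead = page->tuples[i].xmax != 0 &&
                        page->tuples[i].xmax < oldest_xmin;

            if (!dead)
                page->tuples[kept++] = page->tuples[i];
        }

        int removed = page->ntuples - kept;
        page->ntuples = kept;
        return removed;     /* row versions reclaimed on this block */
    }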

That way the Cleaner would stay a reasonable number of row versions
behind the point of update, so that a VACUUM is potentially useful,
while also being fired infrequently enough not to get in the way.
[Since, as Alvaro has pointed out, the indexes need to be visited in
addition to the main data block, this could introduce contention for
index locks, because the frequent updater is probably using an index to
reach the data.] Also, picking the block-split event would mean
stopping the frequent updater just at the point where significant extra
costs start to be incurred from all the additional row versions: when a
row has versions across many blocks.
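
To spell out the ordering that the index point forces, again as a toy
model rather than anything real: index entries pointing at a dead heap
tuple must be removed before the heap slot can be reused, otherwise a
concurrent index scan could follow a stale pointer onto an unrelated
row. That first step is what drags index locks into what would
otherwise be a single-block operation:

    /* Toy model of the ordering constraint: purge matching index
     * entries first, only then free the heap slot for reuse. */
    #include <stdbool.h>

    typedef struct IndexEntry
    {
        int key;
        int heap_slot;      /* heap slot this entry points at */
    } IndexEntry;

    static void
    remove_dead_version(IndexEntry *index, int *nentries,
                        bool *heap_free, int dead_slot)
    {
        int kept = 0;

        /* Step 1: drop index entries pointing at the dead version.
         * This is where the Cleaner must take index locks, and where
         * it can collide with the frequent updater. */
        for (int i = 0; i < *nentries; i++)
            if (index[i].heap_slot != dead_slot)
                index[kept++] = index[i];
        *nentries = kept;

        /* Step 2: only now is it safe to reuse the heap slot. */
        heap_free[dead_slot] = true;
    }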

I'm sure there are some flawed thoughts in all of that, but overall I
think that a Cleaner process performing block-level VACUUMs becomes
more viable when it has an important problem to solve.

Best Regards, Simon Riggs