Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 18121.24.91.171.78.1151091035.squirrel@mail.mohawksoft.com
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Bruce Momjian <bruce@momjian.us>)
Responses Re: vacuum, performance, and MVCC  ("Jonah H. Harris" <jonah.harris@gmail.com>)
List pgsql-hackers
> Tom Lane wrote:
>> If you're doing heavy updates of a big table then it's likely to end up
>> visiting most of the table anyway, no?  There is talk of keeping a map
>> of dirty pages, but I think it'd be a win for infrequently-updated
>> tables, not ones that need constant vacuuming.
>>
>> I think a lot of our problems in this area could be solved with fairly
>> straightforward tuning efforts on the existing autovacuum
>> infrastructure.  In particular, someone should be looking into
>> recommendable default vacuum-cost-delay settings so that a background
>> vacuum doesn't affect performance too much.  Another problem with the
>> current autovac infrastructure is that it doesn't respond very well to
>> the case where there are individual tables that need constant attention
>> as well as many that don't.  If you have N databases then you can visit
>> a particular table at most once every N*autovacuum_naptime seconds, and
>> *every* table in the entire cluster gets reconsidered at that same rate.
>> I'm not sure if we need the ability to have multiple autovac daemons
>> running at the same time, but we definitely could use something with a
>> more flexible table-visiting pattern.  Perhaps it would be enough to
>> look through the per-table stats for each database before selecting the
>> database to autovacuum in each cycle, instead of going by "least
>> recently autovacuumed".
>>
>> Bottom line: there's still lots of low-hanging fruit.  Why are people
>> feeling that we need to abandon or massively complicate our basic
>> architecture to make progress?
>
> I think at some point we have to admit that _polling_ the tables, which
> is what autovacuum does, just isn't going to work well, no matter how
> much it is tweeked, and another approach should be considered for
> certain workload cases.

Thank you, that is *eactly* the anaology I have been unable to formulate.
It was on my mind but I could not put my finger on it.

Vacuum is findimentally inefficient as it does not know what has changed
and must go through an entirety to find the specific each time. Going
through the whole table each time is messed up and wasteful.

>
> At some point, the autovacuum approach starts to look like a car with
> fifty bumper stickers.  The first few were fine, but at some point, the
> tweeks (bumper stickers) start to overwhelm the car, and it is time to
> look for a new car.
>
> I think particularly for the UPDATE with no index key changes, a new
> approach must be considred.
>

I have been ranting about a first row strategy, one where the first
version of a row is the top of a linked list of versions.

(1) The indexes point to the first "key" row.
(2) When a row is updated, it is found in the various indexes, if the key
row currenlty exists in the index, no changes to the index are made. If it
is not found, the old version of the row is orphaned and behaves as
PostgreSQL always behaves.
(3) If the row is not orphaned, its "last version" reference is updated.

For the most part, this should only affect updates where the index entries
don't change. If the index value is always change, PostgreSQL will behave
as it currently does. If the index values do not change, updates will be
faster to do and won't impact queries.



pgsql-hackers by date:

Previous
From: "Jonah H. Harris"
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: "Jonah H. Harris"
Date:
Subject: Re: vacuum, performance, and MVCC