Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Tom Lane
Subject Re: vacuum, performance, and MVCC
Msg-id 16709.1151082529@sss.pgh.pa.us
In response to Re: vacuum, performance, and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
Responses Re: vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
Re: vacuum, performance, and MVCC  (Bruce Momjian <bruce@momjian.us>)
Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
Csaba Nagy <nagy@ecircle-ag.com> writes:
>> Surprisingly, it's mostly WAL traffic; the heap/index pages themselves
>> are often not yet synced to disk by the time of vacuum, so there is no
>> additional traffic there. If you had made 5 updates per page and then
>> vacuumed it, you effectively make 1 extra WAL write, meaning a 20%
>> increase in WAL traffic.

> Does this also hold for read traffic? I thought vacuum does a full
> table scan... for big tables a full table scan always hurts the
> performance of the box. If the full table scan could be avoided, then
> I wouldn't mind running vacuum in a loop...

If you're doing heavy updates of a big table then it's likely to end up
visiting most of the table anyway, no?  There is talk of keeping a map
of dirty pages, but I think it'd be a win for infrequently-updated
tables, not ones that need constant vacuuming.
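
To illustrate why a dirty-page map would probably buy little on a heavily
updated table, here is a rough sketch, not PostgreSQL code. It assumes
updates land on uniformly random pages, and it models the map as one flag
per heap page; the function name and parameters are hypothetical.

```python
import random


def pages_to_visit(n_pages: int, n_updates: int, seed: int = 0) -> int:
    """Count the distinct pages touched by n_updates random updates.

    Assumption: each update hits a uniformly random page. Real workloads
    are usually more skewed than this.
    """
    rng = random.Random(seed)
    dirty = bytearray(n_pages)  # one flag per heap page: a toy dirty-page map
    for _ in range(n_updates):
        dirty[rng.randrange(n_pages)] = 1
    return sum(dirty)


if __name__ == "__main__":
    n_pages = 1000
    for n_updates in (10, 1000, 10000):
        visited = pages_to_visit(n_pages, n_updates)
        print(f"{n_updates:>6} updates -> {visited} of {n_pages} pages dirty")
```

Under that assumption, a handful of updates leaves nearly all pages clean,
so the map would let vacuum skip most of the table; once updates outnumber
pages several times over, almost every page is dirty and the map saves
close to nothing.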

I think a lot of our problems in this area could be solved with fairly
straightforward tuning efforts on the existing autovacuum
infrastructure.  In particular, someone should be looking into
recommendable default vacuum-cost-delay settings so that a background
vacuum doesn't affect performance too much.  Another problem with the
current autovac infrastructure is that it doesn't respond very well to
the case where there are individual tables that need constant attention
as well as many that don't.  If you have N databases then you can visit
a particular table at most once every N*autovacuum_naptime seconds, and
*every* table in the entire cluster gets reconsidered at that same rate.
I'm not sure if we need the ability to have multiple autovac daemons
running at the same time, but we definitely could use something with a
more flexible table-visiting pattern.  Perhaps it would be enough to
look through the per-table stats for each database before selecting the
database to autovacuum in each cycle, instead of going by "least
recently autovacuumed".
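
A minimal sketch of the two points above, under stated assumptions: it is
not how the autovacuum daemon is implemented. It assumes one database is
processed per naptime, and it uses the dead-tuple fraction as a stand-in
for "needs attention"; TableStats, bloat_score and pick_database are
hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableStats:
    name: str
    live_tuples: int
    dead_tuples: int


def worst_case_visit_interval(n_databases: int, naptime_s: int) -> int:
    """Seconds between visits to one table if one database is handled per nap."""
    return n_databases * naptime_s


def bloat_score(t: TableStats) -> float:
    """Dead-tuple fraction of a table (hypothetical urgency metric)."""
    total = t.live_tuples + t.dead_tuples
    return t.dead_tuples / total if total else 0.0


def pick_database(stats_by_db: Dict[str, List[TableStats]]) -> str:
    """Pick the database whose neediest table has the highest score,
    rather than the least recently autovacuumed database."""
    return max(
        stats_by_db,
        key=lambda db: max((bloat_score(t) for t in stats_by_db[db]), default=0.0),
    )


if __name__ == "__main__":
    print(worst_case_visit_interval(10, 60))  # 10 databases, 60 s naptime
    stats = {
        "archive": [TableStats("history", 1_000_000, 500)],
        "app": [TableStats("sessions", 10_000, 30_000)],
    }
    print(pick_database(stats))
```

With 10 databases and a 60-second naptime, no table can be revisited more
often than every 600 seconds, however hot it is. Selecting by stats would
let a database holding a constantly updated table be chosen repeatedly
while mostly idle databases wait.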

Bottom line: there's still lots of low-hanging fruit.  Why are people
feeling that we need to abandon or massively complicate our basic
architecture to make progress?
        regards, tom lane

