Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 200606242336.k5ONa5b04286@momjian.us
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
Responses Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
Hannu Krosing wrote:
> ?hel kenal p?eval, R, 2006-06-23 kell 13:08, kirjutas Tom Lane:
> > Csaba Nagy <nagy@ecircle-ag.com> writes:
> > >> Surprisingly its mostly WAL traffic, the heap/index pages themselves are
> > >> often not yet synced to disk by time of vacuum, so no additional traffic
> > >> there. If you had made 5 updates per page and then vacuum it, then you
> > >> make effectively 1 extra WAL write meaning 20% increase in WAL traffic. 
> > 
> > > Is this also holding about read traffic ? I thought vacuum will make a
> > > full table scan... for big tables a full table scan is always badly
> > > influencing the performance of the box. If the full table scan would be
> > > avoided, then I wouldn't mind running vacuum in a loop... 
> > 
> > If you're doing heavy updates of a big table then it's likely to end up
> > visiting most of the table anyway, no?  There is talk of keeping a map
> > of dirty pages, but I think it'd be a win for infrequently-updated
> > tables, not ones that need constant vacuuming.
> > 
> > I think a lot of our problems in this area could be solved with fairly
> > straightforward tuning efforts on the existing autovacuum
> > infrastructure.  In particular, someone should be looking into
> > recommendable default vacuum-cost-delay settings so that a background
> > vacuum doesn't affect performance too much.
> 
> One thing that would help updates quite a lot in some scenarios is
> keeping the pages only partially-filled, so that most updates could keep
> the new version in the same page. I think that has also been discussed
> as an option to vacuum and maybe as part of initial inserts. Maybe some
> of it even ended up as a todo item.

We have a patch in the queue for index fillfactor which will be in 8.2.
I am also hoping the frequently updated rows will migrate out to the
empty pages.

> > Another problem with the
> > current autovac infrastructure is that it doesn't respond very well to
> > the case where there are individual tables that need constant attention
> > as well as many that don't.  If you have N databases then you can visit
> > a particular table at most once every N*autovacuum_naptime seconds, and
> > *every* table in the entire cluster gets reconsidered at that same rate.
> > I'm not sure if we need the ability to have multiple autovac daemons
> > running at the same time, 
> 
> My patch enabling effective continuous vacuum of fast-update tables,
> while still being able to vacuum huge slowly changing ones is still not
> applied. Without that patch there is no reason to vacuum the small and
> fast changingg tables while vacuum on bigger tables is running, as it
> won't clean out dead tuples anyway.

I think it will be applied, but I am looking for someone else to eyeball
it since Tom has come concerns.

> > but we definitely could use something with a
> > more flexible table-visiting pattern.  Perhaps it would be enough to
> > look through the per-table stats for each database before selecting the
> > database to autovacuum in each cycle, instead of going by "least
> > recently autovacuumed".
> > 
> > Bottom line: there's still lots of low-hanging fruit.  Why are people
> > feeling that we need to abandon or massively complicate our basic
> > architecture to make progress?
> 
> Maybe we could start from reusing the index tuples which point to
> invisible tuples ? The index is not MVCC anyway, so maybe it is easier
> to do in-place replacement there ?
> 
> This probably has the same obstacles which have prevented us from
> removing those in the first place (removing instead of marking as
> invisible). Does it cause some locking issues ? Or does it go against
> some other constraints of our index lookups ?
> 
> I think that just setting the invisible bit in an index leaf node causes
> nearly as much disk io as removing the node.
> 
> If we could delete/reuse old index tuples, it would solve a sizable
> chunk of index-growth problem, especially for cases where referenced key
> value does not change.

I think heap _and_ index reuse is the only useful direction.  Index or
heap reuse alone seems too marginal for the added complexity.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: Bruce Momjian
Date:
Subject: Re: Gist does not build with VC++ anymore