Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 1151179816.3884.43.camel@localhost.localdomain
In response to Re: vacuum, performance, and MVCC  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: vacuum, performance, and MVCC  (Jan Wieck <JanWieck@Yahoo.com>)
List pgsql-hackers
One fine day, Sat, 2006-06-24 at 15:44, Jan Wieck wrote:

> >> That fixes the symptom, not the problem. The problem is performance
> >> steadily degrades over time.
> > 
> > No, you got it backwards.  The performance degradation is the symptom.
> > The problem is that there are too many dead tuples in the table.  There
> > is one way to solve that problem -- remove them, which is done by
> > running vacuum.
> 
> Precisely.
> 
> > There are some problems with vacuum itself, that I agree with.  For
> > example it would be good if a long-running vacuum wouldn't affect a
> > vacuum running in another table because of the long-running transaction
> > effect it has.  It would be good if vacuum could be run partially over a
> > table.  It would be good if there was a way to speed up vacuum by using
> > a dead space map or something.
> 
> It would be good if vacuum wouldn't waste time on blocks that don't have 
> any possible work in them. Vacuum has two main purposes. A) remove dead 
> rows and B) freeze xids. Once a block has zero deleted rows and all xids 
> are frozen, there is nothing to do with this block and vacuum should 
> skip it until a transaction updates that block.
> 
> This requires 2 bits per block, which is 32K per 1G segment of a heap. 
> Clearing the bits is done when the block is marked dirty. This way 
> vacuum would not waste any time and IO on huge slow changing tables. 
> That part, sequentially scanning huge tables that didn't change much is 
> what keeps us from running vacuum every couple of seconds.

Seems like a plan. 
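
To make the numbers concrete, here is a rough sketch of the quoted
"2 bits per block" map. It is a toy model, not PostgreSQL code: the 8 kB
block size is an assumption (the usual default), and BlockMap and its
method names are made up for illustration.

```python
# Toy model of the quoted "2 bits per block" idea; not PostgreSQL code.
BLOCK_SIZE = 8 * 1024        # assumption: default 8 kB heap block
SEGMENT_SIZE = 1024 ** 3     # 1 GB heap segment, as in the quoted text
BITS_PER_BLOCK = 2           # one bit "no dead rows", one bit "all xids frozen"


def map_bytes_per_segment():
    """Size in bytes of the map covering one heap segment."""
    blocks = SEGMENT_SIZE // BLOCK_SIZE
    return blocks * BITS_PER_BLOCK // 8


class BlockMap:
    """Hypothetical per-block status map (names are illustrative only)."""

    NO_DEAD_ROWS = 0b01
    ALL_FROZEN = 0b10
    NOTHING_TO_DO = NO_DEAD_ROWS | ALL_FROZEN

    def __init__(self, nblocks):
        # One byte per block for readability; a real map would pack 2 bits.
        self.flags = bytearray(nblocks)

    def mark_clean(self, blkno):
        """Vacuum found no dead rows and froze every xid in this block."""
        self.flags[blkno] = self.NOTHING_TO_DO

    def mark_dirty(self, blkno):
        """A transaction modified the block, so both bits are cleared."""
        self.flags[blkno] = 0

    def blocks_to_vacuum(self):
        """Blocks vacuum still has to visit; all others can be skipped."""
        return [b for b, f in enumerate(self.flags) if f != self.NOTHING_TO_DO]
```

With these assumptions a 1 GB segment holds 131072 blocks, so the map
takes 262144 bits, which matches the 32K figure quoted above.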

Still, there is another problem which is not solved by the map approach
alone, at least with the current implementation of vacuum.

This is the fact that we need to do a full scan over the index(es) to
clean up pointers to removed tuples. And huge tables tend to have huge
indexes.
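
A toy illustration of why the heap map alone does not help here. This is
only a sketch under simplifying assumptions: the index is modelled as a
flat list of (key, tid) pairs ordered by key, and cleanup_index is a
made-up name, not a PostgreSQL function. Since entries are ordered by key
and not by heap location, finding the pointers to even one removed tuple
means visiting every entry.

```python
def cleanup_index(index_entries, dead_tids):
    """Drop index entries that point at removed heap tuples.

    index_entries: list of (key, tid) pairs, ordered by key.
    dead_tids: tuple ids (block, offset) that vacuum removed from the heap.
    Returns (surviving_entries, number_of_entries_visited).
    """
    dead = set(dead_tids)
    kept = []
    visited = 0
    for key, tid in index_entries:
        visited += 1          # every entry is read, dead or not
        if tid not in dead:
            kept.append((key, tid))
    return kept, visited


# 1000 index entries, a single dead tuple in heap block 5.
index = [(k, (k // 10, k % 10)) for k in range(1000)]
kept, visited = cleanup_index(index, [(5, 3)])
```

In this model the heap side could skip 99 of 100 blocks, yet the index
side still visits all 1000 entries to remove one pointer.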

As indexes have no MVCC info inside them, it may be possible to start
reusing index entries pointing to rows that are invisible to all running
transactions. Currently we just mark these index entries as dead, but
maybe there is a way to reuse them. This could solve the index bloat
problem in many cases.

Another possible solution for indexes with mostly dead pointers is doing
a reindex, but this will become possible only after we have implemented
a concurrent, non-blocking CREATE INDEX.

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



