Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: vacuum, performance, and MVCC
Msg-id 449EC845.1000608@Yahoo.com
In response to Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
On 6/24/2006 4:10 PM, Hannu Krosing wrote:

> On a fine day, Sat, 2006-06-24 at 15:44, Jan Wieck wrote:
> 
>> >> That fixes the symptom, not the problem. The problem is performance
>> >> steadily degrades over time.
>> > 
>> > No, you got it backwards.  The performance degradation is the symptom.
>> > The problem is that there are too many dead tuples in the table.  There
>> > is one way to solve that problem -- remove them, which is done by
>> > running vacuum.
>> 
>> Precisely.
>> 
>> > There are some problems with vacuum itself, that I agree with.  For
>> > example it would be good if a long-running vacuum wouldn't affect a
>> > vacuum running in another table because of the long-running transaction
>> > effect it has.  It would be good if vacuum could be run partially over a
>> > table.  It would be good if there was a way to speed up vacuum by using
>> > a dead space map or something.
>> 
>> It would be good if vacuum wouldn't waste time on blocks that don't have 
>> any possible work in them. Vacuum has two main purposes. A) remove dead 
>> rows and B) freeze xids. Once a block has zero deleted rows and all xids 
>> are frozen, there is nothing to do with this block and vacuum should 
>> skip it until a transaction updates that block.
>> 
>> This requires 2 bits per block, which is 32K per 1G segment of a heap. 
>> Clearing the bits is done when the block is marked dirty. This way 
>> vacuum would not waste any time and IO on huge slow changing tables. 
>> That part, sequentially scanning huge tables that didn't change much is 
>> what keeps us from running vacuum every couple of seconds.
> 
> Seems like a plan. 
> 
> Still, there is another problem which is not solved by the map approach
> alone, at least with the current implementation of vacuum.
> 
> This is the fact that we need to do a full scan over the index(es) to
> clean up pointers to removed tuples. And huge tables tend to have huge
> indexes.

Right. Now that you mention it, I remember why this wasn't as easy as it 
sounded at first.
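
For reference, the two-bits-per-block map from the quoted text could be 
sketched roughly as follows. This is only an illustration, not existing 
PostgreSQL code: the class name VacuumMap, the flag names, and the 
assumption of the default 8 kB block size are all made up for the example.

```python
BLOCK_SIZE = 8192        # assumption: default 8 kB heap block
SEGMENT_SIZE = 1 << 30   # 1 GB heap segment
BITS_PER_BLOCK = 2

# Hypothetical flag names for the two per-block bits.
NO_DEAD_ROWS = 0b01      # block has zero deleted rows
ALL_FROZEN = 0b10        # every xid in the block is frozen
NOTHING_TO_DO = NO_DEAD_ROWS | ALL_FROZEN


def map_bytes_per_segment():
    """Size of the map for one heap segment, in bytes."""
    blocks = SEGMENT_SIZE // BLOCK_SIZE
    return blocks * BITS_PER_BLOCK // 8


class VacuumMap:
    """Two bits per heap block, packed four blocks to a byte."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks * BITS_PER_BLOCK + 7) // 8)

    def _pos(self, block):
        bit = block * BITS_PER_BLOCK
        return bit // 8, bit % 8

    def get(self, block):
        byte, shift = self._pos(block)
        return (self.bits[byte] >> shift) & 0b11

    def set(self, block, flags):
        # Called by vacuum after it has cleaned or frozen the block.
        byte, shift = self._pos(block)
        self.bits[byte] |= (flags & 0b11) << shift

    def mark_dirty(self, block):
        # Clear both bits whenever the block is marked dirty.
        byte, shift = self._pos(block)
        self.bits[byte] &= ~(0b11 << shift) & 0xFF

    def needs_vacuum(self, block):
        # Vacuum skips the block only if both bits are set.
        return self.get(block) != NOTHING_TO_DO
```

With these assumptions, map_bytes_per_segment() gives 32768, which 
matches the 32K per 1G segment figure.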

Obviously the only way to find an index tuple without a sequential scan 
of the index is to do an index scan. So vacuum would have to estimate, 
based on the bitmaps, whether it would be beneficial (huge table, few 
vacuumable pages) to remove or flag individual index tuples before 
removing the heap tuple. This can be done before removing the heap 
tuple, because the index tuples might not be there to begin with.
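
A rough sketch of what such an estimate might look like. Everything 
here is an assumption for illustration: the function name, the cost 
model (one root-to-leaf descent per dead tuple versus one sequential 
pass over the whole index), and the default weights, which merely borrow 
the names and default values of the planner settings random_page_cost 
and seq_page_cost.

```python
def choose_index_cleanup(vacuumable_pages, dead_tuples_per_page,
                         index_pages, index_height,
                         random_page_cost=4.0, seq_page_cost=1.0):
    """Pick between targeted index lookups and a full index scan.

    vacuumable_pages would come from the per-block bitmap;
    dead_tuples_per_page is a guess, since the map does not count tuples.
    """
    dead_tuples = vacuumable_pages * dead_tuples_per_page

    # Targeted: one index descent (index_height random page reads)
    # for every dead heap tuple.
    targeted_cost = dead_tuples * index_height * random_page_cost

    # Bulk: read every index page once, sequentially.
    bulk_cost = index_pages * seq_page_cost

    return "targeted" if targeted_cost < bulk_cost else "bulk"


# Huge index, very few vacuumable pages: targeted lookups look cheaper.
few = choose_index_cleanup(10, 5, 1_000_000, 3)

# Many vacuumable pages: the full index scan looks cheaper.
many = choose_index_cleanup(100_000, 5, 1_000_000, 3)
```

A real implementation would need a better model than this, for example 
one that accounts for caching and for multiple indexes per table.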

However, that is a very costly thing to do and not trivial to implement.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

