Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 449D9629.5050608@Yahoo.com
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: vacuum, performance, and MVCC
List pgsql-hackers
On 6/22/2006 2:37 PM, Alvaro Herrera wrote:

> Adding back pgsql-hackers.
> 
> Mark Woodward wrote:
>> > Mark Woodward wrote:
>> >
>> >> Hmm, OK, then the problem is more serious than I suspected.
>> >> This means that every index on a row has to be updated on every
>> >> transaction that modifies that row. Is that correct?
>> >
>> > Add an index entry, yes.
>> >
>> >> I am attaching some code that shows the problem with regard to
>> >> applications such as web server session management, when run, each
>> >> second
>> >> the system can handle fewer and fewer connections. Here is a brief
>> >> output:
>> >> [...]
>> >> There has to be a more linear way of handling this scenario.
>> >
>> > So vacuum the table often.
>> 
>> That fixes the symptom, not the problem. The problem is performance
>> steadily degrades over time.
> 
> No, you got it backwards.  The performance degradation is the symptom.
> The problem is that there are too many dead tuples in the table.  There
> is one way to solve that problem -- remove them, which is done by
> running vacuum.

Precisely.

> There are some problems with vacuum itself, that I agree with.  For
> example it would be good if a long-running vacuum wouldn't affect a
> vacuum running in another table because of the long-running transaction
> effect it has.  It would be good if vacuum could be run partially over a
> table.  It would be good if there was a way to speed up vacuum by using
> a dead space map or something.

It would be good if vacuum wouldn't waste time on blocks that don't have 
any possible work in them. Vacuum has two main purposes. A) remove dead 
rows and B) freeze xids. Once a block has zero deleted rows and all xids 
are frozen, there is nothing to do with this block and vacuum should 
skip it until a transaction updates that block.

This requires 2 bits per block, which is 32K per 1G segment of a heap. 
Clearing the bits is done when the block is marked dirty. This way 
vacuum would not waste any time and IO on huge slow changing tables. 
That part, sequentially scanning huge tables that didn't change much is 
what keeps us from running vacuum every couple of seconds.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: Robert Treat
Date:
Subject: Re: Anyone still care about Cygwin? (was Re: [CORE] GPL