Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Christopher Browne
Subject Re: vacuum, performance, and MVCC
Msg-id 87ejxeb4jj.fsf@wolfe.cbbrowne.com
In response to vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
List pgsql-hackers
Martha Stewart called it a Good Thing when JanWieck@Yahoo.com (Jan Wieck) wrote:
> On 6/22/2006 2:37 PM, Alvaro Herrera wrote:
>
>> Adding back pgsql-hackers.
>> Mark Woodward wrote:
>>> > Mark Woodward wrote:
>>> >
>>> >> Hmm, OK, then the problem is more serious than I suspected.
>>> >> This means that every index on a row has to be updated on every
>>> >> transaction that modifies that row. Is that correct?
>>> >
>>> > Add an index entry, yes.
>>> >
>>> >> I am attaching some code that shows the problem with regard to
>>> >> applications such as web server session management. When it runs,
>>> >> the system can handle fewer and fewer connections each second.
>>> >> Here is a brief output:
>>> >> [...]
>>> >> There has to be a more linear way of handling this scenario.
>>> >
>>> > So vacuum the table often.
>>> That fixes the symptom, not the problem. The problem is performance
>>> steadily degrades over time.
>> No, you got it backwards.  The performance degradation is the
>> symptom.
>> The problem is that there are too many dead tuples in the table.  There
>> is one way to solve that problem -- remove them, which is done by
>> running vacuum.
>
> Precisely.
>
>> I agree that vacuum itself has some problems.  For example, it would
>> be good if a long-running vacuum didn't affect a vacuum running on
>> another table through its long-running-transaction effect.  It would
>> be good if vacuum could be run partially over a table.  It would be
>> good if there was a way to speed up vacuum by using a dead space map
>> or something.
>
> It would be good if vacuum wouldn't waste time on blocks that don't
> have any possible work in them. Vacuum has two main purposes. A)
> remove dead rows and B) freeze xids. Once a block has zero deleted
> rows and all xids are frozen, there is nothing to do with this block
> and vacuum should skip it until a transaction updates that block.
>
> This requires 2 bits per block, which is 32K per 1G segment of a
> heap. Clearing the bits is done when the block is marked dirty. This
> way vacuum would not waste any time and IO on huge, slow-changing
> tables. That part, sequentially scanning huge tables that didn't
> change much, is what keeps us from running vacuum every couple of
> seconds.
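
The "32K per 1G segment" figure quoted above can be checked with a
little arithmetic. The sketch below assumes 8 KB heap blocks, which the
message does not state; the constant and function names are made up for
illustration.

```python
# Assumption: 8 KB heap blocks. The quoted text gives only the 1 GB
# segment size and the 2 bits per block.
BLOCK_SIZE = 8 * 1024
SEGMENT_SIZE = 1024 ** 3
BITS_PER_BLOCK = 2  # one "no dead rows" bit, one "all xids frozen" bit


def map_bytes_per_segment(segment_size=SEGMENT_SIZE,
                          block_size=BLOCK_SIZE,
                          bits_per_block=BITS_PER_BLOCK):
    """Return the bytes of bitmap needed to cover one heap segment."""
    blocks = segment_size // block_size
    total_bits = blocks * bits_per_block
    return (total_bits + 7) // 8  # round up to whole bytes


blocks_per_segment = SEGMENT_SIZE // BLOCK_SIZE
map_bytes = map_bytes_per_segment()
print(blocks_per_segment, "blocks per segment,", map_bytes, "bytes of map")
```

Under that assumption a 1 GB segment holds 131072 blocks, so two bits
per block come to 32768 bytes, which agrees with the 32K quoted.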

This is, in effect, the "VACUUM Space Map."

I see one unfortunate thing about that representation of it, namely
that it would in effect require that non-frozen pages be kept on the
VSM for potentially a long time.

Based on *present* VACUUM strategy, at least.

Would it not be the case, here, that any time a page could be
"frozen," it would have to be?  In effect, we are always trying to run
VACUUM FREEZE?
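
To make the question concrete, here is a toy model of the two-bit map.
It is only a sketch under loud assumptions: BlockMap, vacuum_pass and
the flag names are hypothetical, one byte per block is used instead of
packed bits, and freezing is assumed to be possible on every visit.
None of this is PostgreSQL code.

```python
NO_DEAD_ROWS = 0b01  # hypothetical flag: block holds no dead rows
ALL_FROZEN = 0b10    # hypothetical flag: every xid in the block is frozen
CLEAN = NO_DEAD_ROWS | ALL_FROZEN


class BlockMap:
    """Toy per-block map; a real one would pack two bits per block."""

    def __init__(self, nblocks):
        self.flags = bytearray(nblocks)  # all zero: every block needs a visit

    def mark_dirty(self, blk):
        # Any modification of the block clears both bits.
        self.flags[blk] = 0

    def record_vacuum(self, blk, froze):
        # Vacuum always removes dead rows; freezing depends on strategy.
        self.flags[blk] = CLEAN if froze else NO_DEAD_ROWS

    def needs_visit(self, blk):
        return self.flags[blk] != CLEAN


def vacuum_pass(block_map, always_freeze):
    """Visit every block that is not fully clean; return the visited list."""
    visited = [b for b in range(len(block_map.flags))
               if block_map.needs_visit(b)]
    for b in visited:
        block_map.record_vacuum(b, froze=always_freeze)
    return visited


# Strategy 1: vacuum removes dead rows but does not freeze.
lazy = BlockMap(8)
vacuum_pass(lazy, always_freeze=False)
lazy_second = vacuum_pass(lazy, always_freeze=False)

# Strategy 2: vacuum freezes whatever it touches.
eager = BlockMap(8)
vacuum_pass(eager, always_freeze=True)
eager_second = vacuum_pass(eager, always_freeze=True)
eager.mark_dirty(2)
eager_third = vacuum_pass(eager, always_freeze=True)
```

In this model the non-freezing strategy revisits all eight blocks on
the second pass even though nothing changed, while the freezing
strategy revisits none and later picks up only the block that was
dirtied. That is the sense in which the map seems to push every vacuum
toward behaving like VACUUM FREEZE.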
-- 
output = ("cbbrowne" "@" "gmail.com")
http://cbbrowne.com/info/finances.html
Rules of the Evil Overlord #72. "If all the heroes are standing
together around a strange device and begin to taunt me, I will pull
out a conventional weapon instead of using my unstoppable superweapon
on them." <http://www.eviloverlord.com/>

