Re: Some ideas about Vacuum - Mailing list pgsql-hackers

From Markus Schiltknecht
Subject Re: Some ideas about Vacuum
Msg-id 4784F8E5.4020403@bluegap.ch
In response to Re: Some ideas about Vacuum  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Some ideas about Vacuum  ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
List pgsql-hackers
Hi,

Gregory Stark wrote:
> That's an interesting thought. I think your caveats are right but with some
> more work it might be possible to work it out. For example if a background
> process processed the WAL and accumulated an array of possibly-dead tuples to
> process in batch. It would wait whenever it sees an xid which isn't yet past
> globalxmin, and keep accumulating until it has enough to make it worthwhile
> doing a pass.

I don't understand why one would want to go via the WAL; that only 
creates needless I/O. Better to accumulate the data right away, during 
inserts, updates and deletes. Spilling the accumulated data to disk, if 
absolutely required, would presumably still result in less I/O.
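To sketch what I have in mind (a rough illustration only, not actual
backend code; all names here are invented), the DML path could simply
append each possibly-dead tuple's TID to a small per-relation buffer
and spill it once full:

    #include "postgres.h"
    #include "storage/itemptr.h"

    /* Hypothetical accumulator of possibly-dead tuples for one table. */
    typedef struct DeadTidBuffer
    {
        Oid             relid;      /* table the TIDs belong to */
        int             ntids;      /* TIDs currently buffered */
        ItemPointerData tids[1024]; /* heap tuple pointers, 6 bytes each */
    } DeadTidBuffer;

    /* hypothetical spill-to-disk routine; resets buf->ntids to zero */
    static void spill_dead_tids(DeadTidBuffer *buf);

    static void
    remember_dead_tuple(DeadTidBuffer *buf, ItemPointer tid)
    {
        if (buf->ntids >= lengthof(buf->tids))
            spill_dead_tids(buf);
        buf->tids[buf->ntids++] = *tid;
    }

At 6 bytes per ItemPointerData, even a million buffered TIDs cost only
a few megabytes, which seems hard to beat by re-reading the WAL.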

> I think a bigger issue with this approach is that it ties all your tables
> together. You can't process one table frequently while some other table has
> some long-lived deleted tuples.

Don't use the WAL as the source of that information, and that issue is gone.

> I'm also not sure it really buys us anything over having a second
> dead-space-map data structure. The WAL is much larger and serves other
> purposes which would limit what we can do with it.

Exactly.
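A dead-space map along those lines could be as small as one bit per
heap page; something like this (again just a sketch with made-up
names, not a proposal for the actual data structure):

    #include "postgres.h"
    #include "storage/block.h"

    /* Hypothetical dead-space map: one bit per heap page, set whenever
     * the page may contain dead tuples, cleared after vacuuming it. */
    typedef struct DeadSpaceMap
    {
        BlockNumber npages;     /* heap pages covered */
        uint8       bits[];     /* npages bits, rounded up to bytes */
    } DeadSpaceMap;

    static inline void
    dsm_mark_page(DeadSpaceMap *map, BlockNumber blk)
    {
        map->bits[blk / 8] |= (uint8) (1 << (blk % 8));
    }

    static inline bool
    dsm_page_needs_vacuum(const DeadSpaceMap *map, BlockNumber blk)
    {
        return (map->bits[blk / 8] & (1 << (blk % 8))) != 0;
    }

For a 1 GB table (131072 pages of 8 kB) that's all of 16 kB, so VACUUM
could skip straight to the interesting pages without touching the WAL
at all.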

>> You seem to be assuming that only few tuples have changed between vacuums, so
>> that WAL could quickly guide the VACUUM processes to the areas where cleaning
>> is necessary.
>>
>> Let's drop that assumption, because by default, autovacuum_scale_factor is 20%,
>> so a VACUUM process normally kicks in after 20% of tuples changed (disk space
>> is cheap, I/O isn't). Additionally, there's a default nap time of one minute -
>> and VACUUM is forced to take at least that much of a nap.
> 
> I think this is exactly backwards. The goal should be to improve vacuum, then
> adjust the autovacuum_scale_factor as low as we can. As vacuum gets cheaper
> the scale factor can go lower and lower.

But you can't lower it endlessly; it's still a compromise. Lowering it 
also reduces the number of tuples cleaned per scan, which works against 
the goal of minimizing the overall I/O cost of vacuuming.
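To put rough numbers on that trade-off (assuming each vacuum pass
scans the whole heap, which is exactly what a dead-space map would
fix): with scale factor s, a table of size T, and a fixed rate of
tuple turnover,

    vacuum passes per unit time   ~ 1 / s
    dead tuples cleaned per pass  ~ s * reltuples
    heap I/O per unit time        ~ T / s

So halving s doubles the scanning I/O while the total number of dead
tuples removed stays exactly the same.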

> We shouldn't allow the existing
> autovacuum behaviour to control the way vacuum works.

That's a point.

> As a side point, "disk is cheap, I/O isn't" is a weird statement. The more
> disk you use the more I/O you'll have to do to work with the data.

That's only true as long as you need *all* your data to work with it.

> I still
> maintain the default autovacuum_scale_factor is *far* too liberal. If I had my
> druthers it would be 5%. But that's mostly informed by TPCC experience, in
> real life the actual value will vary depending on the width of your records
> and the relative length of your transactions versus transaction rate. The TPCC
> experience is with ~ 400 byte records and many short transactions.

Hm.. 5% vs. 20% would mean 4x as many vacuum scans, but only 15% less 
growth in size (105% vs. 120% of live data), right? Granted, those 15% 
are also taken from memory and caches, resulting in additional I/O... 
Still, these numbers surprise me. Or am I missing something?
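Spelling my arithmetic out:

    scale factor 20%:  steady-state table ~ 120% of live data
    scale factor  5%:  steady-state table ~ 105% of live data

    scans:  0.20 / 0.05 = 4x as many vacuum passes
    size:   120% vs. 105% = 15 points of bloat, i.e. sequential scans
            read ~14% more pages at the default setting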

Regards

Markus


