Thread: synchronized scans for VACUUM

synchronized scans for VACUUM

From: Jeff Davis
Date: 31 May 2008, 22:14:14

Previous thread for reference:

http://archives.postgresql.org/pgsql-patches/2007-06/msg00096.php

The objections to synchronized scans for VACUUM as listed in that thread
(summary):

1. vacuum sometimes progresses faster than a regular heapscan, because
it doesn't need to check WHERE clauses, etc.

2. vacuum takes breaks from the scan to clean up the indexes when it
runs out of maintenance_work_mem.

3. vacuum takes breaks for the cost delay

4. vacuum will dirty a lot of the blocks as it goes, and that will cause
some kind of interaction with the ring buffer

I'd like to address these one by one to see what problems are really in
our way:

1. This would mean that it's not an I/O limited scan. I think as long as
we're talking about regular table scans that can benefit from
synchronized scanning, a vacuum of the same table would also benefit. A
microbenchmark could show whether some benefit exists or not.

2. There have been suggestions about a more compact representation for
the tuple id list. If this works, it will solve this problem.

3. Offering synchronized vacuums could reduce the need for these
elective pauses. 

4. This probably has more to do with the buffer ring than synchronized
scans. There could be some bad interaction there, but I don't see that
it's clearly bad.
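To make point 2 concrete, here is a minimal sketch of one possible "more compact representation": a per-block bitmap of dead line pointers, compared against a flat array of 6-byte tuple ids. The thread does not spell out the actual proposal, so the class name, the 256-offsets-per-block bound, and the 4-byte per-block key are all illustrative assumptions, not PostgreSQL code.

```python
TID_BYTES = 6  # a heap tuple id: 4-byte block number + 2-byte line pointer offset


def flat_array_bytes(dead_tuples):
    """Memory used when every dead tuple id is stored as its own 6-byte entry."""
    return dead_tuples * TID_BYTES


class DeadTupleBitmap:
    """Hypothetical compact store: one fixed-size bitmap per heap block."""

    def __init__(self, max_offsets_per_block=256):
        # Assumed upper bound on line pointers per block (illustrative only).
        self.max_offsets = max_offsets_per_block
        self.blocks = {}  # block number -> int used as a bitmask

    def add(self, block, offset):
        # Offsets are 1-based, as line pointer numbers are.
        if not 1 <= offset <= self.max_offsets:
            raise ValueError("offset out of range")
        self.blocks[block] = self.blocks.get(block, 0) | (1 << (offset - 1))

    def __contains__(self, tid):
        block, offset = tid
        return bool((self.blocks.get(block, 0) >> (offset - 1)) & 1)

    def count(self):
        return sum(bin(mask).count("1") for mask in self.blocks.values())

    def approx_bytes(self):
        # Assumed layout: 4-byte block number plus a fixed bitmap per block.
        return len(self.blocks) * (4 + self.max_offsets // 8)
```

Under these assumptions, 1000 dead tuples spread over 10 blocks cost 6000 bytes as a flat array and 360 bytes as bitmaps. The saving shrinks when dead tuples are sparse (one per block), which is why the per-block cost matters.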

Additionally, with the possible exception of #4, I don't see the
situation being worse than it is currently.

Thoughts?

Regards,
Jeff Davis

Re: synchronized scans for VACUUM

From: Tom Lane
Date: 31 May 2008, 23:08:58

Jeff Davis <pgsql@j-davis.com> writes:
> The objections to synchronized scans for VACUUM as listed in that thread
> (summary):

> 2. vacuum takes breaks from the scan to clean up the indexes when it
> runs out of maintenance_work_mem.

> 2. There have been suggestions about a more compact representation for
> the tuple id list. If this works, it will solve this problem.

It will certainly not "solve" the problem.  What it will do is mean that
the breaks are further apart and longer, which seems to me to make the
conflict with syncscan behavior worse not better.

> 3. vacuum takes breaks for the cost delay

> 3. Offering synchronized vacuums could reduce the need for these
> elective pauses. 

How so?  A vacuum that happens not to be part of a syncscan herd is
going to be just as bad for system performance as ever.

It still seems to me that vacuum is unlikely to be a productive member
of a syncscan herd --- it just isn't going to have similar scan-speed
behavior to typical queries.
        regards, tom lane

Re: synchronized scans for VACUUM

From: Gregory Stark
Date: 01 June 2008, 09:21:01

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Jeff Davis <pgsql@j-davis.com> writes:
>> The objections to synchronized scans for VACUUM as listed in that thread
>> (summary):
>
>> 2. vacuum takes breaks from the scan to clean up the indexes when it
>> runs out of maintenance_work_mem.
>
>> 2. There have been suggestions about a more compact representation for
>> the tuple id list. If this works, it will solve this problem.
>
> It will certainly not "solve" the problem.  What it will do is mean that
> the breaks are further apart and longer, which seems to me to make the
> conflict with syncscan behavior worse not better.

How would it make them longer? They still have the same amount of i/o to do
scanning the indexes. I suppose they would dirty more pages which might slow
them down?

In any case I think the representation you proposed back when this idea last
came up was so compact that pretty much any size table ought to be
representable in a reasonable work_mem -- at least for the kind of machine
which would normally be dealing with that size table.

> It still seems to me that vacuum is unlikely to be a productive member
> of a syncscan herd --- it just isn't going to have similar scan-speed
> behavior to typical queries.

That's my thinking too. Our general direction has been toward reducing
vacuum's i/o bandwidth requirements, not worrying about making it run as fast
as possible.

That said if it happened to latch on to a sync scan herd it would have very
few cache misses which would cause it to rack up very few vacuum cost delay
points. Perhaps the vacuum cost delay for a cache hit ought to be 0?
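As a rough illustration of the cost-delay arithmetic, the sketch below counts how often a vacuum would nap under a simplified model: each page access adds a cost to a running balance, and a nap happens (and the balance resets) once the balance reaches the limit. The parameter names mirror the vacuum_cost_page_hit, vacuum_cost_page_miss, vacuum_cost_page_dirty and vacuum_cost_limit settings; the default values used here (1, 10, 20, 200) and the function itself are assumptions for illustration, not the server's actual code.

```python
def vacuum_naps(page_events, cost_hit=1, cost_miss=10, cost_dirty=20,
                cost_limit=200):
    """Count cost-delay naps for a sequence of 'hit' / 'miss' / 'dirty' events.

    Simplified model: every event adds its cost to a balance; when the
    balance reaches cost_limit, the vacuum naps once and the balance resets.
    """
    costs = {"hit": cost_hit, "miss": cost_miss, "dirty": cost_dirty}
    balance = 0
    naps = 0
    for event in page_events:
        balance += costs[event]
        if balance >= cost_limit:
            naps += 1
            balance = 0
    return naps


# A 1000-page scan that misses the cache on every page ...
cold = vacuum_naps(["miss"] * 1000)
# ... versus one riding a syncscan herd and hitting the cache every time ...
warm = vacuum_naps(["hit"] * 1000)
# ... versus the same warm scan if a cache hit cost nothing.
free = vacuum_naps(["hit"] * 1000, cost_hit=0)
```

In this model the cold scan naps ten times as often as the warm one, and with a hit cost of 0 the warm scan never naps at all unless it dirties pages.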

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: synchronized scans for VACUUM

From: Tom Lane
Date: 01 June 2008, 10:57:53

Gregory Stark <stark@enterprisedb.com> writes:
>> It will certainly not "solve" the problem.  What it will do is mean that
>> the breaks are further apart and longer, which seems to me to make the
>> conflict with syncscan behavior worse not better.

> How would it make them longer? They still have the same amount of i/o to do
> scanning the indexes. I suppose they would dirty more pages which might slow
> them down?

More tuples to delete = more writes (in WAL, if not immediately in the
index itself) = longer to complete the indexscan.  It's still cheaper
than doing multiple indexscans, of course, but my point is that the
index-fixing work gets concentrated.
        regards, tom lane
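The trade-off discussed above (fewer index passes, but more deletions concentrated in each one) can be sketched with simple arithmetic. This is a back-of-the-envelope model, not PostgreSQL code: the function name is hypothetical, 6 bytes is the size of a plain tuple id, and the 1-byte-per-tuple figure for a compact representation is an arbitrary assumption.

```python
import math

TID_BYTES = 6  # plain tuple id: block number plus line pointer offset


def index_passes(dead_tuples, mem_bytes, bytes_per_tid=TID_BYTES):
    """Return (number of index passes, max tuples deleted in one pass).

    Assumes the heap scan pauses for an index pass each time the dead
    tuple list fills the available memory.
    """
    if dead_tuples == 0:
        return (0, 0)
    capacity = mem_bytes // bytes_per_tid
    passes = math.ceil(dead_tuples / capacity)
    return (passes, min(dead_tuples, capacity))


MEM = 128 * 1024 * 1024  # 128MB of maintenance_work_mem (example value)

# 100 million dead tuples stored as plain 6-byte tuple ids.
plain = index_passes(100_000_000, MEM)
# The same tuples under an assumed 1 byte per tuple compact encoding.
compact = index_passes(100_000_000, MEM, bytes_per_tid=1)
```

With these example numbers the plain list forces five breaks of at most about 22 million deletions each, while the compact list needs a single break that handles all 100 million: the breaks are further apart, but each one has more index-fixing work to do.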