Re: Single pass vacuum - take 1 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Single pass vacuum - take 1
Date
Msg-id CA+U5nMLvRqKced6KLbaJ5Rj=iJVK-U87DMvU=-YL00Z6ds5zNg@mail.gmail.com
In response to Re: Single pass vacuum - take 1  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Single pass vacuum - take 1
List pgsql-hackers
On Thu, Jul 14, 2011 at 4:57 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

> Thanks Simon for looking at the patch.

Sorry, I didn't notice there was a patch attached, so I have not
reviewed it. I thought we were still just talking.


> I am not sure if the use case is really narrow.

This is a very rare issue, because of all the work you and Heikki
have put in.

It's only important when (1) we have a big table (hence a long scan
time), (2) the use case avoids HOT, *and* (3) we have dirtied a
large enough section of the table that the visibility map is
ineffective and we need to scan a high percentage of the table. That
combination is pretty rare, so penalising everybody else with 8 bytes
per block seems too much to me.
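
To put the 8 bytes per block in perspective, here is a rough
back-of-the-envelope sketch. It assumes the default 8 kB block size;
the function names are made up for illustration and are not part of
any patch.

```python
# Assumption: default PostgreSQL block size of 8 kB (BLCKSZ = 8192).
BLOCK_SIZE = 8192
# The per-block cost under discussion in this thread.
PER_BLOCK_OVERHEAD = 8


def overhead_bytes(table_bytes, block_size=BLOCK_SIZE,
                   per_block=PER_BLOCK_OVERHEAD):
    """Total extra bytes for a table of the given size (hypothetical helper)."""
    blocks = -(-table_bytes // block_size)  # ceiling division
    return blocks * per_block


def overhead_fraction(block_size=BLOCK_SIZE, per_block=PER_BLOCK_OVERHEAD):
    """Share of every block that the extra bytes consume."""
    return per_block / block_size


if __name__ == "__main__":
    one_tib = 2 ** 40
    print(overhead_bytes(one_tib), "bytes extra for a 1 TiB table")
    print(f"{overhead_fraction():.4%} of every block")
```

Under those assumptions the cost is about 0.1% of each block, or
roughly 1 GiB on a 1 TiB table, and it is paid by every table whether
or not it ever hits the rare case above.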

Big VACUUMs are a problem, but my observation would be that those are
typically transaction wraparound VACUUMs and the extra writes are not
caused by row removal. So we sometimes run Phase 2 and Phase 3 even
when very few rows are removed, since not all VACUUMs are triggered
by changes.


> Today, we dirty the pages in
> both the passes and also emit WAL records.

This is exactly the thing I'm suggesting we avoid.

> Just the heap scan can take a
> very long time for large tables, blocking the autovacuum worker threads from
> doing useful work on other tables. If I am not wrong, we use ring buffers
> for vacuum which would most-likely force those buffers to be written/read
> twice to the disk.

I think the problem comes from dirtying too many blocks. Scanning the
tables using the ring buffer is actually fairly cheap. The second scan
only touches the blocks that need secondary cleaning, so the cost of
it is usually much less.

I'm suggesting we write each block at most once, rather than twice as
we do now. Yes, we have to do both scans.

My idea does exactly the same number of writes as yours. On read-only
I/O, your idea is clearly more efficient, but overall that's not by
enough to justify the 8 bytes per block overhead, IMHO.
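
The comparison can be sketched as a toy I/O model. This is only an
illustration of the argument in this mail, not a description of any
actual code: it counts heap block reads and writes only, ignores
index scans, WAL and the visibility map, and assumes every block that
needs cleanup is dirtied once per pass that touches it. The strategy
names are invented for the example.

```python
from collections import namedtuple

VacuumIO = namedtuple("VacuumIO", ["block_reads", "block_writes"])


def vacuum_io(total_blocks, cleanup_blocks, strategy):
    """Toy count of heap block reads/writes for one VACUUM (hypothetical model)."""
    if not 0 <= cleanup_blocks <= total_blocks:
        raise ValueError("cleanup_blocks must be between 0 and total_blocks")

    if strategy == "two_pass_today":
        # Full scan, then a second visit to blocks needing cleanup;
        # those blocks are dirtied in both passes.
        return VacuumIO(total_blocks + cleanup_blocks, 2 * cleanup_blocks)
    if strategy == "two_pass_write_once":
        # Same two scans, but each block is written at most once.
        return VacuumIO(total_blocks + cleanup_blocks, cleanup_blocks)
    if strategy == "single_pass":
        # One scan only; each block needing cleanup is written once.
        return VacuumIO(total_blocks, cleanup_blocks)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("two_pass_today", "two_pass_write_once", "single_pass"):
        print(name, vacuum_io(1_000_000, 50_000, name))
```

In this model the two alternatives write the same number of blocks
and differ only in the reads for the second scan, which covers just
the blocks that need secondary cleaning.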


> Which part of the patch you think is very complex ? We can try to simplify
> that. Or are you seeing any obvious bugs that I missed ? IMHO taking out a
> phase completely from vacuum (as this patch does) can simplify things.

I have great faith in your talents; I'm just not sure about this
particular use of them. I'm sorry to voice these doubts only now that
you've written the patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

