Re: [PATCHES] VACUUM Improvements - WIP Patch - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] VACUUM Improvements - WIP Patch
Msg-id 8741.1216049025@sss.pgh.pa.us
In response to Re: [PATCHES] VACUUM Improvements - WIP Patch  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Responses Re: [PATCHES] VACUUM Improvements - WIP Patch  (Gregory Stark <stark@enterprisedb.com>)
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> (taking the discussions to -hackers)
> On Sat, Jul 12, 2008 at 11:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (2) It achieves speedup of VACUUM by pushing work onto subsequent
>> regular accesses of the page, which is exactly the wrong thing.
>> Worse, once you count the disk writes those accesses will induce it's
>> not even clear that there's any genuine savings.

> Well, in the worst case that is true. But in most other cases the
> second-pass work will be combined with other normal activity and the
> overhead shared, or at least there is a chance of that. There is also
> a chance of delaying the work until there is a real need for it,
> e.g. an INSERT or UPDATE on the page that requires a free line
> pointer.

That's just arm-waving: right now, pruning will be done by the next
*reader* of the page, whether or not he has any intention of *writing*
it.  With no proposal on the table for improving that situation,
I don't see any credibility in arguing for over-complicating VACUUM
on the grounds that it might happen someday.  In any case, the work
that is supposed to be done by VACUUM is being pushed to a foreground
query, which I find to be completely against our design principles.
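The behavior being objected to can be illustrated with a toy model (hypothetical types and names, not PostgreSQL internals): any prunable tuples on a page get cleaned up by whoever touches the page next, so a pure read ends up dirtying the page and inducing a later write.

```c
/* Toy model of opportunistic pruning, assuming a simplified page
 * with just a prunable-tuple count and a dirty flag.  The Page type
 * and page_read() are illustrative only. */
#include <stdbool.h>

typedef struct {
    int  dead_tuples;   /* tuples that could be pruned on this page */
    bool dirty;         /* page would now need to be written back */
} Page;

/* A plain reader visiting the page: if anything is prunable, it
 * prunes, dirtying the page even though the caller only reads. */
static void page_read(Page *p)
{
    if (p->dead_tuples > 0) {
        p->dead_tuples = 0;   /* reclaim dead heap-only tuples */
        p->dirty = true;      /* the "read" now induces a write */
    }
}
```

In this sketch the cleanup cost lands on the foreground reader rather than on a background process, which is the crux of the objection above.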

>> It strikes me that what you are trying to do here is compensate for
>> a bad decision in the HOT patch, which was to have VACUUM's first
>> pass prune/defrag a page even when we know we are going to have to
>> come back to that page later.  What about trying to fix things so
>> that if the page contains line pointers that need to be removed,
>> the first pass doesn't dirty it at all, but leaves all the work
>> to be done at the second visit?

> I am not against this idea. Just that it still requires a double scan
> of the main table, and that's exactly what we are trying to avoid
> with this patch.

The part of the argument that I found convincing was trying to reduce
the write traffic (especially WAL log output), not avoiding a second
read.  And the fundamental point still remains: the work should be done
in background, not foreground.
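The write-reduction argument can also be sketched as a toy two-pass model (again hypothetical types, not PostgreSQL code): if the first pass skips prune/defrag on pages it must revisit anyway after index cleanup, each such page is dirtied once instead of twice.

```c
/* Toy sketch of the proposed variant, assuming a simplified heap
 * page that only records whether it holds dead line pointers (which
 * become removable only after index cleanup) and how many times it
 * was dirtied (a proxy for write/WAL traffic). */
#include <stdbool.h>

typedef struct {
    bool has_dead_lp;  /* removable only at the second visit */
    int  writes;       /* times the page was dirtied */
} HeapPage;

static void first_pass(HeapPage *p)
{
    if (p->has_dead_lp)
        return;        /* defer: the second visit does all the work */
    p->writes++;       /* prune/defrag pages we won't revisit */
}

static void second_pass(HeapPage *p)
{
    if (p->has_dead_lp) {
        p->has_dead_lp = false;  /* remove line pointers, defrag */
        p->writes++;             /* dirtied exactly once overall */
    }
}
```

Under this model every page is written at most once per VACUUM, and all of the work stays in the background process.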
        regards, tom lane

