Thread: VACUUM Improvements - WIP Patch

VACUUM Improvements - WIP Patch

From: Pavan Deolasee
Here is a WIP patch based on the discussions here:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php

The attached WIP patch improves the LAZY VACUUM by limiting or
avoiding the second heap scan. This not only saves considerable time
in VACUUM, but also reduces the double-writes of vacuumed blocks. If
the second heap scan is considerably limited, that should also reduce
CPU usage and WAL writes.

With HOT, the first heap scan prunes and defrags every page in the
heap. That truncates all the dead tuples to their DEAD line pointers
and releases all the free space in the page. The second scan only
removes these DEAD line pointers and records the free space in the
FSM. The free space in fact does not change from the first pass. But
to do so, it again calls PageRepairFragmentation on each page, dirties
the page and calls log_heap_clean() again on the page. This clearly
looks like too much work for a small gain.

As this patch stands, the first phase of vacuum prunes the heap pages
as usual. But it marks the DEAD line pointers as DEAD_RECLAIMED to
signal that the index pointers to these line pointers are being
removed, if certain conditions are satisfied. Other backends, when they
prune a page, also reclaim DEAD_RECLAIMED line pointers by marking
them UNUSED. We need some additional logic to do this in a safe way:

- An additional boolean pg_class attribute (relvac_inprogress) is used
to track the status of vacuum on a relation. If the attribute is true,
either vacuum is in progress on the relation or the last vacuum did
not complete successfully.

When VACUUM starts, it sets relvac_inprogress to true. The transaction
is committed and a new transaction is started so that all other
backends can see the change. We also note down the transactions which
may already have the table open. VACUUM then starts the first heap
scan. It prunes the page, but it can start marking the DEAD line
pointers as DEAD_RECLAIMED only after it knows that all other backends
can see that VACUUM is in progress on the target relation. Otherwise
there is a danger that backends might reclaim DEAD line pointers
before their index pointers are removed and that would lead to index
corruption. We do that by periodic conditional waits on the noted
transaction IDs. Once all old transactions are gone, VACUUM sets the
second scan limit to the current block number and starts marking
subsequent DEAD line pointers as DEAD_RECLAIMED.
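
To make the sequence above concrete, here is a rough sketch of how the
second scan limit could be tracked during the first heap scan. It is not
taken from the patch: VacSketch, vac_sketch_init and
vac_sketch_visit_block are hypothetical names, the conditional wait is
reduced to a boolean argument, and the sketch assumes that the block on
which the old transactions are found to be gone is itself eligible for
DEAD_RECLAIMED marking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VACUUM bookkeeping; none of these names come from the patch. */
typedef struct
{
    unsigned nblocks;            /* heap size in blocks */
    unsigned second_scan_limit;  /* second scan must cover blocks [0, limit) */
    bool     old_xacts_gone;     /* have all noted transactions finished? */
} VacSketch;

void
vac_sketch_init(VacSketch *vs, unsigned nblocks)
{
    vs->nblocks = nblocks;
    vs->second_scan_limit = nblocks;    /* worst case: a full second scan */
    vs->old_xacts_gone = false;
}

/*
 * Called for each block of the first heap scan, after a conditional wait
 * on the noted transactions has reported whether they are all finished.
 * Returns true if DEAD line pointers on this block may be marked
 * DEAD_RECLAIMED (assumption: the current block already qualifies).
 */
bool
vac_sketch_visit_block(VacSketch *vs, unsigned blkno, bool noted_xacts_finished)
{
    assert(blkno < vs->nblocks);

    if (!vs->old_xacts_gone && noted_xacts_finished)
    {
        /* Everyone can now see relvac_inprogress; remember where we are. */
        vs->old_xacts_gone = true;
        vs->second_scan_limit = blkno;
    }
    return vs->old_xacts_gone;
}
```

In this sketch, if the old transactions never go away the limit stays at
nblocks and the second scan covers the whole heap, which corresponds to
the worst case of waiting without any success.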

In most of the cases where the old transactions quickly go away, and
for large tables, the second scan will be very limited. In the worst
case, we might incur the overhead of conditional waits without any
success.

TODO:

- We can potentially update FSM at the end of first pass. This is not
a significant issue if the second scan is very limited. But if we do
this, we need to handle the truncate case properly.

- As the patch stands, we check for old transactions at every block
iteration. This might not be acceptable for the cases where there are
long running transactions. We probably need some exponential gap here.

- As the patch stands, heap_page_prune handles reclaiming the
DEAD_RECLAIMED line pointers since it already has the ability to WAL log
similar changes. We don't do any extra work to trigger pruning though
(other than setting page_prune_xid). Maybe we should trigger pruning
if we get line pointer bloat in a page too.
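
One possible shape for the "exponential gap" mentioned in the second TODO
item is sketched below. This is only an illustration under assumptions:
CheckSchedule and should_check_old_xacts are hypothetical names, and the
doubling policy and the cap are choices made for the example, not
something the patch specifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical schedule for re-checking the noted transactions. */
typedef struct
{
    unsigned next_check_blkno;  /* first block at which to check again */
    unsigned gap;               /* current distance between checks, in blocks */
    unsigned max_gap;           /* upper bound on the gap */
} CheckSchedule;

/*
 * Returns true if the old transactions should be checked at this block.
 * After every check the gap doubles, up to max_gap, so long-running
 * transactions cost fewer and fewer checks as the scan proceeds.
 */
bool
should_check_old_xacts(CheckSchedule *cs, unsigned blkno)
{
    assert(cs->gap > 0 && cs->gap <= cs->max_gap);

    if (blkno < cs->next_check_blkno)
        return false;

    cs->next_check_blkno = blkno + cs->gap;
    if (cs->gap < cs->max_gap)
    {
        cs->gap *= 2;
        if (cs->gap > cs->max_gap)
            cs->gap = cs->max_gap;
    }
    return true;
}
```

Starting from next_check_blkno = 0, gap = 1 and max_gap = 64, the checks
fall on blocks 0, 1, 3, 7, 15, 31 and 63, and then every 64 blocks.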

Please let me know comments/suggestions and any other improvements.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: VACUUM Improvements - WIP Patch

From: Simon Riggs
On Tue, 2008-06-10 at 11:02 +0530, Pavan Deolasee wrote:

> In most of the cases where the old transactions quickly go away, and
> for large tables, the second scan will be very limited. In the worst
> case, we might incur the overhead of conditional waits without any
> success.

Looks good.

> - An additional boolean pg_class attribute (relvac_inprogress) is used
> to track the status of vacuum on a relation. If the attribute is true,
> either vacuum is in progress on the relation or the last vacuum did
> not complete successfully.
>
> When VACUUM starts, it sets relvac_inprogress to true.

What happens if the last VACUUM crashed? Any negative effects? If so,
should autovac be triggered again soon to complete the failed VACUUM?

> - We can potentially update FSM at the end of first pass. This is not
> a significant issue if the second scan is very limited. But if we do
> this, we need to handle the truncate case properly.

Not sure why we would do that. What would that give? To do that you'd
need to completely redesign FSM since it assumes only one update would
take place.

> - As the patch stands, we check for old transactions at every block
> iteration. This might not be acceptable for the cases where there are
> long running transactions. We probably need some exponential gap here.

I would make vacuum_delay_point() return bool rather than void, then you
can do the check each time we do the delay by saying:

    if (vacuum_delay_point())
    {
        /* check whether the old transactions are gone */
    }

Need to change VacuumCostActive so it is always active during a VACUUM,
so we do accounting even when vacuum wait is zero.
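
A standalone sketch of that idea follows. It is not the actual
vacuum_delay_point(): CostSketch and sketch_vacuum_delay_point are
hypothetical names, the sleep is replaced by a counter, and the only
points it tries to show are the bool return value and the fact that the
cost accounting still runs when the configured delay is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cost-based delay state. */
typedef struct
{
    int cost_balance;   /* cost accumulated since the last delay point */
    int cost_limit;     /* balance at which a delay point is reached */
    int cost_delay_ms;  /* configured delay; zero means "never sleep" */
    int sleeps;         /* how many times we would have slept */
} CostSketch;

/*
 * Returns true each time the accumulated cost reaches the limit, whether
 * or not a sleep is actually taken. The caller can use the return value
 * as the trigger for checking the old transactions.
 */
bool
sketch_vacuum_delay_point(CostSketch *c)
{
    assert(c->cost_limit > 0);

    if (c->cost_balance < c->cost_limit)
        return false;

    if (c->cost_delay_ms > 0)
        c->sleeps++;        /* a real implementation would sleep here */

    c->cost_balance = 0;
    return true;
}
```

With this shape the frequency of the check follows the amount of work
done rather than the block count, even for a VACUUM run with no delay.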


--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: VACUUM Improvements - WIP Patch

From: Tom Lane
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> Here is a WIP patch based on the discussions here:
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php

I do not like this patch in any way, shape, or form.

(1) It's enormously overcomplicated, and therefore fragile.

(2) It achieves speedup of VACUUM by pushing work onto subsequent
regular accesses of the page, which is exactly the wrong thing.
Worse, once you count the disk writes those accesses will induce, it's
not even clear that there's any genuine savings.

(3) The fact that it doesn't work until concurrent transactions have
gone away makes it of extremely dubious value in real-world scenarios,
as already noted by Simon.


It strikes me that what you are trying to do here is compensate for
a bad decision in the HOT patch, which was to have VACUUM's first
pass prune/defrag a page even when we know we are going to have to
come back to that page later.  What about trying to fix things so
that if the page contains line pointers that need to be removed,
the first pass doesn't dirty it at all, but leaves all the work
to be done at the second visit?  I think that since heap_page_prune
has been refactored into a "scan" followed by an "apply", it'd be
possible to decide before the "apply" step whether this is the case
or not.
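
As a rough illustration of deciding between the "scan" and the "apply"
step, consider the sketch below. It does not reflect the real
heap_page_prune interface: PruneScanSketch and both functions are
hypothetical, and the counters are an assumed summary of what a scan
step could report for one page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the "scan" step found on one page. */
typedef struct
{
    int nredirected;     /* line pointers that would become redirects */
    int nnewly_dead;     /* line pointers that would become DEAD */
    int nexisting_dead;  /* line pointers that are already DEAD */
    int nunused;         /* line pointers that could be set UNUSED directly */
} PruneScanSketch;

/* A page with DEAD line pointers has to be revisited after index cleanup. */
bool
page_needs_second_visit(const PruneScanSketch *r)
{
    return r->nnewly_dead > 0 || r->nexisting_dead > 0;
}

/*
 * Decide, before the "apply" step, whether the first pass should touch
 * the page at all. If the page must be revisited anyway, leave it clean
 * and do all the work on the second visit.
 */
bool
apply_prune_in_first_pass(const PruneScanSketch *r)
{
    assert(r->nredirected >= 0 && r->nnewly_dead >= 0);
    assert(r->nexisting_dead >= 0 && r->nunused >= 0);

    if (page_needs_second_visit(r))
        return false;

    return r->nredirected > 0 || r->nunused > 0;
}
```

Under these assumptions a page is dirtied at most once per VACUUM:
either in the first pass, when nothing on it needs index cleanup, or in
the second.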

            regards, tom lane