Thread: Minimum tuple threshold to decide last pass of VACUUM

Minimum tuple threshold to decide last pass of VACUUM

From
Michael Paquier
Date:
Hi all,

Commit 4046e58c (dated of 2001) has introduced the following comment
in vacuumlazy.c:
+       /* If any tuples need to be deleted, perform final vacuum cycle */
+       /* XXX put a threshold on min nuber of tuples here? */
+       if (vacrelstats->num_dead_tuples > 0)
In short, we may want to have a reloption to decide if we do or not
the last pass of VACUUM or not depending on a given number of
remaining tuples. Is this still something we would like to have?

Regards,
-- 
Michael



Re: Minimum tuple threshold to decide last pass of VACUUM

From
Simon Riggs
Date:
On 2 August 2015 at 13:13, Michael Paquier <michael.paquier@gmail.com> wrote:
Hi all,

Commit 4046e58c (dated of 2001) has introduced the following comment
in vacuumlazy.c:
+       /* If any tuples need to be deleted, perform final vacuum cycle */
+       /* XXX put a threshold on min nuber of tuples here? */
+       if (vacrelstats->num_dead_tuples > 0)
In short, we may want to have a reloption to decide if we do or not
the last pass of VACUUM or not depending on a given number of
remaining tuples. Is this still something we would like to have?

I don't think we want a new user parameter, but we should have an internal limit with a heuristic, similar to how we decide whether to truncate.

I would suggest this internal logic...

* If its a VACUUM FREEZE then index_scan_threshold = 0, i.e. always scan if needed, since the user is requesting maximum vacuum

* For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all, since they aren't critical path activities at that point

* For normal VACUUMs we should scan indexes only if (num_dead_tuples * 20) > (blocks to be scanned in any one index), which allows some index bloat but not much

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Minimum tuple threshold to decide last pass of VACUUM

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
> since they aren't critical path activities at that point

It is not possible to skip scanning indexes completely, unless no tuples
are to be removed from the heap.  Otherwise, index tuples become
lingering pointers (and when such heap address are later refilled, they
become corrupted indexscans).

But actually this is an interesting point and I don't think we do this:
if in emergency mode, maybe we shouldn't try to remove any dead tuples
at all, and instead only freeze very old tuples.  That would make such
vacuums go much quicker.  (More accurately, if the updating xid is older
than the freeze point, then remove the tuple, but otherwise keep it.)

My point is that emergency vacuums are troublesome for various reasons
and it would be better if they did only the minimum possible.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Minimum tuple threshold to decide last pass of VACUUM

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Simon Riggs wrote:
>> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
>> since they aren't critical path activities at that point

> It is not possible to skip scanning indexes completely, unless no tuples
> are to be removed from the heap.

Right.

> But actually this is an interesting point and I don't think we do this:
> if in emergency mode, maybe we shouldn't try to remove any dead tuples
> at all, and instead only freeze very old tuples.

+1 ... not sure if that's what Simon had in mind exactly, but it seems
like a correct statement of what he was getting at.
        regards, tom lane



Re: Minimum tuple threshold to decide last pass of VACUUM

From
Simon Riggs
Date:
On 3 August 2015 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Simon Riggs wrote:
>> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
>> since they aren't critical path activities at that point

> It is not possible to skip scanning indexes completely, unless no tuples
> are to be removed from the heap.

Right.

> But actually this is an interesting point and I don't think we do this:
> if in emergency mode, maybe we shouldn't try to remove any dead tuples
> at all, and instead only freeze very old tuples.

+1 ... not sure if that's what Simon had in mind exactly, but it seems
like a correct statement of what he was getting at.

Yes, that's what I was thinking, I just didn't say actually it. I'd been thinking about having VACUUM do just Phase 1 for some time, since its so much faster to do that. Will code.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Minimum tuple threshold to decide last pass of VACUUM

From
Robert Haas
Date:
On Mon, Aug 3, 2015 at 11:52 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Simon Riggs wrote:
>> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
>> since they aren't critical path activities at that point
>
> It is not possible to skip scanning indexes completely, unless no tuples
> are to be removed from the heap.  Otherwise, index tuples become
> lingering pointers (and when such heap address are later refilled, they
> become corrupted indexscans).

Well, if we skip the index scans, we can't do the second heap pass
either, but that's OK.  I think we're all talking about the same thing
here, which is to do only the first heap pass in some cases.  That
will prune dead tuples to line pointers, freeze old XIDs, and mark
pages all-visible where appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Minimum tuple threshold to decide last pass of VACUUM

From
Robert Haas
Date:
On Mon, Aug 3, 2015 at 4:13 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> * For normal VACUUMs we should scan indexes only if (num_dead_tuples * 20) >
> (blocks to be scanned in any one index), which allows some index bloat but
> not much

I think this kind of heuristic is good, but I think we should expose a
setting for it.  There's no way for us to know without testing whether
the right value for that multiplier is 2 or 20 or 200 or 2000, and if
we don't make it easy to tweak, we'll never find out.  It may even be
workload-dependent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Minimum tuple threshold to decide last pass of VACUUM

From
Jim Nasby
Date:
On 8/3/15 12:04 PM, Simon Riggs wrote:
> Yes, that's what I was thinking, I just didn't say actually it. I'd been
> thinking about having VACUUM do just Phase 1 for some time, since its so
> much faster to do that. Will code.

I'd like to see that exposed as an option as well. There are certain 
situations where you'd really like to just freeze things as fast as 
possible, without waiting for a full vacuum.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Minimum tuple threshold to decide last pass of VACUUM

From
Michael Paquier
Date:
On Tue, Aug 4, 2015 at 2:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 August 2015 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> > Simon Riggs wrote:
>> >> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at
>> >> all,
>> >> since they aren't critical path activities at that point
>>
>> > It is not possible to skip scanning indexes completely, unless no tuples
>> > are to be removed from the heap.
>>
>> Right.
>>
>> > But actually this is an interesting point and I don't think we do this:
>> > if in emergency mode, maybe we shouldn't try to remove any dead tuples
>> > at all, and instead only freeze very old tuples.
>>
>> +1 ... not sure if that's what Simon had in mind exactly, but it seems
>> like a correct statement of what he was getting at.
>
>
> Yes, that's what I was thinking, I just didn't say actually it. I'd been
> thinking about having VACUUM do just Phase 1 for some time, since its so
> much faster to do that. Will code.

Interesting. I'll be happy to have a look at any patch produced,
that's surely something we want to improve in emergency mode.
-- 
Michael