Re: VACUUM's ancillary tasks - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: VACUUM's ancillary tasks
Date
Msg-id CAEepm=3RJLbVxFTH9tyc82yR1gU_r6Q7q76xbgKAHo5p+HsmSw@mail.gmail.com
Whole thread Raw
In response to Re: VACUUM's ancillary tasks  (Vik Fearing <vik@2ndquadrant.fr>)
List pgsql-hackers
On Sun, Oct 2, 2016 at 9:34 AM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> On 10/01/2016 09:28 AM, Thomas Munro wrote:
>> On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
>>> The attached two patches scratch two itches I've been having for a
>>> while.  I'm attaching them together because the second depends on the first.
>>>
>>> Both deal with the fact that [auto]vacuum has taken on more roles than
>>> its original purpose.
>>>
>>>
>>> Patch One: autovacuum insert-heavy tables
>>>
>>> If you have a table that mostly receives INSERTs, it will never get
>>> vacuumed because there are no (or few) dead rows.  I have added an
>>> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
>>> the same way as "changes_since_analyze" does.
>>>
>>> The reason such a table needs to be vacuumed is currently twofold: the
>>> visibility map is not updated, slowing down index-only scans; and BRIN
>>> indexes are not maintained, rendering them basically useless.
>>
>> I'm aware of those two problems, but not very familiar with the
>> details.  I don't feel qualified to say whether insert counting is the
>> best approach to the problem at this point.  I looked into it a little
>> bit however, and had the following thoughts:
>>
>> About BRIN indexes:  I couldn't find an explanation of why BRIN
>> indexes don't automatically create new summary tuples when you insert
>> a new tuple in an unsummarised page range.  Is it deferred until
>> VACUUM time in order to untangle some otherwise unresolvable
>> interlocking or crash safety problem, or could that one day be done?
>> Assuming that it must be deferred for some technical reason and there
>> is no way around it, then I wonder if there is a more direct and
>> accurate way to figure out when it's necessary than counting inserts.
>> Counting inserts seems slightly bogus because you can't tell whether
>> those were inserts into an existing summarised block which is
>> self-maintaining or not.  At first glance it looks a bit like
>> unsummarised ranges can only appear at the end of the table, is that
>> right?  If so, couldn't you detect the number of unsummarised BRIN
>> blocks just by comparing the highest summarised BRIN block and the
>> current heap size?
>>
>> About visibility maps:  How crazy would it be to estimate the number
>> of not-all-visible pages instead?  It would be less work to count that
>> since it would only increase when the *first* tuple is inserted into a
>> page that is currently all visible (ie when the bit is cleared), not
>> for every tuple inserted into any page like your inserts_since_vacuum
>> counter.  Another difference is that inserts_since_vacuum is reset
>> even if vacuum finds that it *can't* set the all-visible bit for a
>> given page yet because of some concurrent transaction.  In that case
>> the bit is still not set but autovacuum has no reason to be triggered
>> again.
>
> Sure, I could handle each case separately, but the goal of this patch
> (as hinted at in the Subject) is to generalize all the different tasks
> we've been giving to VACUUM.  The only missing piece is what the first
> patch addresses; which is insert-mostly tables never getting vacuumed.

Yeah, that makes sense.  I just wanted to discuss what the ideal
launch conditions would be for those particular ancillary jobs, and
then figure out whether the difference matters.  Generally, I think
changes to autovacuum heuristics need some consensus-building
discussion, especially in light of other related ideas from Jeff
Janes, and from people involved with BRIN and visibility map design,
including Simon who signed up as a reviewer.  Since we're out of time
I'm going to move this to the November CF, and let's hear from them.

-- 
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Emre Hasegeli
Date:
Subject: Contains and is contained by operators of inet datatypes
Next
From: John Gorman
Date:
Subject: Re: PATCH: two slab-like memory allocators