Re: VACUUM's ancillary tasks - Mailing list pgsql-hackers

From Vik Fearing
Subject Re: VACUUM's ancillary tasks
Date
Msg-id 0449c20e-a9eb-5192-e9e6-46b848427a73@2ndquadrant.fr
Whole thread Raw
In response to Re: VACUUM's ancillary tasks  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: VACUUM's ancillary tasks  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: VACUUM's ancillary tasks  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On 10/01/2016 09:28 AM, Thomas Munro wrote:
> On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
>> The attached two patches scratch two itches I've been having for a
>> while.  I'm attaching them together because the second depends on the first.
>>
>> Both deal with the fact that [auto]vacuum has taken on more roles than
>> its original purpose.
>>
>>
>> Patch One: autovacuum insert-heavy tables
>>
>> If you have a table that mostly receives INSERTs, it will never get
>> vacuumed because there are no (or few) dead rows.  I have added an
>> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
>> the same way as "changes_since_analyze" does.
>>
>> The reason such a table needs to be vacuumed is currently twofold: the
>> visibility map is not updated, slowing down index-only scans; and BRIN
>> indexes are not maintained, rendering them basically useless.
> 
> I'm aware of those two problems, but not very familiar with the
> details.  I don't feel qualified to say whether insert counting is the
> best approach to the problem at this point.  I looked into it a little
> bit however, and had the following thoughts:
> 
> About BRIN indexes:  I couldn't find an explanation of why BRIN
> indexes don't automatically create new summary tuples when you insert
> a new tuple in an unsummarised page range.  Is it deferred until
> VACUUM time in order to untangle some otherwise unresolvable
> interlocking or crash safety problem, or could that one day be done?
> Assuming that it must be deferred for some technical reason and there
> is no way around it, then I wonder if there is a more direct and
> accurate way to figure out when it's necessary than counting inserts.
> Counting inserts seems slightly bogus because you can't tell whether
> those were inserts into an existing summarised block which is
> self-maintaining or not.  At first glance it looks a bit like
> unsummarised ranges can only appear at the end of the table, is that
> right?  If so, couldn't you detect the number of unsummarised BRIN
> blocks just by comparing the highest summarised BRIN block and the
> current heap size?
> 
> About visibility maps:  How crazy would it be to estimate the number
> of not-all-visible pages instead?  It would be less work to count that
> since it would only increase when the *first* tuple is inserted into a
> page that is currently all visible (ie when the bit is cleared), not
> for every tuple inserted into any page like your inserts_since_vacuum
> counter.  Another difference is that inserts_since_vacuum is reset
> even if vacuum finds that it *can't* set the all-visible bit for a
> given page yet because of some concurrent transaction.  In that case
> the bit is still not set but autovacuum has no reason to be triggered
> again.

Sure, I could handle each case separately, but the goal of this patch
(as hinted at in the Subject) is to generalize all the different tasks
we've been giving to VACUUM.  The only missing piece is what the first
patch addresses; which is insert-mostly tables never getting vacuumed.

As for the second patch, I would like to withdraw it and redesign it,
based on your comments.  The redesign I have in mind will no longer be
dependent on the first patch, so I'll submit it separately.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Next
From: Emre Hasegeli
Date:
Subject: Contains and is contained by operators of inet datatypes