Re: VACUUM's ancillary tasks - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: VACUUM's ancillary tasks
Date
Msg-id CAEepm=2rWqt1mhT9XwEPUdvcZN_f8qb7+CaejU2D34R-VG8b5w@mail.gmail.com
Whole thread Raw
In response to VACUUM's ancillary tasks  (Vik Fearing <vik@2ndquadrant.fr>)
Responses Re: VACUUM's ancillary tasks  (Vik Fearing <vik@2ndquadrant.fr>)
Re: [HACKERS] VACUUM's ancillary tasks  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> The attached two patches scratch two itches I've been having for a
> while.  I'm attaching them together because the second depends on the first.
>
> Both deal with the fact that [auto]vacuum has taken on more roles than
> its original purpose.
>
>
> Patch One: autovacuum insert-heavy tables
>
> If you have a table that mostly receives INSERTs, it will never get
> vacuumed because there are no (or few) dead rows.  I have added an
> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
> the same way as "changes_since_analyze" does.
>
> The reason such a table needs to be vacuumed is currently twofold: the
> visibility map is not updated, slowing down index-only scans; and BRIN
> indexes are not maintained, rendering them basically useless.

I'm aware of those two problems, but not very familiar with the
details.  I don't feel qualified to say whether insert counting is the
best approach to the problem at this point.  I looked into it a little
bit however, and had the following thoughts:

About BRIN indexes:  I couldn't find an explanation of why BRIN
indexes don't automatically create new summary tuples when you insert
a new tuple in an unsummarised page range.  Is it deferred until
VACUUM time in order to untangle some otherwise unresolvable
interlocking or crash safety problem, or could that one day be done?
Assuming that it must be deferred for some technical reason and there
is no way around it, then I wonder if there is a more direct and
accurate way to figure out when it's necessary than counting inserts.
Counting inserts seems slightly bogus because you can't tell whether
those were inserts into an existing summarised block which is
self-maintaining or not.  At first glance it looks a bit like
unsummarised ranges can only appear at the end of the table, is that
right?  If so, couldn't you detect the number of unsummarised BRIN
blocks just by comparing the highest summarised BRIN block and the
current heap size?

About visibility maps:  How crazy would it be to estimate the number
of not-all-visible pages instead?  It would be less work to count that
since it would only increase when the *first* tuple is inserted into a
page that is currently all visible (ie when the bit is cleared), not
for every tuple inserted into any page like your inserts_since_vacuum
counter.  Another difference is that inserts_since_vacuum is reset
even if vacuum finds that it *can't* set the all-visible bit for a
given page yet because of some concurrent transaction.  In that case
the bit is still not set but autovacuum has no reason to be triggered
again.

-- 
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Pavel Raiskup
Date:
Subject: [PATCH] parallel & isolated makefile for plpython
Next
From: Dmitry Dolgov
Date:
Subject: Re: [PATCH] Generic type subscription