Re: VACUUM's ancillary tasks - Mailing list pgsql-hackers

From Robert Haas
Subject Re: VACUUM's ancillary tasks
Date
Msg-id CA+TgmobKPFJKZ7MQRVfur8FCBfXQKxDN8vz7FOsGG8vjMojZrg@mail.gmail.com
Whole thread Raw
In response to Re: VACUUM's ancillary tasks  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: VACUUM's ancillary tasks  (Jeff Janes <jeff.janes@gmail.com>)
Re: VACUUM's ancillary tasks  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Oct 6, 2016 at 3:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Sure, I could handle each case separately, but the goal of this patch
>> (as hinted at in the Subject) is to generalize all the different tasks
>> we've been giving to VACUUM.  The only missing piece is what the first
>> patch addresses; which is insert-mostly tables never getting vacuumed.
>
> I don't buy the argument that we should do this in order to be "general".
> Those other pieces are present to achieve a specific job, not out of
> generality.

+1.

> If we want to have something to vacuum insert-mostly tables for the sake of
> the index-only-scans, then I don't think we can ignore the fact that the
> visibility map is page based, not tuple based.  If we insert 10,000 tuples
> into a full table and they all go into 100 pages at the end, that is very
> different than inserting 10,000 tuples which each go into a separate page
> (because each page has that amount of freespace).  In one case you have
> 10,000 tuples not marked as all visible, in the other case you have
> 1,000,000 not marked as all visible.

+1.

> Also, I think that doing more counts which get amalgamated into the same
> threshold, rather than introducing another parameter, is a bad thing.  I
> have insert-mostly, most of the time, tables which are never going to
> benefit from index-only-scans, and I don't want to pay the cost of them
> getting vacuumed just because of some inserts, when I will never get a
> benefit out of it.  I could turn autovacuum off for those tables, but then I
> would have to remember to manually intervene on the rare occasion they do
> get updates or deletes.  I want to be able to manually pick which tables I
> sign up for this feature (and then forget it), not manually pick which ones
> to exempt from it and have to constantly review that.

Of course, if you do that, then what will happen is eventually it will
be time to advance relfrozenxid for that relation, and you'll get a
giant soul-crushing VACUUM of doom, rather than a bunch of small,
hopefully-innocuous VACUUMs.  I've been wondering if would make sense
to trigger vacuums based on inserts based on a *fixed* number of
pages, rather than a percentage of the table.  Say, for example,
whenever we have 64MB of not-all-visible pages, we VACUUM.

There are some difficulties:

1. We don't want to vacuum too early.  For example, a bulk load may
vastly inflate the size of a table, but vacuuming it while the load is
in progress will be useless.  You can maybe avoid that problem by
basing this on statistics only reported at the end of the transaction,
but there's another, closely-related issue: vacuuming right after the
transaction completes may be useless, too, if there are old snapshots
still in existence that render the newly-inserted tuples
not-all-visible.

2. We don't want to trigger index vacuuming for a handful of dead
tuples, or at least not too often.  I've previously suggested
requiring a certain minimum number of dead index tuples that would be
required before index vacuuming occurs; prior to that, we'd just
truncate to dead line pointers and leave those for the next vacuum
cycle.  This might need some refinement - should it be page-based? -
but the basic idea still seems sound.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Switch to unnamed POSIX semaphores as our preferred sema code?
Next
From: Robert Haas
Date:
Subject: Re: pgbench vs. wait events