Re: VACUUM's ancillary tasks - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: VACUUM's ancillary tasks |
Date | |
Msg-id | CAD21AoDfXVZHTg7uUnsccPEJKkLSNTwSkokJ_=DJpeLtS9+GzA@mail.gmail.com Whole thread Raw |
In response to | Re: VACUUM's ancillary tasks (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Fri, Oct 7, 2016 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 6, 2016 at 3:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> Sure, I could handle each case separately, but the goal of this patch >>> (as hinted at in the Subject) is to generalize all the different tasks >>> we've been giving to VACUUM. The only missing piece is what the first >>> patch addresses; which is insert-mostly tables never getting vacuumed. >> >> I don't buy the argument that we should do this in order to be "general". >> Those other pieces are present to achieve a specific job, not out of >> generality. > > +1. > >> If we want to have something to vacuum insert-mostly tables for the sake of >> the index-only-scans, then I don't think we can ignore the fact that the >> visibility map is page based, not tuple based. If we insert 10,000 tuples >> into a full table and they all go into 100 pages at the end, that is very >> different than inserting 10,000 tuples which each go into a separate page >> (because each page has that amount of freespace). In one case you have >> 10,000 tuples not marked as all visible, in the other case you have >> 1,000,000 not marked as all visible. > > +1. > >> Also, I think that doing more counts which get amalgamated into the same >> threshold, rather than introducing another parameter, is a bad thing. I >> have insert-mostly, most of the time, tables which are never going to >> benefit from index-only-scans, and I don't want to pay the cost of them >> getting vacuumed just because of some inserts, when I will never get a >> benefit out of it. I could turn autovacuum off for those tables, but then I >> would have to remember to manually intervene on the rare occasion they do >> get updates or deletes. I want to be able to manually pick which tables I >> sign up for this feature (and then forget it), not manually pick which ones >> to exempt from it and have to constantly review that. > > Of course, if you do that, then what will happen is eventually it will > be time to advance relfrozenxid for that relation, and you'll get a > giant soul-crushing VACUUM of doom, rather than a bunch of small, > hopefully-innocuous VACUUMs. I've been wondering if would make sense > to trigger vacuums based on inserts based on a *fixed* number of > pages, rather than a percentage of the table. Say, for example, > whenever we have 64MB of not-all-visible pages, we VACUUM. +1 > > There are some difficulties: > > 1. We don't want to vacuum too early. For example, a bulk load may > vastly inflate the size of a table, but vacuuming it while the load is > in progress will be useless. You can maybe avoid that problem by > basing this on statistics only reported at the end of the transaction, > but there's another, closely-related issue: vacuuming right after the > transaction completes may be useless, too, if there are old snapshots > still in existence that render the newly-inserted tuples > not-all-visible. If the dummy xid can be generated for vacuum before starting vacuum for maintenance vm which is triggered by the amount of the cleared vm page, that vacuum could wait for old transaction finishes. > 2. We don't want to trigger index vacuuming for a handful of dead > tuples, or at least not too often. I've previously suggested > requiring a certain minimum number of dead index tuples that would be > required before index vacuuming occurs; prior to that, we'd just > truncate to dead line pointers and leave those for the next vacuum > cycle. This might need some refinement - should it be page-based? - > but the basic idea still seems sound. > Where do we leave dead line pointers at? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
pgsql-hackers by date: