Re: [PERFORM] encouraging index-only scans - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: [PERFORM] encouraging index-only scans
Date
Msg-id CAMkU=1xK7yL9SNH2Xprnws2C-fMtgBd07eAQcpDD8Zhj9LJtUQ@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] encouraging index-only scans  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [PERFORM] encouraging index-only scans  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Tue, Feb 11, 2014 at 9:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
> A sequential scan will set hint bits and will prune the page, but
> pruning the page doesn't ever mark it all-visible; that logic is
> entirely in vacuum.  If that could be made cheap enough to be
> negligible, it might well be worth doing in heap_page_prune().  I
> think there might be a way to do that, but it's a bit tricky because
> the pruning logic iterates over the page in a somewhat complex way,
> not just a straightforward scan of all the item pointers the way the
> existing logic doesn't.  It would be pretty cool if we could just use
> a bit out of the heap-prune xlog record to indicate whether the
> all-visible bit should be set; then we'd gain the benefit of marking
> things all-visible much more often without needing vacuum.
>
> That doesn't help insert-only tables much, though, because those won't
> require pruning.  We set hint bits (which dirties the page) but
> currently don't write WAL.  We'd have to change that to set the
> all-visible bit when scanning such a table, and that would be
> expensive.  :-(

Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
(2012) but they still seem to be not usable for insert-only workloads
two years later.  Based on current progress, it doesn't look like this
will be corrected until 9.5 (2015).  I am kind of confused why this has
not generated more urgency.


For insert and select only, they are usable (if your queries are of the type that could benefit from them), you just have to do some manual intervention.  The list of features that sometimes require a DBA to do something to make maximum use of them under some circumstance would be a long one.  It would be nice if it were better, but I don't see why this feature is particularly urgent compared to all the other things that could be improved.  In particular I think the Freezing without IO is much more important.  Freezing is rather unimportant until suddenly it is is the most important thing in the universe.  If we could stop worrying about that, I think it would free up other aspects of vacuum scheduling to have more meddling/optimization done to it.

 

I guess my question is what approach do we want to take to fixing this?
If we are doing pruning, aren't we emitting WAL?  You are right that for
an insert-only workload, we aren't going to prune, but if pruning WAL
overhead is acceptable for a sequential scan, isn't index-only
page-all-visible WAL overhead acceptable?


We often don't find that pruning particularly acceptable in seq scans, and there is a patch pending to conditionally turn it off for them.
 

Do we want to track the number of inserts in statistics and trigger an
auto-vacuum after a specified number of inserts?

We track relpages and relallvisible, which seems like a more direct measure.  Once analyze is done (which is already triggered by inserts) and sets those, it could fire a vacuum based on the ratio of those values, or the autovac process could just look at the ratio after naptime.  So just introduce autovacuum_vacuum_visible_factor. A problem there is that it would be a lot of work to aggressively keep the ratio high, and pointless if the types of queries done on that table don't benefit from IOS anyway, or if pages are dirtied so rapidly that no amount of vacuuming will keep the ratio high.  Would we try to automatically tell which tables were which, or rely on the DBA setting per-table autovacuum_vacuum_visible_factor for tables that differ from the database norm?
 
 The problem there is
that we really don't need to do any index cleanup, which is what vacuum
typically does --- we just want to scan the table and set the
all-visible bits, so that approach seems non-optimal.

In the case of no updates or deletes (or aborted inserts?), there would be nothing to clean up in the indexes and that step would be skipped (already in the current code). And if the indexes do need cleaning up, we certainly can't set the page all visible without doing that clean up.
 
Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [PERFORM] encouraging index-only scans
Next
From: Bruce Momjian
Date:
Subject: Re: PostgreSQL Failback without rebuild