Re: [PERFORM] encouraging index-only scans - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PERFORM] encouraging index-only scans
Date
Msg-id 20130906001437.GA29264@momjian.us
Whole thread Raw
In response to Re: [PERFORM] encouraging index-only scans  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [PERFORM] encouraging index-only scans
List pgsql-hackers
On Wed, Sep  4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
> > "Add a column pg_class.relallvisible to remember the number of pages
> > that were all-visible according to the visibility map as of the last
> > VACUUM
> > (or ANALYZE, or some other operations that update pg_class.relpages).
> > Use relallvisible/relpages, instead of an arbitrary constant, to
> > estimate how many heap page fetches can be avoided during an
> > index-only scan."
> > 
> > Have I missed some nuance?
> 
> I am looking back at this issue now and I think you are correct.  The
> commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
> and the code matches that:
> 
>         if (!inh)
>             vac_update_relstats(onerel,
>                                 RelationGetNumberOfBlocks(onerel),
>                                 totalrows,
> -->                                visibilitymap_count(onerel),
>                                 hasindex,
>                                 InvalidTransactionId);
> 
> so if an index scan was not being used after an ANALYZE, it isn't a bad
> allvisibile estimate but something else.  This code was in PG 9.2.

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:
1)  When are VM bits set:    vacuum (non-full)    analyze (only some random pages)2)  When are massive rows added but
VMbits not set:    copy3) When are VM bits cleared:    insert/update/delete    vacuum (non-full)4)  When are VM map
filescleared:    vacuum full    cluster5) When is pg_class.relallvisible updated via a VM map file scan:    vacuum
(non-full)   analyze
 

Vacuums run by autovacuum are driven by n_dead_tuples, which is only
update and delete.  Therefore, any scenario where vacuum (non-full) is
never run will not have significant VM bits set.  The only bits that
will be set will be by pages visited randomly by analyze.

The following table activities will not set proper VM bits:
               vacuum full               cluster               copy    insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible.

The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Next
From: Robert Haas
Date:
Subject: Re: [PERFORM] encouraging index-only scans