Better estimates of index correlation

From
Tom Lane
Date:
Currently, we don't measure any statistics about the ordering
correlation of multi-column indexes, which means that btcostestimate
has to pick a number out of the air if there's more than one column.
We've been around on that at least once already: it used to use the first
column's correlation divided by the number of columns, and now uses the first
column's correlation times 0.75, and neither of those approaches has
any theoretical basis whatsoever.  I got started thinking about this
again in connection with John Surcombe's recent complaint on
pgsql-performance, which seems likely to be related to having a good
correlation estimate for one index but not another.

It strikes me that it'd be possible to have btvacuumcleanup directly
measure order correlation when it's processing a btree index, yielding a
reliable answer for any btree index regardless of number of columns.
We could do that by comparing the heap block numbers of adjacent
index entries' TIDs and counting the number of up-transitions (block
number larger than previous index entry) versus down-transitions (block
number smaller than previous).  Since btvacuumcleanup scans the index in
physical order not logical order, we could not meaningfully make
comparisons between the last entry on an index page and the first entry
on the next, but I think it's quite reasonable to assume that the
statistics of those comparisons would be the same as the statistics of
the intra-page comparisons.  So at the end of the cleanup scan we would
have total numbers of up-transitions and down-transitions.  It's clear
that a perfectly correlated index would have many up-transitions and no
down-transitions; a perfectly reverse-correlated index would have the
opposite; and an uncorrelated index would have about equal numbers of
them.  So we could approximate the index correlation with something like
(up_transition_count - down_transition_count) / (up_transition_count + down_transition_count)

My statistics are a bit rusty so I'm not sure if we need a square or
square root or something in there, but it seems like this would work,
and would add only a negligible amount of overhead.  We could then have
VACUUM save the number into pg_statistic or pg_index, and away we go.
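
For illustration, the transition-counting estimator described above could be
sketched like this (a Python mock-up of the idea, not the actual btree C code;
the function name and the list-of-heap-block-numbers input are assumptions made
for the example):

```python
def estimate_correlation(heap_blocks):
    """Approximate index order correlation from the heap block numbers
    of adjacent index entries, by counting up- and down-transitions."""
    up = down = 0
    for prev, cur in zip(heap_blocks, heap_blocks[1:]):
        if cur > prev:
            up += 1      # up-transition: block number larger than previous
        elif cur < prev:
            down += 1    # down-transition: block number smaller than previous
        # equal block numbers contribute no transition either way
    if up + down == 0:
        return 0.0       # no information; matches the default of zero
    return (up - down) / (up + down)
```

A perfectly correlated sequence of block numbers yields 1.0, a perfectly
reverse-correlated one yields -1.0, and a random one lands near zero, matching
the behavior the formula above is meant to capture.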

I'm not planning to do anything about this idea right now, since I'm
still hip-deep in collations, but I thought I'd throw it out to get
it on the record.

Comments?
        regards, tom lane


Re: Better estimates of index correlation

From
"Joshua D. Drake"
Date:
On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:

> I'm not planning to do anything about this idea right now, since I'm
> still hip-deep in collations, but I thought I'd throw it out to get
> it on the record.
> 
> Comments?

One question: Where is the overhead increase? 

JD

> 
>             regards, tom lane
> 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt



Re: Better estimates of index correlation

From
Alvaro Herrera
Date:
Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011:
> On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
> 
> > I'm not planning to do anything about this idea right now, since I'm
> > still hip-deep in collations, but I thought I'd throw it out to get
> > it on the record.
> > 
> > Comments?
> 
> One question: Where is the overhead increase? 

During VACUUM, in the pass that processes indexes.

I think Tom is slightly confused though: AFAICT this must happen in
btvacuumscan (which does the actual scan), not btvacuumcleanup (which
may not do it, if btbulkdelete did it previously).  Which means it would
be done for each pass over the index when vacuuming a relation, because
I don't see any way for this function to determine whether this is the
last pass we'll do over the index.

It sure would be nice to be able to do it only during the last scan.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Better estimates of index correlation

From
Heikki Linnakangas
Date:
On 14.03.2011 16:09, Alvaro Herrera wrote:
> Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011:
>> On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
>>
>>> I'm not planning to do anything about this idea right now, since I'm
>>> still hip-deep in collations, but I thought I'd throw it out to get
>>> it on the record.
>>>
>>> Comments?
>>
>> One question: Where is the overhead increase?
>
> During VACUUM, in the pass that processes indexes.
>
> I think Tom is slightly confused though: AFAICT this must happen in
> btvacuumscan (which does the actual scan), not btvacuumcleanup (which
> may not do it, if btbulkdelete did it previously).  Which means it would
> be done for each pass over the index when vacuuming a relation, because
> I don't see any way for this function to determine whether this is the
> last pass we'll do over the index.
>
> It sure would be nice to be able to do it only during the last scan.

Can't we do it at ANALYZE? If the estimate is only based on intra-page 
comparisons anyway, a sample of random pages ought to be enough.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Better estimates of index correlation

From
Robert Haas
Date:
On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011:
>> On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
>>
>> > I'm not planning to do anything about this idea right now, since I'm
>> > still hip-deep in collations, but I thought I'd throw it out to get
>> > it on the record.
>> >
>> > Comments?
>>
>> One question: Where is the overhead increase?
>
> During VACUUM, in the pass that processes indexes.
>
> I think Tom is slightly confused though: AFAICT this must happen in
> btvacuumscan (which does the actual scan), not btvacuumcleanup (which
> may not do it, if btbulkdelete did it previously).  Which means it would
> be done for each pass over the index when vacuuming a relation, because
> I don't see any way for this function to determine whether this is the
> last pass we'll do over the index.
>
> It sure would be nice to be able to do it only during the last scan.

Does it really matter?  What Tom was describing sounded embarrassingly cheap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Better estimates of index correlation

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011:
> On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:

> > It sure would be nice to be able to do it only during the last scan.
> 
> Does it really matter?  What Tom was describing sounded embarrassingly cheap.

Well, you only do multiple passes for tables that are really large, so
it's precisely there that you want to save the extra overhead of having
to do it multiple times.

As Heikki says, maybe this wouldn't be an issue at all if we can do it
during ANALYZE instead, but I don't know if that works.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Better estimates of index correlation

From
Robert Haas
Date:
On Mon, Mar 14, 2011 at 10:25 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011:
>> On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>
>> > It sure would be nice to be able to do it only during the last scan.
>>
>> Does it really matter?  What Tom was describing sounded embarrassingly cheap.
>
> Well, you only do multiple passes for tables that are really large, so
> it's precisely there that you want to save the extra overhead of having
> to do it multiple times.

Right, but if the overhead is 0.0001% then who cares?  I have no idea
what the real number is but it doesn't seem unrealistic to me that it
could be on that order.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Better estimates of index correlation

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011:
>> Does it really matter?  What Tom was describing sounded embarrassingly cheap.

That was my thought exactly.  If you could even measure the added cost
of doing that, I'd be astonished.  It'd be adding one comparison-and-
possible-assignment to a loop that also has to invoke a binary search
of a TID array --- a very large array, in the cases we're worried about.
I'd put the actual update of pg_statistic somewhere where it only
happened once, but I don't especially care if the stat gets computed on
each index scan.

> As Heikki says, maybe this wouldn't be an issue at all if we can do it
> during ANALYZE instead, but I don't know if that works.

I'm not convinced you can get a sufficiently good estimate from a small
subset of pages.

I actually started with the idea of having ANALYZE try to calculate
correlation for multi-column indexes the same way it now calculates it
for individual data columns, but when this idea occurred to me it just
seemed a whole lot better.  Note that we could remove the correlation
calculations from ANALYZE altogether.
        regards, tom lane


Re: Better estimates of index correlation

From
Robert Haas
Date:
On Mon, Mar 14, 2011 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Note that we could remove the correlation
> calculations from ANALYZE altogether.

Only if you don't mind having them only get updated when somebody
vacuums.  If a table is mostly getting inserted into, it may not get
vacuumed very often (or possibly even never).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Better estimates of index correlation

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 14, 2011 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that we could remove the correlation
>> calculations from ANALYZE altogether.

> Only if you don't mind having them only get updated when somebody
> vacuums.  If a table is mostly getting inserted into, it may not get
> vacuumed very often (or possibly even - never).

No, I don't mind that.  Index correlation is a pretty second-order stat,
and most of the time it'll be just fine if the estimate stays at the
default zero.  The situation that's problematic is where you have a
very-far-from-zero number for one index and no estimate for another,
because that can incorrectly bias the planner to prefer the first index;
which is what I think is happening in Surcombe's case.  The approach
I'm describing would guarantee that all indexes of a table get their
correlation estimates updated at the same time.  (Memo to self: we'd
also want btbuild to compute this stat, so that a newly created index
doesn't start out at a disadvantage compared to others.)

[ thinks for a bit... ]  Although there *is* a small practical problem
with having VACUUM update pg_statistic: a plain VACUUM never takes out
an XID.  I guess we could have it do the update nontransactionally, the
same way it updates relpages/reltuples.
        regards, tom lane


Re: Better estimates of index correlation

From
Josh Berkus
Date:
>> As Heikki says, maybe this wouldn't be an issue at all if we can do it
>> during ANALYZE instead, but I don't know if that works.
> 
> I'm not convinced you can get a sufficiently good estimate from a small
> subset of pages.

Note that if this requires VACUUM rather than ANALYZE, it introduces a
problem for data warehousing users, who can go years between vacuums of
their largest tables.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: Better estimates of index correlation

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> I'm not convinced you can get a sufficiently good estimate from a small
>> subset of pages.

> Note that if this requires VACUUM rather than ANALYZE, it introduces a
> problem for data warehousing users, who can go years between vacuums of
> their largest tables.

It's likely that the default estimate of zero index correlation will
work just fine for such users ...
        regards, tom lane


Re: Better estimates of index correlation

From
Josh Berkus
Date:
On 3/14/11 5:51 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>>> I'm not convinced you can get a sufficiently good estimate from a small
>>> subset of pages.
> 
>> Note that if this requires VACUUM rather than ANALYZE, it introduces a
>> problem for data warehousing users, who can go years between vacuums of
>> their largest tables.
> 
> It's likely that the default estimate of zero index correlation will
> work just fine for such users ...

No, it isn't.  In fact, users with very large tables are the ones
hardest hit by our lack of correlation stats.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: Better estimates of index correlation

From
Greg Stark
Date:
On Tue, Mar 15, 2011 at 12:27 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Note that if this requires VACUUM rather than ANALYZE, it introduces a
> problem for data warehousing users, who can go years between vacuums of
> their largest tables.

I don't understand, are they going years between vacuums because their
data is static? In which case the index correlation won't change. Or
is it append-only, in which case I suspect the newly appended data is
likely to have the same correlation as the old data. But is there
anything stopping us from doing some sort of ANALYZE-style sample of
the index pages as well?

I think the bigger problems here are that a) correlation isn't
actually a useful statistic for estimating random seeks and b) I'm not
sure what counting back-transitions has to do with correlation.

If we're lucky it may be that counting back-transitions is a more
useful stat than correlation anyway; it does seem to have more to do
with random seeks.  It might need some refinement, though, or some
other metrics to go along with it, to give a real basis for an
estimate of random seeks.  I'm wondering how far the back-transitions
and forward transitions actually go.  If you're skipping every other
block that's twice as much i/o as reading every block, but if you
skip n blocks where n > random_page_cost/seq_page_cost then that's
one random read per block.  If you skip 1 block backwards to a block
you've already read then that's free, but if you skip backwards to a
block that isn't recently referenced that's a random seek.
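
The cost distinctions among block skips that Greg describes could be tallied
along these lines (an illustrative Python sketch; the bucket names, the
function name, and the default thresholds are assumptions for the example,
not anything the server actually computes):

```python
def classify_transitions(heap_blocks, random_seq_ratio=4.0, cache_window=1):
    """Bucket adjacent-entry block transitions by their likely I/O cost:
    same/next block is (nearly) free, a short forward skip still reads the
    intervening blocks, a long skip amounts to one random read, a small
    backward skip probably hits cache, and a long backward skip behaves
    like a random seek.  Thresholds here are illustrative assumptions."""
    buckets = {"sequential": 0, "short_forward_skip": 0, "long_skip": 0,
               "cached_backward": 0, "random_backward": 0}
    for prev, cur in zip(heap_blocks, heap_blocks[1:]):
        delta = cur - prev
        if 0 <= delta <= 1:
            buckets["sequential"] += 1          # same or next block
        elif delta > random_seq_ratio:
            buckets["long_skip"] += 1           # far enough that a seek wins
        elif delta > 1:
            buckets["short_forward_skip"] += 1  # wastes the skipped blocks
        elif -cache_window <= delta < 0:
            buckets["cached_backward"] += 1     # likely still in cache
        else:
            buckets["random_backward"] += 1     # behaves like a random seek
    return buckets
```

Here `random_seq_ratio` stands in for the `random_page_cost`/`seq_page_cost`
ratio mentioned above, and `cache_window` for how far back a block can be and
still plausibly be cached; both are placeholders for whatever refinement the
real metric would need.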

There are also niggling questions about taking statistics based on
tuples that are already dead or were never committed.

-- 
greg


Re: Better estimates of index correlation

From
Josh Berkus
Date:
> I don't understand, are they going years between vacuums because their
> data is static? In which case the index correlation won't change. Or
> is it append-only, in which case I suspect the newly appended data is
> likely to have the same correlation as the old data. 

Append-only.  And yes, one could assume that correlation wouldn't change
frequently.   However, it may change more frequently than vacuums occur
-- I'm not exaggerating about "years".  I have several clients with
large databases where they have log tables which only get vacuumed for
XID wraparound, once every 2 years or so.

There's also the question of how we get correlation stats for a new
index/table, or one which has just been upgraded.  Requiring a full DB
vacuum isn't practical for those using pg_upgrade.

> But is there
> anything stopping us from doing some sort of ANALYZE-style sample of
> the index pages as well?

This would be ideal.  Or even a separate command to scan the indexes
only to collect correlation data.  Since the indexes are 20X to 100X
smaller than the tables (usually), it may be practical to full-scan them
even if we can't do the same with the table.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: Better estimates of index correlation

From
Jeff Davis
Date:
On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
> It strikes me that it'd be possible to have btvacuumcleanup directly
> measure order correlation when it's processing a btree index, yielding a
> reliable answer for any btree index regardless of number of columns.
> We could do that by comparing the heap block numbers of adjacent
> index entries' TIDs and counting the number of up-transitions (block
> number larger than previous index entry) versus down-transitions (block
> number smaller than previous). 

Link to previous discussion:

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01279.php

Regards,
Jeff Davis