pgsql: Discount the metapage when estimating number of index pages visi - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Discount the metapage when estimating number of index pages visi
Date
Msg-id E1w3ew1-000gLf-0x@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Discount the metapage when estimating number of index pages visited.

genericcostestimate() estimates the number of index leaf pages to
be visited as a pro-rata fraction of the total number of leaf pages.
Or at least that was the intention.  What it actually used in the
calculation was the total number of index pages, so that non-leaf
pages were also counted.  In a decent-sized index the error is
probably small, since we expect upper page fanout to be high.
But in a small index that's not true; in the worst case with one
data-bearing page plus a metapage, we had 100% relative error.
This led to surprising planning choices such as not using a small
partial index.

To fix, ask genericcostestimate's caller to supply an estimate of
the number of non-leaf pages, and subtract that.  For the built-in
index AMs, it seems sufficient to count the index metapage (if the
AM uses one) as non-leaf.  Per the above argument, counting upper
index pages shouldn't change the estimate much, and in most cases
we don't have any easy way of estimating the number of upper pages.
This might be an area for further research in future.

Any external genericcostestimate callers that do not set the new field
GenericCosts.numNonLeafPages will see the same behavior as before,
assuming they followed the advice to zero out that whole struct.

Unsurprisingly, this change affects a number of plans seen in the
core regression tests.  I hacked up the existing tests to keep the
tests' plans the same, since in each case it appeared that the
test's intent was to test exactly that plan.  Also add one new
test case demonstrating that a better index choice is now made.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/870521.1745860752@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/733f20df53721b5be0481afe9f26b9c2a4d51712

Modified Files
--------------
contrib/bloom/blcost.c                |  3 +++
src/backend/utils/adt/selfuncs.c      | 32 ++++++++++++++++++++++++++++----
src/include/utils/selfuncs.h          |  7 +++++++
src/test/regress/expected/join.out    |  8 +++++---
src/test/regress/expected/memoize.out |  2 ++
src/test/regress/expected/select.out  | 15 ++++++++++++++-
src/test/regress/sql/join.sql         | 10 ++++++----
src/test/regress/sql/memoize.sql      |  2 ++
src/test/regress/sql/select.sql       | 11 ++++++++++-
9 files changed, 77 insertions(+), 13 deletions(-)


pgsql-committers by date:

Previous
From: Alexander Korotkov
Date:
Subject: pgsql: Fix self-join removal to update bare Var references in join clau
Next
From: Nathan Bossart
Date:
Subject: pgsql: Bump transaction/multixact ID warning limits to 100M.