Proposed cleanup of index-related planner estimation procedures - Mailing list pgsql-hackers

From Tom Lane
Subject Proposed cleanup of index-related planner estimation procedures
Date
Msg-id 20065.947132708@sss.pgh.pa.us
Whole thread Raw
Responses RE: [HACKERS] Proposed cleanup of index-related planner estimation procedures
List pgsql-hackers
I am thinking about redefining and simplifying the planner's interface
to index-type-dependent estimation routines.

Currently, each defined index type is supposed to supply two routines:
an "amopselect" routine and an "amopnpages" routine.  (The existing
actual routines of this kind are btreesel, btreenpages, etc in
src/backend/utils/adt/selfuncs.c.)  These things are called by
index_selectivity() in src/backend/optimizer/util/plancat.c.  amopselect
tries to determine the selectivity of an indexqual (the fraction of
main-table tuples it will select) and amopnpages tries to determine
the number of index pages that will be read to do it.

Now, this collection of code is largely redundant with
optimizer/path/clausesel.c, which also tries to estimate the selectivity
of qualification conditions.  Furthermore, the interface to these
routines is fundamentally misdesigned, because there is no way to deal
with interrelated qualification conditions --- for example, if we have
a range query like "... WHERE x > 10 AND x < 20", the code estimates
the selectivity as the product of the selectivities of the two terms
independently, but the actual selectivity is very different from that.
I am working on fixing clausesel.c to be smarter about correlated
conditions, but it won't do much good to fix that code without fixing
the index-related code.

What I'm thinking about doing is replacing these two per-index-type
routines with a single routine, which is called once per proposed
indexscan rather than once per qual clause.  It would receive the
whole indexqual list as a parameter, instead of just one qual.
A typical implementation would just call clausesel.c's general-purpose
code to estimate the selectivity, and then do a little bit of extra
work to derive the estimated number of index pages from that number.

I suppose the original reason for having amopselect at all was to allow
exploitation of index-specific knowledge during selectivity estimation
--- but none of the existing index types actually provide any such
knowledge in their amopselect routines.  Still, this redesign preserves
the flexibility for an index type to do something specialized.

A possible objection to this scheme is that the inputs and outputs
of these routines would be structs that aren't full-fledged SQL types
(and no, I'm not willing to promote parser expression trees into an
SQL type ;-)).  But I don't think that's a real problem.  No one is
going to be inventing new index types without doing a lot of C coding,
so having to write the amopselect routines in C doesn't seem like a
big drawback.

Comments, objections, better ideas?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Stephen Birch
Date:
Subject: Re: [HACKERS] Re: ECPG documentation
Next
From: The Hermit Hacker
Date:
Subject: UdmSearch: tables vs indices ...