Re: [HACKERS] Proposed cleanup of index-related planner estimation procedures - Mailing list pgsql-hackers

From Bernard Adrian Frankpitt
Subject Re: [HACKERS] Proposed cleanup of index-related planner estimation procedures
Date
Msg-id 3874D8D8.3534D782@pop.dn.net
Whole thread Raw
In response to Proposed cleanup of index-related planner estimation procedures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> 
> I am thinking about redefining and simplifying the planner's interface
> to index-type-dependent estimation routines.

Good Idea. I looked at that code quite closely in the past year, and I
agree that, though harmless, much of it seems bogus --- or at least not
well motivated by thorough analysis.  One reason for this is that the
problem of
computing index search costs is very difficult, the theoretical papers
that I could find on the subject are not terribly satisfactory. 
Currently only equals operators for hash and linear comparison operators
for nbtree really implement anything, the other access method operator
classes merely copy the nbtree operator algorithms.

> 
> What I'm thinking about doing is replacing these two per-index-type
> routines with a single routine, which is called once per proposed
> indexscan rather than once per qual clause.  It would receive the
> whole indexqual list as a parameter, instead of just one qual.
> A typical implementation would just call clausesel.c's general-purpose
> code to estimate the selectivity, and then do a little bit of extra
> work to derive the estimated number of index pages from that number.
> 
> I suppose the original reason for having amopselect at all was to allow
> exploitation of index-specific knowledge during selectivity estimation
> --- but none of the existing index types actually provide any such
> knowledge in their amopselect routines.  Still, this redesign preserves
> the flexibility for an index type to do something specialized.
>

Good, I have a special access method that needs to subvert the normal
optimizer in a way that ensures an index scan is used for every heap
access predicated on a particular attribute.  The reason for this is
that the data (representations of cells in a partition of a high
dimensional space) that would normally sit in heap tuples are actually
distributed through the index structure.  A query predicated on a
property of my special attributes needs to be executed with an index
scan, so I subvert the optimizer by coding amopselect and amopnpages to
always give zero cost for an index scan.  Since index scans are always
considered first, this hairy hack works. I would certainly breath more
easily at each new release of PostgreSQl if the ability of the system to
support this type of hack were a recognized feature.

> 
> A possible objection to this scheme is that the inputs and outputs
> of these routines would be structs that aren't full-fledged SQL types
> (and no, I'm not willing to promote parser expression trees into an
> SQL type ;-)).  But I don't think that's a real problem.  No one is
> going to be inventing new index types without doing a lot of C coding,

Yes, a lot of C-code, I have 10K lines in my access method.  What is
remarkable about Postgresql is that the interface between index access
methods and the rest of the system is so clean that this sort of project
is feasible. 

> so having to write the amopselect routines in C doesn't seem like a
> big drawback.

One thing that I have on my `really cool ideas' list would be to link
somehing like a Python interpreter and compiler into the backend.  one
would use the scripting language to write and debug stuff like this, and
then compile and dynamically link the debugged code into the backend. 
What I ended up doing in my index scheme was to code all the
mathematical algorithms in MATLAB and get them working there, and then
hand translate the MATLAB code to C (there is a lot of linear algebra in
the algorithms).  Debugging was a major pain.  With my idea you could
write the whole access method in a high-level language, and the low
level backend interfaces to things like buffer locking, MVCC and
postgresql memory management would be encapsulated by interfaces in the
high-level language. If there were something like that in PostgreSQL I
bet a lot more people would be rolling their own access methods.  I am
not sure that people who value stability over new features would see
this as a step in the right direction. ;-)

> 
> Comments, objections, better ideas?
> 
>                         regards, tom lane
> 
> ************


pgsql-hackers by date:

Previous
From: Don Baccus
Date:
Subject: Re: [HACKERS] Enhancing PGSQL to be compatible with Informix SQL
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Enhancing PGSQL to be compatible with Informix SQL