Re: [HACKERS] Proposed cleanup of index-related planner estimation procedures - Mailing list pgsql-hackers
From | Bernard Adrian Frankpitt |
---|---|
Subject | Re: [HACKERS] Proposed cleanup of index-related planner estimation procedures |
Date | |
Msg-id | 3874D8D8.3534D782@pop.dn.net Whole thread Raw |
In response to | Proposed cleanup of index-related planner estimation procedures (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote: > > I am thinking about redefining and simplifying the planner's interface > to index-type-dependent estimation routines. Good Idea. I looked at that code quite closely in the past year, and I agree that, though harmless, much of it seems bogus --- or at least not well motivated by thorough analysis. One reason for this is that the problem of computing index search costs is very difficult, the theoretical papers that I could find on the subject are not terribly satisfactory. Currently only equals operators for hash and linear comparison operators for nbtree really implement anything, the other access method operator classes merely copy the nbtree operator algorithms. > > What I'm thinking about doing is replacing these two per-index-type > routines with a single routine, which is called once per proposed > indexscan rather than once per qual clause. It would receive the > whole indexqual list as a parameter, instead of just one qual. > A typical implementation would just call clausesel.c's general-purpose > code to estimate the selectivity, and then do a little bit of extra > work to derive the estimated number of index pages from that number. > > I suppose the original reason for having amopselect at all was to allow > exploitation of index-specific knowledge during selectivity estimation > --- but none of the existing index types actually provide any such > knowledge in their amopselect routines. Still, this redesign preserves > the flexibility for an index type to do something specialized. > Good, I have a special access method that needs to subvert the normal optimizer in a way that ensures an index scan is used for every heap access predicated on a particular attribute. The reason for this is that the data (representations of cells in a partition of a high dimensional space) that would normally sit in heap tuples are actually distributed through the index structure. A query predicated on a property of my special attributes needs to be executed with an index scan, so I subvert the optimizer by coding amopselect and amopnpages to always give zero cost for an index scan. Since index scans are always considered first, this hairy hack works. I would certainly breath more easily at each new release of PostgreSQl if the ability of the system to support this type of hack were a recognized feature. > > A possible objection to this scheme is that the inputs and outputs > of these routines would be structs that aren't full-fledged SQL types > (and no, I'm not willing to promote parser expression trees into an > SQL type ;-)). But I don't think that's a real problem. No one is > going to be inventing new index types without doing a lot of C coding, Yes, a lot of C-code, I have 10K lines in my access method. What is remarkable about Postgresql is that the interface between index access methods and the rest of the system is so clean that this sort of project is feasible. > so having to write the amopselect routines in C doesn't seem like a > big drawback. One thing that I have on my `really cool ideas' list would be to link somehing like a Python interpreter and compiler into the backend. one would use the scripting language to write and debug stuff like this, and then compile and dynamically link the debugged code into the backend. What I ended up doing in my index scheme was to code all the mathematical algorithms in MATLAB and get them working there, and then hand translate the MATLAB code to C (there is a lot of linear algebra in the algorithms). Debugging was a major pain. With my idea you could write the whole access method in a high-level language, and the low level backend interfaces to things like buffer locking, MVCC and postgresql memory management would be encapsulated by interfaces in the high-level language. If there were something like that in PostgreSQL I bet a lot more people would be rolling their own access methods. I am not sure that people who value stability over new features would see this as a step in the right direction. ;-) > > Comments, objections, better ideas? > > regards, tom lane > > ************
pgsql-hackers by date: