Thread: Implied Functional index use (redux)
In a thread in July last year, I raised the possibility of transforming a query to allow functional indexes to be utilised automatically. http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php This idea can work and has many benefits, but there are some complexities. I want to summarise those issues first, then make a more practical and hopefully more acceptable proposal. Taken together the complexities would have lead us to have additional TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding schemes. All of which, I agree, just too much complexity to allow this to be specified. One example of this was FLOAT, where -0 and +0 are equal but not the same in a binary form. That would normally mean we couldn't use FLOAT for TRANSFORMABLE indexes, but of course what happens if we specify a partial functional index, where we only index values > 0. In that case, we *can* use the transform technique again. Worse still we may have a full (non-partial) index where there is a constraint on the column(s) such as CHECK (value > 0). So we'd need another heavy dose of catalog-complexity to catch all the special cases. Yuck and double Yuck. Even if we did that, it isn't easy for a data type author to tell whether their type is transformable, or not **in all cases**. That would probably lead to people saying DISABLE TRANSFORM for their data type, just in case. Which means no benefit in practice with this feature. - - - A simpler, alternate proposal is to allow the user to specify whether a functional index is transformable or not using CREATE or ALTER INDEX, with a default of not transformable. That then leaves the responsibility for specifying this with the user, who as we have seen is the really only person really capable of judging the whole case on its merits. e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...]; ENABLE TRANSFORM is only possible for functional indexes. Suggestions for better syntax/naming welcome. Placing the TRANSFORM clause on the index as a simple boolean makes utilising the feature more streamlined at planning time too. This would be an extra initial check in create_index_paths() to see if the query might benefit from transform. Most indexable WHERE clauses would be able to be transformed, if the index allows. The feature would be enabled by default with a GUC, but as stated above, the default for each index would be to *not* transform unless specifically requested by the user. enable_index_transform = on (default)| off EXPLAIN would not need alteration, since the modified query would show up clearly in the output. (I can add explicit visibility if people want that). Overall, a fairly isolated patch, with little user interface changes. All of the complexities would be very clearly documented as part of this feature. That is essential to avoid user error, of which I am mindful. But the technique has much promise, so I would like to make this option available to designers and DBAs. If we can agree this smoothly, then it seems possible for 8.3. Comments? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > A simpler, alternate proposal is to allow the user to specify whether a > functional index is transformable or not using CREATE or ALTER INDEX, > with a default of not transformable. That then leaves the responsibility > for specifying this with the user, who as we have seen is the really > only person really capable of judging the whole case on its merits. > e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) > [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...]; This is a foot-gun and nothing else. I hardly think the average DBA will realize such subtleties as "numeric equality doesn't guarantee that such-and-such works". If it's not specified by the datatype author it's not going to be safe. In fact, this doesn't work anyway, since it still doesn't address the question of which "equality" operators we think permit us to apply the transform. regards, tom lane
On Thu, 2007-01-25 at 16:20 -0500, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > A simpler, alternate proposal is to allow the user to specify whether a > > functional index is transformable or not using CREATE or ALTER INDEX, > > with a default of not transformable. That then leaves the responsibility > > for specifying this with the user, who as we have seen is the really > > only person really capable of judging the whole case on its merits. > > > e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) > > [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...]; > > This is a foot-gun and nothing else. I hardly think the average DBA > will realize such subtleties as "numeric equality doesn't guarantee that > such-and-such works". If it's not specified by the datatype author > it's not going to be safe. OK, no problem. The most beneficial use case is for string handling: name lookups, case insensitive indexing and index size reduction generally. If, for some reason, bpchar were to be excluded then it would take away a great chunk of benefit. Two questions: - Will bpchar be transformable? - Do you see a clear way forward for specifying the information required to allow the transform? We need to specify the operator, which might be taken to include the datatype. (Peter suggested placing this on the function itself, though I think current precedent is to place on the operator.) If you can say where you want the info to live, I can work out the details and repost. If there's clear benefit and a clear way forward, then we might just be OK for 8.3. If not, I'll put this back on the shelf again in favour of other ideas. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > If there's clear benefit and a clear way forward, then we might just be > OK for 8.3. If not, I'll put this back on the shelf again in favour of > other ideas. I think this is still a long way off, and there are probably more useful things to work on for 8.3. Part of my antagonism stems from the fact that by means of the operator family rewrite I've been getting rid of some longstanding but really quite unacceptable assumptions about "this operator does that". I don't want to see us start putting unsupported semantic assumptions back into the optimizer; rather its assumptions about operator behavior need to be clearly specified. As an example, without some careful preliminary thinking I'd have probably folded all the numeric types into one big opfamily and thereby broken transitivity :-(, leading to bugs that would be devilish to figure out. regards, tom lane
On Fri, 2007-01-26 at 10:58 -0500, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > If there's clear benefit and a clear way forward, then we might just be > > OK for 8.3. If not, I'll put this back on the shelf again in favour of > > other ideas. > > I think this is still a long way off, and there are probably more useful > things to work on for 8.3. > > Part of my antagonism stems from the fact that by means of the operator > family rewrite I've been getting rid of some longstanding but really > quite unacceptable assumptions about "this operator does that". I don't > want to see us start putting unsupported semantic assumptions back into > the optimizer; rather its assumptions about operator behavior need to be > clearly specified. As an example, without some careful preliminary > thinking I'd have probably folded all the numeric types into one big > opfamily and thereby broken transitivity :-(, leading to bugs that would > be devilish to figure out. OK, no problems. All of the above says "time", which is becoming rare as we approach 8.3 anyways. I'll pick it up again in 8.4. Some notes-to-self for the future: - ideally want to be able to decide transformability at CREATE INDEX time; this will reduce planning time for functional index usage when there is no possible transforms. - may want to do this by having a special catalog table that holds the cases that *will* work, to make it both safer and faster to look up. Sort of like pg_autovacuum -> absence means No. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > In a thread in July last year, I raised the possibility of transforming > a query to allow functional indexes to be utilised automatically. > http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php > > This idea can work and has many benefits, but there are some > complexities. I want to summarise those issues first, then make a more > practical and hopefully more acceptable proposal. > > Taken together the complexities would have lead us to have additional > TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding > schemes. All of which, I agree, just too much complexity to allow this > to be specified. I've thought further about this and I believe the problem is simpler than we were thinking previously. All we need is one boolean flag on the equality operator for the data type (or perhaps it would be more convenient to have it on the operator class) that indicates that two objects will never compare equal unless they're binary equal. As long as we know that binary unequal implies operator class unequal then we can know that any immutable function will always return the same value for any two equal data. And therefore we can be guaranteed that any elements we want to look up with "col = $1" will be included in the elements returned by looking up "foo(col) = foo($1)". We don't need to know anything about foo() beyond its immutability. If we want to be able to do inequality lookups as well then we'll have to have a flag on the function indicating it's "order preserving". That is for any values a, b the property a<b ==> f(a)<=f(b) holds. The problem is that you can't guarantee that for all operator classes since someone can always come along and define a new operator class with a new ordering that breaks your guarantee. I suppose we could just have the flag indicate that the property holds for the default operator class for the argument data types. But even if we only handled equality I think it would still be a very useful feature. It would allow doing things like creating an index on soundex(lastname) or crc32(url) and have them be automatically useful without altering queries. And it would mean you wouldn't have to create redundant indexes on lastname and lower(lastname). We could always look to generalize it to inequalities later. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > I've thought further about this and I believe the problem is simpler than we > were thinking previously. All we need is one boolean flag on the equality > operator for the data type (or perhaps it would be more convenient to have it > on the operator class) that indicates that two objects will never compare > equal unless they're binary equal. Well, we could simplify it that far, but that lets out float, numeric, most or all of the geometry types, and I'm not too sure I care to promise it for text either (think Unicode combining characters...). So really that's too simple IMHO. regards, tom lane