Thread: Implied Functional index use (redux)

Implied Functional index use (redux)

From

"Simon Riggs"

Date:

25 January 2007, 14:58:59

In a thread in July last year, I raised the possibility of transforming
a query to allow functional indexes to be utilised automatically.
http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php

This idea can work and has many benefits, but there are some
complexities. I want to summarise those issues first, then make a more
practical and hopefully more acceptable proposal.

Taken together the complexities would have lead us to have additional
TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding
schemes. All of which, I agree, just too much complexity to allow this
to be specified.

One example of this was FLOAT, where -0 and +0 are equal but not the
same in a binary form. That would normally mean we couldn't use FLOAT
for TRANSFORMABLE indexes, but of course what happens if we specify a
partial functional index, where we only index values > 0. In that case,
we *can* use the transform technique again. Worse still we may have a
full (non-partial) index where there is a constraint on the column(s)
such as CHECK (value > 0). So we'd need another heavy dose of
catalog-complexity to catch all the special cases.
Yuck and double Yuck.

Even if we did that, it isn't easy for a data type author to tell
whether their type is transformable, or not **in all cases**. That would
probably lead to people saying DISABLE TRANSFORM for their data type,
just in case. Which means no benefit in practice with this feature.

- - -

A simpler, alternate proposal is to allow the user to specify whether a
functional index is transformable or not using CREATE or ALTER INDEX,
with a default of not transformable. That then leaves the responsibility
for specifying this with the user, who as we have seen is the really
only person really capable of judging the whole case on its merits.

e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];

ENABLE TRANSFORM is only possible for functional indexes.

Suggestions for better syntax/naming welcome.

Placing the TRANSFORM clause on the index as a simple boolean makes
utilising the feature more streamlined at planning time too. This would
be an extra initial check in create_index_paths() to see if the query
might benefit from transform. Most indexable WHERE clauses would be able
to be transformed, if the index allows.

The feature would be enabled by default with a GUC, but as stated above,
the default for each index would be to *not* transform unless
specifically requested by the user.
enable_index_transform = on (default)| off

EXPLAIN would not need alteration, since the modified query would show
up clearly in the output. (I can add explicit visibility if people want
that).

Overall, a fairly isolated patch, with little user interface changes.

All of the complexities would be very clearly documented as part of this
feature. That is essential to avoid user error, of which I am mindful.
But the technique has much promise, so I would like to make this option
available to designers and DBAs.

If we can agree this smoothly, then it seems possible for 8.3.

Comments?

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com

Re: Implied Functional index use (redux)

From

Tom Lane

Date:

25 January 2007, 17:20:56

"Simon Riggs" <simon@2ndquadrant.com> writes:
> A simpler, alternate proposal is to allow the user to specify whether a
> functional index is transformable or not using CREATE or ALTER INDEX,
> with a default of not transformable. That then leaves the responsibility
> for specifying this with the user, who as we have seen is the really
> only person really capable of judging the whole case on its merits.

> e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) 
>     [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];

This is a foot-gun and nothing else.  I hardly think the average DBA
will realize such subtleties as "numeric equality doesn't guarantee that
such-and-such works".  If it's not specified by the datatype author
it's not going to be safe.

In fact, this doesn't work anyway, since it still doesn't address the
question of which "equality" operators we think permit us to apply
the transform.
        regards, tom lane

Re: Implied Functional index use (redux)

From

"Simon Riggs"

Date:

26 January 2007, 11:42:08

On Thu, 2007-01-25 at 16:20 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > A simpler, alternate proposal is to allow the user to specify whether a
> > functional index is transformable or not using CREATE or ALTER INDEX,
> > with a default of not transformable. That then leaves the responsibility
> > for specifying this with the user, who as we have seen is the really
> > only person really capable of judging the whole case on its merits.
> 
> > e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) 
> >     [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];
> 
> This is a foot-gun and nothing else.  I hardly think the average DBA
> will realize such subtleties as "numeric equality doesn't guarantee that
> such-and-such works".  If it's not specified by the datatype author
> it's not going to be safe.

OK, no problem.

The most beneficial use case is for string handling: name lookups, case
insensitive indexing and index size reduction generally. If, for some
reason, bpchar were to be excluded then it would take away a great chunk
of benefit.

Two questions:

- Will bpchar be transformable?

- Do you see a clear way forward for specifying the information required
to allow the transform? We need to specify the operator, which might be
taken to include the datatype. (Peter suggested placing this on the
function itself, though I think current precedent is to place on the
operator.) If you can say where you want the info to live, I can work
out the details and repost.

If there's clear benefit and a clear way forward, then we might just be
OK for 8.3. If not, I'll put this back on the shelf again in favour of
other ideas.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: Implied Functional index use (redux)

From

Tom Lane

Date:

26 January 2007, 11:58:53

"Simon Riggs" <simon@2ndquadrant.com> writes:
> If there's clear benefit and a clear way forward, then we might just be
> OK for 8.3. If not, I'll put this back on the shelf again in favour of
> other ideas.

I think this is still a long way off, and there are probably more useful
things to work on for 8.3.

Part of my antagonism stems from the fact that by means of the operator
family rewrite I've been getting rid of some longstanding but really
quite unacceptable assumptions about "this operator does that".  I don't
want to see us start putting unsupported semantic assumptions back into
the optimizer; rather its assumptions about operator behavior need to be
clearly specified.  As an example, without some careful preliminary
thinking I'd have probably folded all the numeric types into one big
opfamily and thereby broken transitivity :-(, leading to bugs that would
be devilish to figure out.
        regards, tom lane

Re: Implied Functional index use (redux)

From

"Simon Riggs"

Date:

26 January 2007, 12:39:55

On Fri, 2007-01-26 at 10:58 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > If there's clear benefit and a clear way forward, then we might just be
> > OK for 8.3. If not, I'll put this back on the shelf again in favour of
> > other ideas.
> 
> I think this is still a long way off, and there are probably more useful
> things to work on for 8.3.
> 
> Part of my antagonism stems from the fact that by means of the operator
> family rewrite I've been getting rid of some longstanding but really
> quite unacceptable assumptions about "this operator does that".  I don't
> want to see us start putting unsupported semantic assumptions back into
> the optimizer; rather its assumptions about operator behavior need to be
> clearly specified.  As an example, without some careful preliminary
> thinking I'd have probably folded all the numeric types into one big
> opfamily and thereby broken transitivity :-(, leading to bugs that would
> be devilish to figure out.

OK, no problems. All of the above says "time", which is becoming rare as
we approach 8.3 anyways. 

I'll pick it up again in 8.4. 

Some notes-to-self for the future:

- ideally want to be able to decide transformability at CREATE INDEX
time; this will reduce planning time for functional index usage when
there is no possible transforms.

- may want to do this by having a special catalog table that holds the
cases that *will* work, to make it both safer and faster to look up.
Sort of like pg_autovacuum -> absence means No.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: Implied Functional index use (redux)

From

Gregory Stark

Date:

28 January 2007, 13:09:15

"Simon Riggs" <simon@2ndquadrant.com> writes:

> In a thread in July last year, I raised the possibility of transforming
> a query to allow functional indexes to be utilised automatically.
> http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php
>
> This idea can work and has many benefits, but there are some
> complexities. I want to summarise those issues first, then make a more
> practical and hopefully more acceptable proposal.
>
> Taken together the complexities would have lead us to have additional
> TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding
> schemes. All of which, I agree, just too much complexity to allow this
> to be specified.

I've thought further about this and I believe the problem is simpler than we
were thinking previously. All we need is one boolean flag on the equality
operator for the data type (or perhaps it would be more convenient to have it
on the operator class) that indicates that two objects will never compare
equal unless they're binary equal.

As long as we know that binary unequal implies operator class unequal then we
can know that any immutable function will always return the same value for any
two equal data. And therefore we can be guaranteed that any elements we want
to look up with "col = $1" will be included in the elements returned by
looking up "foo(col) = foo($1)". We don't need to know anything about foo()
beyond its immutability.

If we want to be able to do inequality lookups as well then we'll have to have
a flag on the function indicating it's "order preserving". That is for any
values a, b the property a<b ==> f(a)<=f(b) holds. The problem is that you
can't guarantee that for all operator classes since someone can always come
along and define a new operator class with a new ordering that breaks your
guarantee. I suppose we could just have the flag indicate that the property
holds for the default operator class for the argument data types.

But even if we only handled equality I think it would still be a very useful
feature. It would allow doing things like creating an index on
soundex(lastname) or crc32(url) and have them be automatically useful without
altering queries. And it would mean you wouldn't have to create redundant
indexes on lastname and lower(lastname). We could always look to generalize it
to inequalities later.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: Implied Functional index use (redux)

From

Tom Lane

Date:

28 January 2007, 14:17:08

Gregory Stark <stark@enterprisedb.com> writes:
> I've thought further about this and I believe the problem is simpler than we
> were thinking previously. All we need is one boolean flag on the equality
> operator for the data type (or perhaps it would be more convenient to have it
> on the operator class) that indicates that two objects will never compare
> equal unless they're binary equal.

Well, we could simplify it that far, but that lets out float, numeric,
most or all of the geometry types, and I'm not too sure I care to
promise it for text either (think Unicode combining characters...).
So really that's too simple IMHO.
        regards, tom lane