Implied Functional Index use - Mailing list pgsql-hackers

From Simon Riggs
Subject Implied Functional Index use
Date
Msg-id 1152648843.2465.73.camel@localhost.localdomain
Whole thread Raw
Responses Re: Implied Functional Index use  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Currently, functional indexes can be used by a query that explicitly
mentions the exact phrasing of the functional index within the WHERE
clause.

IMHO it is feasible to extend the range of WHERE clauses for which
functional indexes can be used by using implication, much in the same
way that we use transitive closure in join queries now. This has some
powerful and important uses, just one of which is index prefix
compression (so read on).

If we have a subclause within a WHERE clause of the form
col1 = k1 AND col2 = k2 AND ... colN = kN

this implies that the following clause is also true:
col1 = k1 AND col2 = k2 AND ... colN = kN
AND    F(col1, col2 ... colN) = F(k1, k2 ... kN)

iff:
- the function F is IMMUTABLE, since that definition implies that the
output is always the same for any set of constant inputs.
- the operator that connects each col1,k1 pair must be defined as true
equality (note that in the above example what looks like the same
equality operator is in fact context dependent on the datatypes)
- the datatypes of each col1, k1 pair must match

If we have a Functional Index F(col1, col2...colN) then we can use the
second implied form of the subclause to recognise that the index can 
be useful for the query. This would then allow a query plan that uses an
IndexScan on the functional index with a re-check filter of the original
query clause.

An example might be a query using the clausesurname = 'RIGGS' 
would allow us to use an index defined asmetaphone(surname)
even though the original application was written before we added the
index. Note that the index would be significantly smaller than an index
on the full surname, as well as avoiding some of the hotspots caused by
certain most-frequent-values in the data.

An alternative example might be to define an index onsubstr(text_col, 1, 100)
which still allows searching for longer strings without the need to
store the complete string in the index. This is effectively index prefix
compression, which is not directly possible with pgsql because of the
requirements of datatype encapsulation. (Note that this avoids the need
to have very long index keys also).

One difficulty to this is defining which operators represent
true-equality. There isn't a definition of this currently for operators
and we cannot assume that an operator called "=" has the required
properties, even if that is true for most built-in types. An example of
true-equality and its difficulties is for the FLOAT type minus-zero (-0)
and plus-zero (+0) have different byte representations yet when compared
with the standard FLOAT comparison operator +0 and -0 would be
considered equal. If we were to put each value through a hash-like
function such as md5() then we would clearly get different answers.

To take advantage of this, I propose to
- add code to be called from allpaths.c: when we check functional
indexes, if they exist and yet are not usable because of a lack of
explicit clauses we will check to see if there any clauses that can be
used to imply the correct use of the functional index. This is in many
ways similar to existing constraint exclusion code.
- add a new boolean to pg_operator to allow us to define which operators
offer true equality
- add a new keyword EQUALITY to CREATE/ALTER OPERATOR (which I think
implies HASHES and MERGES if they are not mentioned explicitly).

No promises for 8.2, but does anyone have further input?

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: More nuclear options
Next
From: Josh Berkus
Date:
Subject: Re: More nuclear options