Thread: query planner does not canonicalize infix operators

query planner does not canonicalize infix operators

From

Will Leinweber

Date:

12 March 2012, 11:05:31

I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator which resolves to the very same function, this index is not used. It should be used.

I have included an example:

Table with hstore index:

de10keipt01939=> \d log_data

Table "public.log_data"

Column | Type | Modifiers

--------+--------------------------+-------------------------------------------------------

id | bigint | not null default nextval('log_data_id_seq'::regclass)

time | timestamp with time zone |

data | hstore |

Indexes:

"index_log_data_by_time" btree ("time")

"index_participant_id" btree (fetchval(data, 'participant_id'::text))

query with function notation:

de10keipt01939=> explain ANALYZE select * from log_data where (data->'participant_id')='2851' order by id desc;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Sort (cost=16432.56..16433.36 rows=1583 width=315) (actual time=198.643..198.777 rows=183 loops=1)

Sort Key: id

Sort Method: quicksort Memory: 119kB

-> Seq Scan on log_data (cost=0.00..16415.74 rows=1583 width=315) (actual time=6.926..198.297 rows=183 loops=1)

Filter: ((data -> 'participant_id'::text) = '2851'::text)

Total runtime: 198.922 ms

(6 rows)

query with infix notation:

de10keipt01939=> explain ANALYZE select * from log_data where fetchval(data,'participant_id')='2851' order by id desc;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Sort (cost=341.14..341.23 rows=179 width=315) (actual time=0.724..0.841 rows=183 loops=1)

Sort Key: id

Sort Method: quicksort Memory: 119kB

-> Bitmap Heap Scan on log_data (cost=2.35..339.80 rows=179 width=315) (actual time=0.091..0.489 rows=183 loops=1)

Recheck Cond: (fetchval(data, 'participant_id'::text) = '2851'::text)

-> Bitmap Index Scan on index_participant_id (cost=0.00..2.34 rows=179 width=0) (actual time=0.060..0.060 rows=183 loops=1)

Index Cond: (fetchval(data, 'participant_id'::text) = '2851'::text)

Total runtime: 1.010 ms

(8 rows)

—Will

Re: query planner does not canonicalize infix operators

From

Tom Lane

Date:

12 March 2012, 11:52:50

Will Leinweber <will@heroku.com> writes:
> I created an index on an hstore function, fetchval(hstore, text), however
> when I use the -> infix operator which resolves to the very same function,
> this index is not used. It should be used.

Don't hold your breath.  Create an index on the expression you intend to
use, not random respellings of it.
        regards, tom lane

Re: query planner does not canonicalize infix operators

From

Daniel Farina

Date:

12 March 2012, 14:12:55

On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Will Leinweber <will@heroku.com> writes:
>> I created an index on an hstore function, fetchval(hstore, text), however
>> when I use the -> infix operator which resolves to the very same function,
>> this index is not used. It should be used.
>
> Don't hold your breath.  Create an index on the expression you intend to
> use, not random respellings of it.

Is this saying "there no need for that" or "no one is working on it,
and I certainly don't intend to", or "definitely not in the next
version" or something else entirely?

I disagreement with the former, and am in total understanding of the latter two.

--
fdr

Re: query planner does not canonicalize infix operators

From

Tom Lane

Date:

12 March 2012, 16:23:08

Daniel Farina <daniel@heroku.com> writes:
> On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Will Leinweber <will@heroku.com> writes:
>>> I created an index on an hstore function, fetchval(hstore, text), however
>>> when I use the -> infix operator which resolves to the very same function,
>>> this index is not used. It should be used.

>> Don't hold your breath.  Create an index on the expression you intend to
>> use, not random respellings of it.

> Is this saying "there no need for that" or "no one is working on it,
> and I certainly don't intend to", or "definitely not in the next
> version" or something else entirely?

The reason this is difficult is that the bulk of the planner's
optimization knowledge is attached to operators, not functions.  It'd be
easy enough to smash operators down to their underlying functions during
expression preprocessing, sure.  But then we would fail to recognize index
applicability at all --- for instance an index on an integer column can
get matched to "indexcol = 42", but not to "int4eq(indexcol, 42)".  And
we'd have little clue about the selectivity of the expression, either,
since selectivity estimators are attached to operators not functions.

Arguably the Berkeley guys got this wrong 25 years ago, and they should
have defined indexes and selectivity in terms of functions not
operators.  However I really don't foresee us reinventing all the
"operator class" infrastructure to make that happen; the amount of work
required is far out of proportion to the benefit, even without
considering the number of external projects that would get impacted.
(Inventing selectivity estimators for functions is a less daunting task,
and we might do that sometime ... but I think it would likely live
alongside operator selectivity rather than replace it.)

More generally, I'm not prepared to buy into the idea that the planner
should be expected to recognize alternate spellings of "the same"
expression.  There are too many variants of that idea that are
infeasible either because the planner doesn't have the necessary
knowledge, or it does but trying to recognize the equivalence would cost
an impractical number of planning cycles.  Neither of those objections
apply to "replace an operator by its underlying function"; but they do
apply to other comparable requests we've gotten such as "recognize that
x + 0 is the same as x" or "x + 1 is the same as 1 + x".
        regards, tom lane

Re: query planner does not canonicalize infix operators

From

Daniel Farina

Date:

12 March 2012, 17:05:47

On Mon, Mar 12, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <daniel@heroku.com> writes:
>> On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Will Leinweber <will@heroku.com> writes:
>>>> I created an index on an hstore function, fetchval(hstore, text), however
>>>> when I use the -> infix operator which resolves to the very same function,
>>>> this index is not used. It should be used.
>
>>> Don't hold your breath.  Create an index on the expression you intend to
>>> use, not random respellings of it.
>
>> Is this saying "there no need for that" or "no one is working on it,
>> and I certainly don't intend to", or "definitely not in the next
>> version" or something else entirely?
>
> The reason this is difficult is that the bulk of the planner's
> optimization knowledge is attached to operators, not functions.  It'd be
> easy enough to smash operators down to their underlying functions during
> expression preprocessing, sure.  But then we would fail to recognize index
> applicability at all --- for instance an index on an integer column can
> get matched to "indexcol = 42", but not to "int4eq(indexcol, 42)".  And
> we'd have little clue about the selectivity of the expression, either,
> since selectivity estimators are attached to operators not functions.

Argh. That is very sad. Given that an operator definitely resolves to
a function (right?) in the function catalog it seems a little insane.

> Arguably the Berkeley guys got this wrong 25 years ago, and they should
> have defined indexes and selectivity in terms of functions not
> operators.  However I really don't foresee us reinventing all the
> "operator class" infrastructure to make that happen; the amount of work
> required is far out of proportion to the benefit, even without
> considering the number of external projects that would get impacted.
> (Inventing selectivity estimators for functions is a less daunting task,
> and we might do that sometime ... but I think it would likely live
> alongside operator selectivity rather than replace it.)

I see. That is ugly.

> More generally, I'm not prepared to buy into the idea that the planner
> should be expected to recognize alternate spellings of "the same"
> expression.  There are too many variants of that idea that are
> infeasible either because the planner doesn't have the necessary
> knowledge, or it does but trying to recognize the equivalence would cost
> an impractical number of planning cycles.  Neither of those objections
> apply to "replace an operator by its underlying function"; but they do
> apply to other comparable requests we've gotten such as "recognize that
> x + 0 is the same as x" or "x + 1 is the same as 1 + x".

I think that the case of infix-operator-to-function stands out in that
it doesn't really rely on knowing any algebriac properties of the
underlying function such as commutativity, associativity, the notion
of a nil-value in the algebra, et al.  It's a bit concerning because
ORMs are starting to pick up on the extra data types in Postgres and
they'll have to all spell the "preferred" spelling of the
functionality to properly compose.  If SQLAlchemy generates "fetchval"
and ActiveRecord 4 uses "->", tough times are ahead.

Thanks for your detailed response.  As a small point of criticism, I
think your response to Will came off as a bit strong, even though
there are good reasons why this deceptively small request turns into a
snarl that may only be of interest to optimizer hackers.

On the more constructive side, if I were to till the fields to change
this aspect of the optimizer, is there any interest in rectifying the
operator-function confusion?

--
fdr

Re: query planner does not canonicalize infix operators

From

Peter Eisentraut

Date:

13 March 2012, 00:21:51

On mån, 2012-03-12 at 13:04 -0700, Daniel Farina wrote:
> On the more constructive side, if I were to till the fields to change
> this aspect of the optimizer, is there any interest in rectifying the
> operator-function confusion?

I once proposed to do that [1], but there was not much enthusiasm
elsewhere.  I still think it'd be worthwhile, for the reasons you
mentioned.  But it would be a lot of detail work.

[1] http://archives.postgresql.org/message-id/1309640525.26660.22.camel@vanquo.pezone.net

Re: query planner does not canonicalize infix operators

From

Dimitri Fontaine

Date:

15 April 2012, 06:01:12

Tom Lane <tgl@sss.pgh.pa.us> writes:
> More generally, I'm not prepared to buy into the idea that the planner
> should be expected to recognize alternate spellings of "the same"
> expression.  There are too many variants of that idea that are
> infeasible either because the planner doesn't have the necessary
> knowledge, or it does but trying to recognize the equivalence would cost
> an impractical number of planning cycles.  Neither of those objections
> apply to "replace an operator by its underlying function"; but they do
> apply to other comparable requests we've gotten such as "recognize that
> x + 0 is the same as x" or "x + 1 is the same as 1 + x".

Looks like we're missing out some operator properties, like the neutral
element and if the operator is transitive, commutative or associative. I
think I remember us talking about how knowing about operators being
associative would also help optimize a class of join problems.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: query planner does not canonicalize infix operators

From

Tom Lane

Date:

15 April 2012, 12:22:59

Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Looks like we're missing out some operator properties, like the neutral
> element and if the operator is transitive, commutative or associative. I
> think I remember us talking about how knowing about operators being
> associative would also help optimize a class of join problems.

We do understand, and use, transitivity for btree equality operators
(cf mergejoin planning, EquivalenceClasses, etc).  I have limited
enthusiasm for introducing a more general concept, because it seems like
doing anything with it would add a great deal more planning effort for
(typically) little reward.  I can imagine cases where, say, deducing
"a < c" from "a < b and b < c" would be helpful ... but they don't come
up in common queries.
        regards, tom lane