Thread: query planner does not canonicalize infix operators
I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator which resolves to the very same function, this index is not used. It should be used.
I have included an example:
Table with hstore index:
de10keipt01939=> \d log_data
Table "public.log_data"
Column | Type | Modifiers
--------+--------------------------+-------------------------------------------------------
id | bigint | not null default nextval('log_data_id_seq'::regclass)
time | timestamp with time zone |
data | hstore |
Indexes:
"index_log_data_by_time" btree ("time")
"index_participant_id" btree (fetchval(data, 'participant_id'::text))
query with function notation:
de10keipt01939=> explain ANALYZE select * from log_data where (data->'participant_id')='2851' order by id desc;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Sort (cost=16432.56..16433.36 rows=1583 width=315) (actual time=198.643..198.777 rows=183 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 119kB
-> Seq Scan on log_data (cost=0.00..16415.74 rows=1583 width=315) (actual time=6.926..198.297 rows=183 loops=1)
Filter: ((data -> 'participant_id'::text) = '2851'::text)
Total runtime: 198.922 ms
(6 rows)
query with infix notation:
de10keipt01939=> explain ANALYZE select * from log_data where fetchval(data,'participant_id')='2851' order by id desc;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=341.14..341.23 rows=179 width=315) (actual time=0.724..0.841 rows=183 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 119kB
-> Bitmap Heap Scan on log_data (cost=2.35..339.80 rows=179 width=315) (actual time=0.091..0.489 rows=183 loops=1)
Recheck Cond: (fetchval(data, 'participant_id'::text) = '2851'::text)
-> Bitmap Index Scan on index_participant_id (cost=0.00..2.34 rows=179 width=0) (actual time=0.060..0.060 rows=183 loops=1)
Index Cond: (fetchval(data, 'participant_id'::text) = '2851'::text)
Total runtime: 1.010 ms
(8 rows)
—Will
Will Leinweber <will@heroku.com> writes: > I created an index on an hstore function, fetchval(hstore, text), however > when I use the -> infix operator which resolves to the very same function, > this index is not used. It should be used. Don't hold your breath. Create an index on the expression you intend to use, not random respellings of it. regards, tom lane
On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Will Leinweber <will@heroku.com> writes: >> I created an index on an hstore function, fetchval(hstore, text), however >> when I use the -> infix operator which resolves to the very same function, >> this index is not used. It should be used. > > Don't hold your breath. Create an index on the expression you intend to > use, not random respellings of it. Is this saying "there no need for that" or "no one is working on it, and I certainly don't intend to", or "definitely not in the next version" or something else entirely? I disagreement with the former, and am in total understanding of the latter two. -- fdr
Daniel Farina <daniel@heroku.com> writes: > On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Will Leinweber <will@heroku.com> writes: >>> I created an index on an hstore function, fetchval(hstore, text), however >>> when I use the -> infix operator which resolves to the very same function, >>> this index is not used. It should be used. >> Don't hold your breath. Create an index on the expression you intend to >> use, not random respellings of it. > Is this saying "there no need for that" or "no one is working on it, > and I certainly don't intend to", or "definitely not in the next > version" or something else entirely? The reason this is difficult is that the bulk of the planner's optimization knowledge is attached to operators, not functions. It'd be easy enough to smash operators down to their underlying functions during expression preprocessing, sure. But then we would fail to recognize index applicability at all --- for instance an index on an integer column can get matched to "indexcol = 42", but not to "int4eq(indexcol, 42)". And we'd have little clue about the selectivity of the expression, either, since selectivity estimators are attached to operators not functions. Arguably the Berkeley guys got this wrong 25 years ago, and they should have defined indexes and selectivity in terms of functions not operators. However I really don't foresee us reinventing all the "operator class" infrastructure to make that happen; the amount of work required is far out of proportion to the benefit, even without considering the number of external projects that would get impacted. (Inventing selectivity estimators for functions is a less daunting task, and we might do that sometime ... but I think it would likely live alongside operator selectivity rather than replace it.) More generally, I'm not prepared to buy into the idea that the planner should be expected to recognize alternate spellings of "the same" expression. There are too many variants of that idea that are infeasible either because the planner doesn't have the necessary knowledge, or it does but trying to recognize the equivalence would cost an impractical number of planning cycles. Neither of those objections apply to "replace an operator by its underlying function"; but they do apply to other comparable requests we've gotten such as "recognize that x + 0 is the same as x" or "x + 1 is the same as 1 + x". regards, tom lane
On Mon, Mar 12, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Farina <daniel@heroku.com> writes: >> On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Will Leinweber <will@heroku.com> writes: >>>> I created an index on an hstore function, fetchval(hstore, text), however >>>> when I use the -> infix operator which resolves to the very same function, >>>> this index is not used. It should be used. > >>> Don't hold your breath. Create an index on the expression you intend to >>> use, not random respellings of it. > >> Is this saying "there no need for that" or "no one is working on it, >> and I certainly don't intend to", or "definitely not in the next >> version" or something else entirely? > > The reason this is difficult is that the bulk of the planner's > optimization knowledge is attached to operators, not functions. It'd be > easy enough to smash operators down to their underlying functions during > expression preprocessing, sure. But then we would fail to recognize index > applicability at all --- for instance an index on an integer column can > get matched to "indexcol = 42", but not to "int4eq(indexcol, 42)". And > we'd have little clue about the selectivity of the expression, either, > since selectivity estimators are attached to operators not functions. Argh. That is very sad. Given that an operator definitely resolves to a function (right?) in the function catalog it seems a little insane. > Arguably the Berkeley guys got this wrong 25 years ago, and they should > have defined indexes and selectivity in terms of functions not > operators. However I really don't foresee us reinventing all the > "operator class" infrastructure to make that happen; the amount of work > required is far out of proportion to the benefit, even without > considering the number of external projects that would get impacted. > (Inventing selectivity estimators for functions is a less daunting task, > and we might do that sometime ... but I think it would likely live > alongside operator selectivity rather than replace it.) I see. That is ugly. > More generally, I'm not prepared to buy into the idea that the planner > should be expected to recognize alternate spellings of "the same" > expression. There are too many variants of that idea that are > infeasible either because the planner doesn't have the necessary > knowledge, or it does but trying to recognize the equivalence would cost > an impractical number of planning cycles. Neither of those objections > apply to "replace an operator by its underlying function"; but they do > apply to other comparable requests we've gotten such as "recognize that > x + 0 is the same as x" or "x + 1 is the same as 1 + x". I think that the case of infix-operator-to-function stands out in that it doesn't really rely on knowing any algebriac properties of the underlying function such as commutativity, associativity, the notion of a nil-value in the algebra, et al. It's a bit concerning because ORMs are starting to pick up on the extra data types in Postgres and they'll have to all spell the "preferred" spelling of the functionality to properly compose. If SQLAlchemy generates "fetchval" and ActiveRecord 4 uses "->", tough times are ahead. Thanks for your detailed response. As a small point of criticism, I think your response to Will came off as a bit strong, even though there are good reasons why this deceptively small request turns into a snarl that may only be of interest to optimizer hackers. On the more constructive side, if I were to till the fields to change this aspect of the optimizer, is there any interest in rectifying the operator-function confusion? -- fdr
On mån, 2012-03-12 at 13:04 -0700, Daniel Farina wrote: > On the more constructive side, if I were to till the fields to change > this aspect of the optimizer, is there any interest in rectifying the > operator-function confusion? I once proposed to do that [1], but there was not much enthusiasm elsewhere. I still think it'd be worthwhile, for the reasons you mentioned. But it would be a lot of detail work. [1] http://archives.postgresql.org/message-id/1309640525.26660.22.camel@vanquo.pezone.net
Tom Lane <tgl@sss.pgh.pa.us> writes: > More generally, I'm not prepared to buy into the idea that the planner > should be expected to recognize alternate spellings of "the same" > expression. There are too many variants of that idea that are > infeasible either because the planner doesn't have the necessary > knowledge, or it does but trying to recognize the equivalence would cost > an impractical number of planning cycles. Neither of those objections > apply to "replace an operator by its underlying function"; but they do > apply to other comparable requests we've gotten such as "recognize that > x + 0 is the same as x" or "x + 1 is the same as 1 + x". Looks like we're missing out some operator properties, like the neutral element and if the operator is transitive, commutative or associative. I think I remember us talking about how knowing about operators being associative would also help optimize a class of join problems. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Looks like we're missing out some operator properties, like the neutral > element and if the operator is transitive, commutative or associative. I > think I remember us talking about how knowing about operators being > associative would also help optimize a class of join problems. We do understand, and use, transitivity for btree equality operators (cf mergejoin planning, EquivalenceClasses, etc). I have limited enthusiasm for introducing a more general concept, because it seems like doing anything with it would add a great deal more planning effort for (typically) little reward. I can imagine cases where, say, deducing "a < c" from "a < b and b < c" would be helpful ... but they don't come up in common queries. regards, tom lane