Thread: Transforming IN (...) to ORs, volatility
We sometimes transform IN-clauses to a list of ORs: postgres=# explain SELECT * FROM foo WHERE a IN (b, c); QUERY PLAN ------------------------------------------------------ Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) Filter: ((a= b) OR (a = c)) (2 rows) But what if you replace "a" with a volatile function? It doesn't seem legal to do that transformation in that case, but we do it: postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c); QUERY PLAN ------------------------------------------------------------------------------------------------- ------------------- Seq Scan on foo (cost=0.00..68.20 rows=19 width=12) Filter: ((((random() * 2::double precision))::integer= b) OR (((random() * 2::double precision))::integer = c)) (2 rows) I tried to read the SQL spec to see if it has anything to say about that, but I couldn't find anything. My common sense says that that transformation is not legal. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Apr 01, 2011 at 02:24:53PM +0300, Heikki Linnakangas wrote: > I tried to read the SQL spec to see if it has anything to say about > that, but I couldn't find anything. My common sense says that that > transformation is not legal. Your feeling is correct; I would motivate it as follows. random() IN (b,c) is not equivalent to (random() = b) OR (random() = c) because the two random() will evaluate to two different numbers. So, for instance, if you define random_boolean() as either true or false randomly (and VOLATILEly), then random_boolean() IN (true, false) is always true, while (random_boolean() = true) OR (random_boolean() = false) is not (has probability 75%). For instance, the first random_boolean() might return false while the second one returns true. Best regards, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it
On Fri, Apr 1, 2011 at 7:24 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > My common sense says that that transformation > is not legal. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > We sometimes transform IN-clauses to a list of ORs: > postgres=# explain SELECT * FROM foo WHERE a IN (b, c); > QUERY PLAN > ------------------------------------------------------ > Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) > Filter: ((a = b) OR (a = c)) > (2 rows) > But what if you replace "a" with a volatile function? It doesn't seem > legal to do that transformation in that case, but we do it: This is the fault of transformAExprIn(). But please let's *not* fix this by adding volatility to the set of heuristics used there. Looking at this again, it seems to me that most of the problem with this code is that we're trying to make optimization decisions in the parser. I think what we ought to do is have the parser emit a full-fledged InExpr node type (with semantics rather like CaseExpr) and then teach the planner to optimize that to something else when it seems safe/prudent to do so. One nontrivial advantage of that is that rules/views containing IN constructs would start to reverse-parse in the same fashion, instead of introducing weird substitute expressions. regards, tom lane
On 02.04.2011 20:48, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> We sometimes transform IN-clauses to a list of ORs: >> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); >> QUERY PLAN >> ------------------------------------------------------ >> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) >> Filter: ((a = b) OR (a = c)) >> (2 rows) > >> But what if you replace "a" with a volatile function? It doesn't seem >> legal to do that transformation in that case, but we do it: > > This is the fault of transformAExprIn(). But please let's *not* fix > this by adding volatility to the set of heuristics used there. Looking > at this again, it seems to me that most of the problem with this code > is that we're trying to make optimization decisions in the parser. Agreed. The history of this is that before 8.2 all IN clauses were transformed to OR clauses straight in the grammar. 8.2 added the code to represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10 to use the OR-form again for Vars (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php) > I think what we ought to do is have the parser emit a full-fledged > InExpr node type (with semantics rather like CaseExpr) and then teach > the planner to optimize that to something else when it seems > safe/prudent to do so. One nontrivial advantage of that is that > rules/views containing IN constructs would start to reverse-parse > in the same fashion, instead of introducing weird substitute > expressions. Here's my first cut at that. The lefthand expression is now evaluated only once, and stored in econtext->caseValue. Parse analysis turns the righthand expressions into a list of comparison expressions like "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr into something more generic, now that it's used not only in CASE expressions, but also in IN and in UPDATE targets, but I didn't do that in this patch. eval_const_expressions checks the lefthand expression for volatile functions. If there aren't any, it transform the InExprs to a list of ORs. This isn't finished, because it doesn't yet do the transformation to ScalarArrayOpExpr. The OR form is much slower to plan, which is why the ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue hacking on that, but please let me know if you have a better idea on how to handle that. One alternative is to teach the machinery that matches restrictinfos to usable indexes to handle InExpr like it does ScalarArrayOpExprs, but I don't know that code very well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > We sometimes transform IN-clauses to a list of ORs: > > postgres=# explain SELECT * FROM foo WHERE a IN (b, c); > QUERY PLAN > Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) > Filter: ((a = b) OR (a = c)) > > But what if you replace "a" with a volatile function? It doesn't seem legal > to do that transformation in that case, but we do it: > > postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c); > QUERY PLAN > > Seq Scan on foo (cost=0.00..68.20 rows=19 width=12) > Filter: ((((random() * 2::double precision))::integer = b) OR (((random() > * 2::double precision))::integer = c)) Is there a similar problem with the BETWEEN clause transformation into AND expressions? marti=> explain verbose select random() between 0.25 and 0.75;Result (cost=0.00..0.02 rows=1 width=0) Output: ((random()>= 0.25::double precision) AND (random() <= 0.75::double precision)) As expected, I get a statistical skew of 0.4375 / 0.5625, whereas the "correct" would be 50/50: marti=> select random() between 0.25 and 0.75 as result, count(*) from generate_series(1,1000000) i group by 1;result | count --------+--------f | 437262t | 562738 I also always noticed that BETWEEN with subqueries produces two subplan nodes, this seems suboptimal. marti=> explain verbose select (select random()) between 0.25 and 0.75;Result (cost=0.03..0.04 rows=1 width=0) Output:(($0 >= 0.25::double precision) AND ($1 <= 0.75::double precision)) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01rows=1 width=0) Output: random() InitPlan 2 (returns $1) -> Result (cost=0.00..0.01 rows=1width=0) Output: random() Regards, Marti
On 05.04.2011 13:19, Marti Raudsepp wrote: > On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> We sometimes transform IN-clauses to a list of ORs: >> >> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); >> QUERY PLAN >> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) >> Filter: ((a = b) OR (a = c)) >> >> But what if you replace "a" with a volatile function? It doesn't seem legal >> to do that transformation in that case, but we do it: >> >> postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c); >> QUERY PLAN >> >> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12) >> Filter: ((((random() * 2::double precision))::integer = b) OR (((random() >> * 2::double precision))::integer = c)) > > Is there a similar problem with the BETWEEN clause transformation into > AND expressions? > > marti=> explain verbose select random() between 0.25 and 0.75; > Result (cost=0.00..0.02 rows=1 width=0) > Output: ((random()>= 0.25::double precision) AND (random()<= > 0.75::double precision)) Yes, good point. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 05.04.2011 18:42, Heikki Linnakangas wrote: > On 05.04.2011 13:19, Marti Raudsepp wrote: >> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> We sometimes transform IN-clauses to a list of ORs: >>> >>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); >>> QUERY PLAN >>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) >>> Filter: ((a = b) OR (a = c)) >>> >>> But what if you replace "a" with a volatile function? It doesn't seem >>> legal >>> to do that transformation in that case, but we do it: >>> >>> postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN >>> (b, c); >>> QUERY PLAN >>> >>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12) >>> Filter: ((((random() * 2::double precision))::integer = b) OR >>> (((random() >>> * 2::double precision))::integer = c)) >> >> Is there a similar problem with the BETWEEN clause transformation into >> AND expressions? >> >> marti=> explain verbose select random() between 0.25 and 0.75; >> Result (cost=0.00..0.02 rows=1 width=0) >> Output: ((random()>= 0.25::double precision) AND (random()<= >> 0.75::double precision)) > > Yes, good point. Hmm, the SQL specification explicitly says that X BETWEEN Y AND Z is equal to X >= Y AND X <= Z It doesn't say anything about side-effects of X. Seems like an oversight in the specification. I would not expect X to be evaluated twice, and I think we should change BETWEEN to not do that. Does anyone object to making BETWEEN and IN more strict about the data types? At the moment, you can do this: postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4; ?column? ---------- t (1 row) I'm thinking that it should throw an error. Same with IN, if the values in the IN-list can't be coerced to a common type. That will probably simplify the code a lot, and is what the SQL standard assumes anyway AFAICS. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 05.04.2011 18:42, Heikki Linnakangas wrote: >> On 05.04.2011 13:19, Marti Raudsepp wrote: >>> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> We sometimes transform IN-clauses to a list of ORs: >>>> >>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); >>>> QUERY PLAN >>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) >>>> Filter: ((a = b) OR (a = c)) >>>> >>>> But what if you replace "a" with a volatile function? It >>>> doesn't seem legal to do that transformation in that case, but >>>> we do it: >>>> >>>> postgres=# explain SELECT * FROM foo WHERE >>>> (random()*2)::integer IN (b, c); >>>> QUERY PLAN >>>> >>>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12) >>>> Filter: ((((random() * 2::double precision))::integer = b) OR >>>> (((random() >>>> * 2::double precision))::integer = c)) >>> >>> Is there a similar problem with the BETWEEN clause >>> transformation into AND expressions? >>> >>> marti=> explain verbose select random() between 0.25 and 0.75; >>> Result (cost=0.00..0.02 rows=1 width=0) >>> Output: ((random()>= 0.25::double precision) AND (random()<= >>> 0.75::double precision)) >> >> Yes, good point. > > Hmm, the SQL specification explicitly says that > > X BETWEEN Y AND Z > > is equal to > > X >= Y AND X <= Z > > It doesn't say anything about side-effects of X. Seems like an > oversight in the specification. I would not expect X to be > evaluated twice, and I think we should change BETWEEN to not do > that. Does the SQL spec explicitly say anything about how many times X should be evaluated if you were to code it as?: X >= Y AND X <= Z If it does, evaluating it a different number of times for BETWEEN would seem to be a deviation from standard. Evaluating it once seem less surprising, but if we're going to deviate from the standard in doing that, it at least deserves a clear note to that effect in the docs. Evaluating X once for BETWEEN seems better from a POLA perspective, unless you happen to be massaging a query to another form and trusting that the equivalence defined in the standard will always hold. > Does anyone object to making BETWEEN and IN more strict about the > data types? At the moment, you can do this: > > postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4; > ?column? > ---------- > t > (1 row) > > I'm thinking that it should throw an error. Same with IN, if the > values in the IN-list can't be coerced to a common type. That will > probably simplify the code a lot, and is what the SQL standard > assumes anyway AFAICS. +1 for more strict. -Kevin
On 11.04.2011 19:06, Kevin Grittner wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >> On 05.04.2011 18:42, Heikki Linnakangas wrote: >>> On 05.04.2011 13:19, Marti Raudsepp wrote: >>>> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas >>>> <heikki.linnakangas@enterprisedb.com> wrote: >>>>> We sometimes transform IN-clauses to a list of ORs: >>>>> >>>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); >>>>> QUERY PLAN >>>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) >>>>> Filter: ((a = b) OR (a = c)) >>>>> >>>>> But what if you replace "a" with a volatile function? It >>>>> doesn't seem legal to do that transformation in that case, but >>>>> we do it: >>>>> >>>>> postgres=# explain SELECT * FROM foo WHERE >>>>> (random()*2)::integer IN (b, c); >>>>> QUERY PLAN >>>>> >>>>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12) >>>>> Filter: ((((random() * 2::double precision))::integer = b) OR >>>>> (((random() >>>>> * 2::double precision))::integer = c)) >>>> >>>> Is there a similar problem with the BETWEEN clause >>>> transformation into AND expressions? >>>> >>>> marti=> explain verbose select random() between 0.25 and 0.75; >>>> Result (cost=0.00..0.02 rows=1 width=0) >>>> Output: ((random()>= 0.25::double precision) AND (random()<= >>>> 0.75::double precision)) >>> >>> Yes, good point. >> >> Hmm, the SQL specification explicitly says that >> >> X BETWEEN Y AND Z >> >> is equal to >> >> X>= Y AND X<= Z >> >> It doesn't say anything about side-effects of X. Seems like an >> oversight in the specification. I would not expect X to be >> evaluated twice, and I think we should change BETWEEN to not do >> that. > > Does the SQL spec explicitly say anything about how many times X > should be evaluated if you were to code it as?: > > X>= Y AND X<= Z Not explicitly. However, it does say that: " NOTE 258 — Since <between predicate> is an ordering operation, the Conformance Rules of Subclause 9.12, “Ordering operations”, also apply. " If I'm reading those ordering operation conformance rules correctly, it only allows the operand to be a simple column or an expression that's specified in the ORDER BY or similar, not an arbitrary expression. Which seems quite restrictive, but it would dodge the whole issue.. The spec also has that: “X BETWEEN SYMMETRIC Y AND Z” is equivalent to “((X BETWEEN ASYMMETRIC Y AND Z) OR (X BETWEEN ASYMMETRIC Z AND Y))”. So if you take that into account too, X is evaluated four times. The SQL standard can be funny sometimes, but I can't believe that they intended that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Does anyone object to making BETWEEN and IN more strict about the data > types? At the moment, you can do this: > postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4; > ?column? > ---------- > t > (1 row) > I'm thinking that it should throw an error. Same with IN, if the values > in the IN-list can't be coerced to a common type. You *will* get push-back on that ... maybe from people with badly coded applications, but I guarantee there will be complaints. regards, tom lane
On 11.04.2011 19:33, Heikki Linnakangas wrote: > On 11.04.2011 19:06, Kevin Grittner wrote: >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >>> Hmm, the SQL specification explicitly says that >>> >>> X BETWEEN Y AND Z >>> >>> is equal to >>> >>> X>= Y AND X<= Z >>> >>> It doesn't say anything about side-effects of X. Seems like an >>> oversight in the specification. I would not expect X to be >>> evaluated twice, and I think we should change BETWEEN to not do >>> that. >> >> Does the SQL spec explicitly say anything about how many times X >> should be evaluated if you were to code it as?: >> >> X>= Y AND X<= Z > > Not explicitly. However, it does say that: > > " > NOTE 258 — Since <between predicate> is an ordering operation, the > Conformance Rules of Subclause 9.12, “Ordering > operations”, also apply. > " > > If I'm reading those ordering operation conformance rules correctly, it > only allows the operand to be a simple column or an expression that's > specified in the ORDER BY or similar, not an arbitrary expression. Which > seems quite restrictive, but it would dodge the whole issue.. Another data point on this: DB2 disallow volatile left-operand to BETWEEN db2 => SELECT * FROM atable WHERE smallint(rand()*10) BETWEEN 4 AND 5 SQL0583N The use of routine or expression "SYSFUN.RAND" is invalid because it is not deterministic or has an external action. SQLSTATE=42845 I'd like us to still fix this so that there's no multiple evaluation - that would actually make BETWEEN more useful than it is today. I'm working on a patch to handle both BETWEEN and IN. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I'd like us to still fix this so that there's no multiple evaluation - > that would actually make BETWEEN more useful than it is today. I'm > working on a patch to handle both BETWEEN and IN. One other issue here is that the parser has traditionally handled BETWEEN by multiply linking the same expression tree. I've wanted to get rid of that behavior for a long time, but never got round to it. It causes a lot of headaches for later processing, because you have to be wary of multiply processing the same tree. If we switch BETWEEN to something with a dedicated representation we could probably get rid of all multiple-linking in the parser's output, allowing ensuing simplifications downstream. regards, tom lane
Uh, have we addressed this? I don't think so. --------------------------------------------------------------------------- Heikki Linnakangas wrote: > On 02.04.2011 20:48, Tom Lane wrote: > > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: > >> We sometimes transform IN-clauses to a list of ORs: > >> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); > >> QUERY PLAN > >> ------------------------------------------------------ > >> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) > >> Filter: ((a = b) OR (a = c)) > >> (2 rows) > > > >> But what if you replace "a" with a volatile function? It doesn't seem > >> legal to do that transformation in that case, but we do it: > > > > This is the fault of transformAExprIn(). But please let's *not* fix > > this by adding volatility to the set of heuristics used there. Looking > > at this again, it seems to me that most of the problem with this code > > is that we're trying to make optimization decisions in the parser. > > Agreed. The history of this is that before 8.2 all IN clauses were > transformed to OR clauses straight in the grammar. 8.2 added the code to > represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10 > to use the OR-form again for Vars > (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php) > > > I think what we ought to do is have the parser emit a full-fledged > > InExpr node type (with semantics rather like CaseExpr) and then teach > > the planner to optimize that to something else when it seems > > safe/prudent to do so. One nontrivial advantage of that is that > > rules/views containing IN constructs would start to reverse-parse > > in the same fashion, instead of introducing weird substitute > > expressions. > > Here's my first cut at that. The lefthand expression is now evaluated > only once, and stored in econtext->caseValue. Parse analysis turns the > righthand expressions into a list of comparison expressions like > "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr > into something more generic, now that it's used not only in CASE > expressions, but also in IN and in UPDATE targets, but I didn't do that > in this patch. > > eval_const_expressions checks the lefthand expression for volatile > functions. If there aren't any, it transform the InExprs to a list of ORs. > > This isn't finished, because it doesn't yet do the transformation to > ScalarArrayOpExpr. The OR form is much slower to plan, which is why the > ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue > hacking on that, but please let me know if you have a better idea on how > to handle that. One alternative is to teach the machinery that matches > restrictinfos to usable indexes to handle InExpr like it does > ScalarArrayOpExprs, but I don't know that code very well. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com [ Attachment, skipping... ] > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Nope, this hasn't been addressed. FWIW, I put it on the todo list when I stopped working on it. On 06.09.2011 20:48, Bruce Momjian wrote: > > Uh, have we addressed this? I don't think so. > > --------------------------------------------------------------------------- > > Heikki Linnakangas wrote: >> On 02.04.2011 20:48, Tom Lane wrote: >>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >>>> We sometimes transform IN-clauses to a list of ORs: >>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); >>>> QUERY PLAN >>>> ------------------------------------------------------ >>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) >>>> Filter: ((a = b) OR (a = c)) >>>> (2 rows) >>> >>>> But what if you replace "a" with a volatile function? It doesn't seem >>>> legal to do that transformation in that case, but we do it: >>> >>> This is the fault of transformAExprIn(). But please let's *not* fix >>> this by adding volatility to the set of heuristics used there. Looking >>> at this again, it seems to me that most of the problem with this code >>> is that we're trying to make optimization decisions in the parser. >> >> Agreed. The history of this is that before 8.2 all IN clauses were >> transformed to OR clauses straight in the grammar. 8.2 added the code to >> represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10 >> to use the OR-form again for Vars >> (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php) >> >>> I think what we ought to do is have the parser emit a full-fledged >>> InExpr node type (with semantics rather like CaseExpr) and then teach >>> the planner to optimize that to something else when it seems >>> safe/prudent to do so. One nontrivial advantage of that is that >>> rules/views containing IN constructs would start to reverse-parse >>> in the same fashion, instead of introducing weird substitute >>> expressions. >> >> Here's my first cut at that. The lefthand expression is now evaluated >> only once, and stored in econtext->caseValue. Parse analysis turns the >> righthand expressions into a list of comparison expressions like >> "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr >> into something more generic, now that it's used not only in CASE >> expressions, but also in IN and in UPDATE targets, but I didn't do that >> in this patch. >> >> eval_const_expressions checks the lefthand expression for volatile >> functions. If there aren't any, it transform the InExprs to a list of ORs. >> >> This isn't finished, because it doesn't yet do the transformation to >> ScalarArrayOpExpr. The OR form is much slower to plan, which is why the >> ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue >> hacking on that, but please let me know if you have a better idea on how >> to handle that. One alternative is to teach the machinery that matches >> restrictinfos to usable indexes to handle InExpr like it does >> ScalarArrayOpExprs, but I don't know that code very well. >> >> -- >> Heikki Linnakangas >> EnterpriseDB http://www.enterprisedb.com > > [ Attachment, skipping... ] > >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Nope, this hasn't been addressed. FWIW, I put it on the todo list when I > stopped working on it. Oh, I see it now. Thanks. --------------------------------------------------------------------------- > > On 06.09.2011 20:48, Bruce Momjian wrote: > > > > Uh, have we addressed this? I don't think so. > > > > --------------------------------------------------------------------------- > > > > Heikki Linnakangas wrote: > >> On 02.04.2011 20:48, Tom Lane wrote: > >>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: > >>>> We sometimes transform IN-clauses to a list of ORs: > >>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c); > >>>> QUERY PLAN > >>>> ------------------------------------------------------ > >>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12) > >>>> Filter: ((a = b) OR (a = c)) > >>>> (2 rows) > >>> > >>>> But what if you replace "a" with a volatile function? It doesn't seem > >>>> legal to do that transformation in that case, but we do it: > >>> > >>> This is the fault of transformAExprIn(). But please let's *not* fix > >>> this by adding volatility to the set of heuristics used there. Looking > >>> at this again, it seems to me that most of the problem with this code > >>> is that we're trying to make optimization decisions in the parser. > >> > >> Agreed. The history of this is that before 8.2 all IN clauses were > >> transformed to OR clauses straight in the grammar. 8.2 added the code to > >> represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10 > >> to use the OR-form again for Vars > >> (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php) > >> > >>> I think what we ought to do is have the parser emit a full-fledged > >>> InExpr node type (with semantics rather like CaseExpr) and then teach > >>> the planner to optimize that to something else when it seems > >>> safe/prudent to do so. One nontrivial advantage of that is that > >>> rules/views containing IN constructs would start to reverse-parse > >>> in the same fashion, instead of introducing weird substitute > >>> expressions. > >> > >> Here's my first cut at that. The lefthand expression is now evaluated > >> only once, and stored in econtext->caseValue. Parse analysis turns the > >> righthand expressions into a list of comparison expressions like > >> "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr > >> into something more generic, now that it's used not only in CASE > >> expressions, but also in IN and in UPDATE targets, but I didn't do that > >> in this patch. > >> > >> eval_const_expressions checks the lefthand expression for volatile > >> functions. If there aren't any, it transform the InExprs to a list of ORs. > >> > >> This isn't finished, because it doesn't yet do the transformation to > >> ScalarArrayOpExpr. The OR form is much slower to plan, which is why the > >> ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue > >> hacking on that, but please let me know if you have a better idea on how > >> to handle that. One alternative is to teach the machinery that matches > >> restrictinfos to usable indexes to handle InExpr like it does > >> ScalarArrayOpExprs, but I don't know that code very well. > >> > >> -- > >> Heikki Linnakangas > >> EnterpriseDB http://www.enterprisedb.com > > > > [ Attachment, skipping... ] > > > >> > >> -- > >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > >> To make changes to your subscription: > >> http://www.postgresql.org/mailpref/pgsql-hackers > > > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > Uh, have we addressed this? I don't think so. No. IIRC, I didn't like Heikki's proposed patch, so it's on my head to come up with something better. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Uh, have we addressed this? I don't think so. > > No. IIRC, I didn't like Heikki's proposed patch, so it's on my head > to come up with something better. You can blame me for getting it into the parser. It used to be in gram.y! -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 06.09.2011 20:53, Bruce Momjian wrote: > Tom Lane wrote: >> Bruce Momjian<bruce@momjian.us> writes: >>> Uh, have we addressed this? I don't think so. >> >> No. IIRC, I didn't like Heikki's proposed patch, so it's on my head >> to come up with something better. > > You can blame me for getting it into the parser. It used to be in > gram.y! Huh? Isn't "the parser" and "gram.y" more or less the same thing? Anyway, it needs to be somewhere else.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > On 06.09.2011 20:53, Bruce Momjian wrote: > > Tom Lane wrote: > >> Bruce Momjian<bruce@momjian.us> writes: > >>> Uh, have we addressed this? I don't think so. > >> > >> No. IIRC, I didn't like Heikki's proposed patch, so it's on my head > >> to come up with something better. > > > > You can blame me for getting it into the parser. It used to be in > > gram.y! > > Huh? Isn't "the parser" and "gram.y" more or less the same thing? > Anyway, it needs to be somewhere else.. I meant the '/parser' directory. It actually created AND nodes in gram.y so the rest of the parser didn't even know a BETWEEN was used. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +