Thread: Transforming IN (...) to ORs, volatility

Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
We sometimes transform IN-clauses to a list of ORs:

postgres=# explain SELECT * FROM foo WHERE a IN (b, c);                      QUERY PLAN
------------------------------------------------------ Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)   Filter:
((a= b) OR (a = c))
 
(2 rows)

But what if you replace "a" with a volatile function? It doesn't seem 
legal to do that transformation in that case, but we do it:

postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c);
       QUERY PLAN 
 


-------------------------------------------------------------------------------------------------
------------------- Seq Scan on foo  (cost=0.00..68.20 rows=19 width=12)   Filter: ((((random() * 2::double
precision))::integer= b) OR 
 
(((random() * 2::double precision))::integer = c))
(2 rows)

I tried to read the SQL spec to see if it has anything to say about 
that, but I couldn't find anything. My common sense says that that 
transformation is not legal.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
Gianni Ciolli
Date:
On Fri, Apr 01, 2011 at 02:24:53PM +0300, Heikki Linnakangas wrote:

> I tried to read the SQL spec to see if it has anything to say about
> that, but I couldn't find anything. My common sense says that that
> transformation is not legal.

Your feeling is correct; I would motivate it as follows.
 random() IN (b,c)

is not equivalent to
 (random() = b) OR (random() = c)

because the two random() will evaluate to two different numbers.  So,
for instance, if you define random_boolean() as either true or false
randomly (and VOLATILEly), then
 random_boolean() IN (true, false)

is always true, while
 (random_boolean() = true) OR (random_boolean() = false)

is not (has probability 75%). For instance, the first random_boolean()
might return false while the second one returns true.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it


Re: Transforming IN (...) to ORs, volatility

From
Robert Haas
Date:
On Fri, Apr 1, 2011 at 7:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> My common sense says that that transformation
> is not legal.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Transforming IN (...) to ORs, volatility

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> We sometimes transform IN-clauses to a list of ORs:
> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>                        QUERY PLAN
> ------------------------------------------------------
>   Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
>     Filter: ((a = b) OR (a = c))
> (2 rows)

> But what if you replace "a" with a volatile function? It doesn't seem 
> legal to do that transformation in that case, but we do it:

This is the fault of transformAExprIn().  But please let's *not* fix
this by adding volatility to the set of heuristics used there.  Looking
at this again, it seems to me that most of the problem with this code
is that we're trying to make optimization decisions in the parser.

I think what we ought to do is have the parser emit a full-fledged
InExpr node type (with semantics rather like CaseExpr) and then teach
the planner to optimize that to something else when it seems
safe/prudent to do so.  One nontrivial advantage of that is that
rules/views containing IN constructs would start to reverse-parse
in the same fashion, instead of introducing weird substitute
expressions.
        regards, tom lane


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
On 02.04.2011 20:48, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> We sometimes transform IN-clauses to a list of ORs:
>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>                         QUERY PLAN
>> ------------------------------------------------------
>>    Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
>>      Filter: ((a = b) OR (a = c))
>> (2 rows)
>
>> But what if you replace "a" with a volatile function? It doesn't seem
>> legal to do that transformation in that case, but we do it:
>
> This is the fault of transformAExprIn().  But please let's *not* fix
> this by adding volatility to the set of heuristics used there.  Looking
> at this again, it seems to me that most of the problem with this code
> is that we're trying to make optimization decisions in the parser.

Agreed. The history of this is that before 8.2 all IN clauses were
transformed to OR clauses straight in the grammar. 8.2 added the code to
represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10
to use the OR-form again for Vars
(http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php)

> I think what we ought to do is have the parser emit a full-fledged
> InExpr node type (with semantics rather like CaseExpr) and then teach
> the planner to optimize that to something else when it seems
> safe/prudent to do so.  One nontrivial advantage of that is that
> rules/views containing IN constructs would start to reverse-parse
> in the same fashion, instead of introducing weird substitute
> expressions.

Here's my first cut at that. The lefthand expression is now evaluated
only once, and stored in econtext->caseValue. Parse analysis turns the
righthand expressions into a list of comparison expressions like
"CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr
into something more generic, now that it's used not only in CASE
expressions, but also in IN and in UPDATE targets, but I didn't do that
in this patch.

eval_const_expressions checks the lefthand expression for volatile
functions. If there aren't any, it transform the InExprs to a list of ORs.

This isn't finished, because it doesn't yet do the transformation to
ScalarArrayOpExpr. The OR form is much slower to plan, which is why the
ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue
hacking on that, but please let me know if you have a better idea on how
to handle that. One alternative is to teach the machinery that matches
restrictinfos to usable indexes to handle InExpr like it does
ScalarArrayOpExprs, but I don't know that code very well.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Transforming IN (...) to ORs, volatility

From
Marti Raudsepp
Date:
On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> We sometimes transform IN-clauses to a list of ORs:
>
> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>                      QUERY PLAN
>  Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
>   Filter: ((a = b) OR (a = c))
>
> But what if you replace "a" with a volatile function? It doesn't seem legal
> to do that transformation in that case, but we do it:
>
> postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c);
>                                                     QUERY PLAN
>
>  Seq Scan on foo  (cost=0.00..68.20 rows=19 width=12)
>   Filter: ((((random() * 2::double precision))::integer = b) OR (((random()
> * 2::double precision))::integer = c))

Is there a similar problem with the BETWEEN clause transformation into
AND expressions?

marti=> explain verbose select random() between 0.25 and 0.75;Result  (cost=0.00..0.02 rows=1 width=0)  Output:
((random()>= 0.25::double precision) AND (random() <= 
0.75::double precision))

As expected, I get a statistical skew of 0.4375 / 0.5625, whereas the
"correct" would be 50/50:

marti=> select random() between 0.25 and 0.75 as result, count(*) from
generate_series(1,1000000) i group by 1;result | count
--------+--------f      | 437262t      | 562738

I also always noticed that BETWEEN with subqueries produces two
subplan nodes, this seems suboptimal.

marti=> explain verbose select (select random()) between 0.25 and 0.75;Result  (cost=0.03..0.04 rows=1 width=0)
Output:(($0 >= 0.25::double precision) AND ($1 <= 0.75::double precision))  InitPlan 1 (returns $0)    ->  Result
(cost=0.00..0.01rows=1 width=0)          Output: random()  InitPlan 2 (returns $1)    ->  Result  (cost=0.00..0.01
rows=1width=0)          Output: random() 


Regards,
Marti


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
On 05.04.2011 13:19, Marti Raudsepp wrote:
> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> We sometimes transform IN-clauses to a list of ORs:
>>
>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>                       QUERY PLAN
>>   Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
>>    Filter: ((a = b) OR (a = c))
>>
>> But what if you replace "a" with a volatile function? It doesn't seem legal
>> to do that transformation in that case, but we do it:
>>
>> postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c);
>>                                                      QUERY PLAN
>>
>>   Seq Scan on foo  (cost=0.00..68.20 rows=19 width=12)
>>    Filter: ((((random() * 2::double precision))::integer = b) OR (((random()
>> * 2::double precision))::integer = c))
>
> Is there a similar problem with the BETWEEN clause transformation into
> AND expressions?
>
> marti=>  explain verbose select random() between 0.25 and 0.75;
>   Result  (cost=0.00..0.02 rows=1 width=0)
>     Output: ((random()>= 0.25::double precision) AND (random()<=
> 0.75::double precision))

Yes, good point.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
On 05.04.2011 18:42, Heikki Linnakangas wrote:
> On 05.04.2011 13:19, Marti Raudsepp wrote:
>> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> We sometimes transform IN-clauses to a list of ORs:
>>>
>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>> QUERY PLAN
>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12)
>>> Filter: ((a = b) OR (a = c))
>>>
>>> But what if you replace "a" with a volatile function? It doesn't seem
>>> legal
>>> to do that transformation in that case, but we do it:
>>>
>>> postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN
>>> (b, c);
>>> QUERY PLAN
>>>
>>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12)
>>> Filter: ((((random() * 2::double precision))::integer = b) OR
>>> (((random()
>>> * 2::double precision))::integer = c))
>>
>> Is there a similar problem with the BETWEEN clause transformation into
>> AND expressions?
>>
>> marti=> explain verbose select random() between 0.25 and 0.75;
>> Result (cost=0.00..0.02 rows=1 width=0)
>> Output: ((random()>= 0.25::double precision) AND (random()<=
>> 0.75::double precision))
>
> Yes, good point.

Hmm, the SQL specification explicitly says that

X BETWEEN Y AND Z

is equal to

X >= Y AND X <= Z

It doesn't say anything about side-effects of X. Seems like an oversight 
in the specification. I would not expect X to be evaluated twice, and I 
think we should change BETWEEN to not do that.


Does anyone object to making BETWEEN and IN more strict about the data 
types? At the moment, you can do this:

postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4; ?column?
---------- t
(1 row)

I'm thinking that it should throw an error. Same with IN, if the values 
in the IN-list can't be coerced to a common type. That will probably 
simplify the code a lot, and is what the SQL standard assumes anyway AFAICS.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 05.04.2011 18:42, Heikki Linnakangas wrote:
>> On 05.04.2011 13:19, Marti Raudsepp wrote:
>>> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> We sometimes transform IN-clauses to a list of ORs:
>>>>
>>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>>> QUERY PLAN
>>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12)
>>>> Filter: ((a = b) OR (a = c))
>>>>
>>>> But what if you replace "a" with a volatile function? It
>>>> doesn't seem legal to do that transformation in that case, but
>>>> we do it:
>>>>
>>>> postgres=# explain SELECT * FROM foo WHERE
>>>> (random()*2)::integer IN (b, c);
>>>> QUERY PLAN
>>>>
>>>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12)
>>>> Filter: ((((random() * 2::double precision))::integer = b) OR
>>>> (((random()
>>>> * 2::double precision))::integer = c))
>>>
>>> Is there a similar problem with the BETWEEN clause
>>> transformation into AND expressions?
>>>
>>> marti=> explain verbose select random() between 0.25 and 0.75;
>>> Result (cost=0.00..0.02 rows=1 width=0)
>>> Output: ((random()>= 0.25::double precision) AND (random()<=
>>> 0.75::double precision))
>>
>> Yes, good point.
> 
> Hmm, the SQL specification explicitly says that
> 
> X BETWEEN Y AND Z
> 
> is equal to
> 
> X >= Y AND X <= Z
> 
> It doesn't say anything about side-effects of X. Seems like an
> oversight in the specification. I would not expect X to be
> evaluated twice, and I think we should change BETWEEN to not do
> that.
Does the SQL spec explicitly say anything about how many times X
should be evaluated if you were to code it as?:
X >= Y AND X <= Z
If it does, evaluating it a different number of times for BETWEEN
would seem to be a deviation from standard.  Evaluating it once seem
less surprising, but if we're going to deviate from the standard in
doing that, it at least deserves a clear note to that effect in the
docs.
Evaluating X once for BETWEEN seems better from a POLA perspective,
unless you happen to be massaging a query to another form and
trusting that the equivalence defined in the standard will always
hold.
> Does anyone object to making BETWEEN and IN more strict about the
> data types? At the moment, you can do this:
> 
> postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4;
>   ?column?
> ----------
>   t
> (1 row)
> 
> I'm thinking that it should throw an error. Same with IN, if the
> values in the IN-list can't be coerced to a common type. That will
> probably simplify the code a lot, and is what the SQL standard
> assumes anyway AFAICS.
+1 for more strict.
-Kevin


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
On 11.04.2011 19:06, Kevin Grittner wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  wrote:
>> On 05.04.2011 18:42, Heikki Linnakangas wrote:
>>> On 05.04.2011 13:19, Marti Raudsepp wrote:
>>>> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
>>>> <heikki.linnakangas@enterprisedb.com>  wrote:
>>>>> We sometimes transform IN-clauses to a list of ORs:
>>>>>
>>>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>>>> QUERY PLAN
>>>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12)
>>>>> Filter: ((a = b) OR (a = c))
>>>>>
>>>>> But what if you replace "a" with a volatile function? It
>>>>> doesn't seem legal to do that transformation in that case, but
>>>>> we do it:
>>>>>
>>>>> postgres=# explain SELECT * FROM foo WHERE
>>>>> (random()*2)::integer IN (b, c);
>>>>> QUERY PLAN
>>>>>
>>>>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12)
>>>>> Filter: ((((random() * 2::double precision))::integer = b) OR
>>>>> (((random()
>>>>> * 2::double precision))::integer = c))
>>>>
>>>> Is there a similar problem with the BETWEEN clause
>>>> transformation into AND expressions?
>>>>
>>>> marti=>  explain verbose select random() between 0.25 and 0.75;
>>>> Result (cost=0.00..0.02 rows=1 width=0)
>>>> Output: ((random()>= 0.25::double precision) AND (random()<=
>>>> 0.75::double precision))
>>>
>>> Yes, good point.
>>
>> Hmm, the SQL specification explicitly says that
>>
>> X BETWEEN Y AND Z
>>
>> is equal to
>>
>> X>= Y AND X<= Z
>>
>> It doesn't say anything about side-effects of X. Seems like an
>> oversight in the specification. I would not expect X to be
>> evaluated twice, and I think we should change BETWEEN to not do
>> that.
>
> Does the SQL spec explicitly say anything about how many times X
> should be evaluated if you were to code it as?:
>
> X>= Y AND X<= Z

Not explicitly. However, it does say that:

"
NOTE 258 — Since <between predicate> is an ordering operation, the 
Conformance Rules of Subclause 9.12, “Ordering
operations”, also apply.
"

If I'm reading those ordering operation conformance rules correctly, it 
only allows the operand to be a simple column or an expression that's 
specified in the ORDER BY or similar, not an arbitrary expression. Which 
seems quite restrictive, but it would dodge the whole issue..

The spec also has that:

“X BETWEEN SYMMETRIC Y AND Z” is equivalent to “((X BETWEEN ASYMMETRIC Y 
AND Z)
OR (X BETWEEN ASYMMETRIC Z AND Y))”.

So if you take that into account too, X is evaluated four times. The SQL 
standard can be funny sometimes, but I can't believe that they intended 
that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Does anyone object to making BETWEEN and IN more strict about the data 
> types? At the moment, you can do this:

> postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4;
>   ?column?
> ----------
>   t
> (1 row)

> I'm thinking that it should throw an error. Same with IN, if the values 
> in the IN-list can't be coerced to a common type.

You *will* get push-back on that ... maybe from people with badly coded
applications, but I guarantee there will be complaints.
        regards, tom lane


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
On 11.04.2011 19:33, Heikki Linnakangas wrote:
> On 11.04.2011 19:06, Kevin Grittner wrote:
>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
>>> Hmm, the SQL specification explicitly says that
>>>
>>> X BETWEEN Y AND Z
>>>
>>> is equal to
>>>
>>> X>= Y AND X<= Z
>>>
>>> It doesn't say anything about side-effects of X. Seems like an
>>> oversight in the specification. I would not expect X to be
>>> evaluated twice, and I think we should change BETWEEN to not do
>>> that.
>>
>> Does the SQL spec explicitly say anything about how many times X
>> should be evaluated if you were to code it as?:
>>
>> X>= Y AND X<= Z
>
> Not explicitly. However, it does say that:
>
> "
> NOTE 258 — Since <between predicate> is an ordering operation, the
> Conformance Rules of Subclause 9.12, “Ordering
> operations”, also apply.
> "
>
> If I'm reading those ordering operation conformance rules correctly, it
> only allows the operand to be a simple column or an expression that's
> specified in the ORDER BY or similar, not an arbitrary expression. Which
> seems quite restrictive, but it would dodge the whole issue..

Another data point on this: DB2 disallow volatile left-operand to BETWEEN

db2 => SELECT * FROM atable WHERE smallint(rand()*10) BETWEEN 4 AND 5
SQL0583N  The use of routine or expression "SYSFUN.RAND" is invalid 
because it
is not deterministic or has an external action.  SQLSTATE=42845

I'd like us to still fix this so that there's no multiple evaluation - 
that would actually make BETWEEN more useful than it is today. I'm 
working on a patch to handle both BETWEEN and IN.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I'd like us to still fix this so that there's no multiple evaluation - 
> that would actually make BETWEEN more useful than it is today. I'm 
> working on a patch to handle both BETWEEN and IN.

One other issue here is that the parser has traditionally handled
BETWEEN by multiply linking the same expression tree.  I've wanted to
get rid of that behavior for a long time, but never got round to it.
It causes a lot of headaches for later processing, because you have to
be wary of multiply processing the same tree.  If we switch BETWEEN
to something with a dedicated representation we could probably get rid
of all multiple-linking in the parser's output, allowing ensuing
simplifications downstream.
        regards, tom lane


Re: Transforming IN (...) to ORs, volatility

From
Bruce Momjian
Date:
Uh, have we addressed this?  I don't think so.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> On 02.04.2011 20:48, Tom Lane wrote:
> > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
> >> We sometimes transform IN-clauses to a list of ORs:
> >> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
> >>                         QUERY PLAN
> >> ------------------------------------------------------
> >>    Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
> >>      Filter: ((a = b) OR (a = c))
> >> (2 rows)
> >
> >> But what if you replace "a" with a volatile function? It doesn't seem
> >> legal to do that transformation in that case, but we do it:
> >
> > This is the fault of transformAExprIn().  But please let's *not* fix
> > this by adding volatility to the set of heuristics used there.  Looking
> > at this again, it seems to me that most of the problem with this code
> > is that we're trying to make optimization decisions in the parser.
> 
> Agreed. The history of this is that before 8.2 all IN clauses were 
> transformed to OR clauses straight in the grammar. 8.2 added the code to 
> represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10 
> to use the OR-form again for Vars 
> (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php)
> 
> > I think what we ought to do is have the parser emit a full-fledged
> > InExpr node type (with semantics rather like CaseExpr) and then teach
> > the planner to optimize that to something else when it seems
> > safe/prudent to do so.  One nontrivial advantage of that is that
> > rules/views containing IN constructs would start to reverse-parse
> > in the same fashion, instead of introducing weird substitute
> > expressions.
> 
> Here's my first cut at that. The lefthand expression is now evaluated 
> only once, and stored in econtext->caseValue. Parse analysis turns the 
> righthand expressions into a list of comparison expressions like 
> "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr 
> into something more generic, now that it's used not only in CASE 
> expressions, but also in IN and in UPDATE targets, but I didn't do that 
> in this patch.
> 
> eval_const_expressions checks the lefthand expression for volatile 
> functions. If there aren't any, it transform the InExprs to a list of ORs.
> 
> This isn't finished, because it doesn't yet do the transformation to 
> ScalarArrayOpExpr. The OR form is much slower to plan, which is why the 
> ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue 
> hacking on that, but please let me know if you have a better idea on how 
> to handle that. One alternative is to teach the machinery that matches 
> restrictinfos to usable indexes to handle InExpr like it does 
> ScalarArrayOpExprs, but I don't know that code very well.
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com

[ Attachment, skipping... ]

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
Nope, this hasn't been addressed. FWIW, I put it on the todo list when I 
stopped working on it.

On 06.09.2011 20:48, Bruce Momjian wrote:
>
> Uh, have we addressed this?  I don't think so.
>
> ---------------------------------------------------------------------------
>
> Heikki Linnakangas wrote:
>> On 02.04.2011 20:48, Tom Lane wrote:
>>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>   writes:
>>>> We sometimes transform IN-clauses to a list of ORs:
>>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>>>                          QUERY PLAN
>>>> ------------------------------------------------------
>>>>     Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
>>>>       Filter: ((a = b) OR (a = c))
>>>> (2 rows)
>>>
>>>> But what if you replace "a" with a volatile function? It doesn't seem
>>>> legal to do that transformation in that case, but we do it:
>>>
>>> This is the fault of transformAExprIn().  But please let's *not* fix
>>> this by adding volatility to the set of heuristics used there.  Looking
>>> at this again, it seems to me that most of the problem with this code
>>> is that we're trying to make optimization decisions in the parser.
>>
>> Agreed. The history of this is that before 8.2 all IN clauses were
>> transformed to OR clauses straight in the grammar. 8.2 added the code to
>> represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10
>> to use the OR-form again for Vars
>> (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php)
>>
>>> I think what we ought to do is have the parser emit a full-fledged
>>> InExpr node type (with semantics rather like CaseExpr) and then teach
>>> the planner to optimize that to something else when it seems
>>> safe/prudent to do so.  One nontrivial advantage of that is that
>>> rules/views containing IN constructs would start to reverse-parse
>>> in the same fashion, instead of introducing weird substitute
>>> expressions.
>>
>> Here's my first cut at that. The lefthand expression is now evaluated
>> only once, and stored in econtext->caseValue. Parse analysis turns the
>> righthand expressions into a list of comparison expressions like
>> "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr
>> into something more generic, now that it's used not only in CASE
>> expressions, but also in IN and in UPDATE targets, but I didn't do that
>> in this patch.
>>
>> eval_const_expressions checks the lefthand expression for volatile
>> functions. If there aren't any, it transform the InExprs to a list of ORs.
>>
>> This isn't finished, because it doesn't yet do the transformation to
>> ScalarArrayOpExpr. The OR form is much slower to plan, which is why the
>> ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue
>> hacking on that, but please let me know if you have a better idea on how
>> to handle that. One alternative is to teach the machinery that matches
>> restrictinfos to usable indexes to handle InExpr like it does
>> ScalarArrayOpExprs, but I don't know that code very well.
>>
>> --
>>     Heikki Linnakangas
>>     EnterpriseDB   http://www.enterprisedb.com
>
> [ Attachment, skipping... ]
>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>


--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> Nope, this hasn't been addressed. FWIW, I put it on the todo list when I 
> stopped working on it.

Oh, I see it now. Thanks.

---------------------------------------------------------------------------


> 
> On 06.09.2011 20:48, Bruce Momjian wrote:
> >
> > Uh, have we addressed this?  I don't think so.
> >
> > ---------------------------------------------------------------------------
> >
> > Heikki Linnakangas wrote:
> >> On 02.04.2011 20:48, Tom Lane wrote:
> >>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>   writes:
> >>>> We sometimes transform IN-clauses to a list of ORs:
> >>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
> >>>>                          QUERY PLAN
> >>>> ------------------------------------------------------
> >>>>     Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
> >>>>       Filter: ((a = b) OR (a = c))
> >>>> (2 rows)
> >>>
> >>>> But what if you replace "a" with a volatile function? It doesn't seem
> >>>> legal to do that transformation in that case, but we do it:
> >>>
> >>> This is the fault of transformAExprIn().  But please let's *not* fix
> >>> this by adding volatility to the set of heuristics used there.  Looking
> >>> at this again, it seems to me that most of the problem with this code
> >>> is that we're trying to make optimization decisions in the parser.
> >>
> >> Agreed. The history of this is that before 8.2 all IN clauses were
> >> transformed to OR clauses straight in the grammar. 8.2 added the code to
> >> represent IN clause as a ScalarArrayOpExpr, but it was changed in 8.2.10
> >> to use the OR-form again for Vars
> >> (http://archives.postgresql.org/pgsql-hackers/2008-10/msg01269.php)
> >>
> >>> I think what we ought to do is have the parser emit a full-fledged
> >>> InExpr node type (with semantics rather like CaseExpr) and then teach
> >>> the planner to optimize that to something else when it seems
> >>> safe/prudent to do so.  One nontrivial advantage of that is that
> >>> rules/views containing IN constructs would start to reverse-parse
> >>> in the same fashion, instead of introducing weird substitute
> >>> expressions.
> >>
> >> Here's my first cut at that. The lefthand expression is now evaluated
> >> only once, and stored in econtext->caseValue. Parse analysis turns the
> >> righthand expressions into a list of comparison expressions like
> >> "CaseTestExpr = value1". It's perhaps time that we rename CaseTestExpr
> >> into something more generic, now that it's used not only in CASE
> >> expressions, but also in IN and in UPDATE targets, but I didn't do that
> >> in this patch.
> >>
> >> eval_const_expressions checks the lefthand expression for volatile
> >> functions. If there aren't any, it transform the InExprs to a list of ORs.
> >>
> >> This isn't finished, because it doesn't yet do the transformation to
> >> ScalarArrayOpExpr. The OR form is much slower to plan, which is why the
> >> ScalarArrayOpExpr transformation was introduced in 8.2. I'll continue
> >> hacking on that, but please let me know if you have a better idea on how
> >> to handle that. One alternative is to teach the machinery that matches
> >> restrictinfos to usable indexes to handle InExpr like it does
> >> ScalarArrayOpExprs, but I don't know that code very well.
> >>
> >> --
> >>     Heikki Linnakangas
> >>     EnterpriseDB   http://www.enterprisedb.com
> >
> > [ Attachment, skipping... ]
> >
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >
> 
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Transforming IN (...) to ORs, volatility

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Uh, have we addressed this?  I don't think so.

No.  IIRC, I didn't like Heikki's proposed patch, so it's on my head
to come up with something better.
        regards, tom lane


Re: Transforming IN (...) to ORs, volatility

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Uh, have we addressed this?  I don't think so.
> 
> No.  IIRC, I didn't like Heikki's proposed patch, so it's on my head
> to come up with something better.

You can blame me for getting it into the parser.  It used to be in
gram.y!

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Transforming IN (...) to ORs, volatility

From
Heikki Linnakangas
Date:
On 06.09.2011 20:53, Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian<bruce@momjian.us>  writes:
>>> Uh, have we addressed this?  I don't think so.
>>
>> No.  IIRC, I didn't like Heikki's proposed patch, so it's on my head
>> to come up with something better.
>
> You can blame me for getting it into the parser.  It used to be in
> gram.y!

Huh? Isn't "the parser" and "gram.y" more or less the same thing? 
Anyway, it needs to be somewhere else..

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Transforming IN (...) to ORs, volatility

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> On 06.09.2011 20:53, Bruce Momjian wrote:
> > Tom Lane wrote:
> >> Bruce Momjian<bruce@momjian.us>  writes:
> >>> Uh, have we addressed this?  I don't think so.
> >>
> >> No.  IIRC, I didn't like Heikki's proposed patch, so it's on my head
> >> to come up with something better.
> >
> > You can blame me for getting it into the parser.  It used to be in
> > gram.y!
> 
> Huh? Isn't "the parser" and "gram.y" more or less the same thing? 
> Anyway, it needs to be somewhere else..

I meant the '/parser' directory.  It actually created AND nodes in gram.y
so the rest of the parser didn't even know a BETWEEN was used.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +