Thread: BUG #8771: Query execution plan broken after upgrade from 9.1.9

BUG #8771: Query execution plan broken after upgrade from 9.1.9

From
martin.junek@tracmap.co.nz
Date:
The following bug has been logged on the website:

Bug reference:      8771
Logged by:          Martin Junek
Email address:      martin.junek@tracmap.co.nz
PostgreSQL version: 9.1.11
Operating system:   Ubuntu 12.04
Description:

Hi there,
after upgrading from 9.1.9 to 9.1.11 one of our queries started to run
terribly slow (went from few miliseconds to hours). The problem is better
explained in the following SQL snippet (which is a very simplified version
of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few
miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't
have the patience to wait for it).
Because it works fine in the older version, and also because there's nothing
conceptually wrong with the query, I believe it's a bug that should be
reported and fixed.
Use the following SQL script to reproduce the problem.


-- prepare underlying table with 1m records
create table t1(id int primary key, val double precision);
insert into t1 select generate_series(1, 1000000), random() * 10000000;


-- a function that just takes long (will be used in the sub-query)
create or replace function takes_long(double precision) returns int as $$
select max(id) from t1 where val = $1; -- just to force a slow seq_scan
$$ language sql;


-- create view, that calls the slow function for each row
create view v as
  select
    id, val, (select takes_long(val)) as whatever
  from t1;


-- this works fine (calls the slow function only once, as expected)
select * from v
where id in (50000)


-- in 9.1.9 it will limit the one row and evaluates the slow function only
once (=expected)
-- in 9.1.11 this will run for hours - for some reason evaluates the slow
function for each row (million times)
select * from v
where id in (select 50000)


-- also this will run for hours in 9.1.11 (even though it doesn't even have
to call the function at all - it's not in the select list)
select id from v
where id in (select 50000)

Re: BUG #8771: Query execution plan broken after upgrade from 9.1.9

From
Tom Lane
Date:
martin.junek@tracmap.co.nz writes:
> after upgrading from 9.1.9 to 9.1.11 one of our queries started to run
> terribly slow (went from few miliseconds to hours). The problem is better
> explained in the following SQL snippet (which is a very simplified version
> of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few
> miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't
> have the patience to wait for it).

I believe this is a result of this 9.1.11 change:

  * Avoid flattening a subquery whose SELECT list contains a volatile function wrapped inside a sub-SELECT (Tom Lane)

    This avoids unexpected results due to extra evaluations of the volatile function.

full details of which can be found here:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=af38d140c71c21eda422fedc838525525d155cac

That's an intentional change that is not going to get undone.  If you
don't like the results, I'd suggest fixing the "slow function" to get it
marked as stable or immutable as appropriate.  (The given example would be
best marked stable, but I suppose it's just an example and not your real
problem function.)  When it's marked volatile, as this is by default,
that discourages the planner from rearranging the query in ways that would
change the number of function executions from what a naive implementation
would suggest.  9.1.11 is a bit more discouraged than previous releases,
but it's also less likely to produce surprising results when the function
is genuinely volatile.

            regards, tom lane

Re: BUG #8771: Query execution plan broken after upgrade from 9.1.9

From
Martin Junek
Date:
Hi Tom,
thanks for the explanation. However, it sounds like a terrible side
effect of the change you mentioned. Also in this example the function is
not volatile.
Couple more questions:
- why it works as expected when I use " where id in (50000)" and does
not work if used "where id in (select 50000)"? Will it still work that
way in the foreseeable future?
- why the function gets executed when the column is not in the select
list at all? i.e. when I use "select id from...."?

re "optimizing the slow function" - this example is very simplified just
to demonstrate the problem (so the function is intentionally slow). In
our case it's a big view containing tens of subqueries. The view would
contain many records but we only need 50 (=one page of results) so our
query was designed to evaluate all the subqueries only for the 50
records. Now if we upgrade to 9.1.11... you see the problem.

In our case there's a workaround (split the query into two separate
queries), but it just feels wrong. I'm also surprised that other users
haven't encountered this issue.

I'd appreciate if you reconsider this to be a bug in the query optimizer.

Thanks for your help.
Martin.



On 11/01/14 14:32, Tom Lane wrote:
> martin.junek@tracmap.co.nz writes:
>> after upgrading from 9.1.9 to 9.1.11 one of our queries started to run
>> terribly slow (went from few miliseconds to hours). The problem is better
>> explained in the following SQL snippet (which is a very simplified version
>> of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few
>> miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't
>> have the patience to wait for it).
> I believe this is a result of this 9.1.11 change:
>
>    * Avoid flattening a subquery whose SELECT list contains a volatile function wrapped inside a sub-SELECT (Tom
Lane)
>
>      This avoids unexpected results due to extra evaluations of the volatile function.
>
> full details of which can be found here:
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=af38d140c71c21eda422fedc838525525d155cac
>
> That's an intentional change that is not going to get undone.  If you
> don't like the results, I'd suggest fixing the "slow function" to get it
> marked as stable or immutable as appropriate.  (The given example would be
> best marked stable, but I suppose it's just an example and not your real
> problem function.)  When it's marked volatile, as this is by default,
> that discourages the planner from rearranging the query in ways that would
> change the number of function executions from what a naive implementation
> would suggest.  9.1.11 is a bit more discouraged than previous releases,
> but it's also less likely to produce surprising results when the function
> is genuinely volatile.
>
>             regards, tom lane

Re: BUG #8771: Query execution plan broken after upgrade from 9.1.9

From
Martin Junek
Date:
Hi Tom,
I just noticed that the function in the example wasn't explicitly
defined as 'stable'. If I change the function definition to STABLE, the
problem goes away. And the same problem was in our code.
Thank you for pointing me in the right direction, that was very helpful.
Martin.

On 11/01/14 14:32, Tom Lane wrote:
> martin.junek@tracmap.co.nz writes:
>> after upgrading from 9.1.9 to 9.1.11 one of our queries started to run
>> terribly slow (went from few miliseconds to hours). The problem is better
>> explained in the following SQL snippet (which is a very simplified version
>> of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few
>> miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't
>> have the patience to wait for it).
> I believe this is a result of this 9.1.11 change:
>
>    * Avoid flattening a subquery whose SELECT list contains a volatile function wrapped inside a sub-SELECT (Tom
Lane)
>
>      This avoids unexpected results due to extra evaluations of the volatile function.
>
> full details of which can be found here:
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=af38d140c71c21eda422fedc838525525d155cac
>
> That's an intentional change that is not going to get undone.  If you
> don't like the results, I'd suggest fixing the "slow function" to get it
> marked as stable or immutable as appropriate.  (The given example would be
> best marked stable, but I suppose it's just an example and not your real
> problem function.)  When it's marked volatile, as this is by default,
> that discourages the planner from rearranging the query in ways that would
> change the number of function executions from what a naive implementation
> would suggest.  9.1.11 is a bit more discouraged than previous releases,
> but it's also less likely to produce surprising results when the function
> is genuinely volatile.
>
>             regards, tom lane