Thread: BUG #8771: Query execution plan broken after upgrade from 9.1.9
BUG #8771: Query execution plan broken after upgrade from 9.1.9
From
martin.junek@tracmap.co.nz
Date:
The following bug has been logged on the website: Bug reference: 8771 Logged by: Martin Junek Email address: martin.junek@tracmap.co.nz PostgreSQL version: 9.1.11 Operating system: Ubuntu 12.04 Description: Hi there, after upgrading from 9.1.9 to 9.1.11 one of our queries started to run terribly slow (went from few miliseconds to hours). The problem is better explained in the following SQL snippet (which is a very simplified version of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't have the patience to wait for it). Because it works fine in the older version, and also because there's nothing conceptually wrong with the query, I believe it's a bug that should be reported and fixed. Use the following SQL script to reproduce the problem. -- prepare underlying table with 1m records create table t1(id int primary key, val double precision); insert into t1 select generate_series(1, 1000000), random() * 10000000; -- a function that just takes long (will be used in the sub-query) create or replace function takes_long(double precision) returns int as $$ select max(id) from t1 where val = $1; -- just to force a slow seq_scan $$ language sql; -- create view, that calls the slow function for each row create view v as select id, val, (select takes_long(val)) as whatever from t1; -- this works fine (calls the slow function only once, as expected) select * from v where id in (50000) -- in 9.1.9 it will limit the one row and evaluates the slow function only once (=expected) -- in 9.1.11 this will run for hours - for some reason evaluates the slow function for each row (million times) select * from v where id in (select 50000) -- also this will run for hours in 9.1.11 (even though it doesn't even have to call the function at all - it's not in the select list) select id from v where id in (select 50000)
martin.junek@tracmap.co.nz writes: > after upgrading from 9.1.9 to 9.1.11 one of our queries started to run > terribly slow (went from few miliseconds to hours). The problem is better > explained in the following SQL snippet (which is a very simplified version > of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few > miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't > have the patience to wait for it). I believe this is a result of this 9.1.11 change: * Avoid flattening a subquery whose SELECT list contains a volatile function wrapped inside a sub-SELECT (Tom Lane) This avoids unexpected results due to extra evaluations of the volatile function. full details of which can be found here: http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=af38d140c71c21eda422fedc838525525d155cac That's an intentional change that is not going to get undone. If you don't like the results, I'd suggest fixing the "slow function" to get it marked as stable or immutable as appropriate. (The given example would be best marked stable, but I suppose it's just an example and not your real problem function.) When it's marked volatile, as this is by default, that discourages the planner from rearranging the query in ways that would change the number of function executions from what a naive implementation would suggest. 9.1.11 is a bit more discouraged than previous releases, but it's also less likely to produce surprising results when the function is genuinely volatile. regards, tom lane
Hi Tom, thanks for the explanation. However, it sounds like a terrible side effect of the change you mentioned. Also in this example the function is not volatile. Couple more questions: - why it works as expected when I use " where id in (50000)" and does not work if used "where id in (select 50000)"? Will it still work that way in the foreseeable future? - why the function gets executed when the column is not in the select list at all? i.e. when I use "select id from...."? re "optimizing the slow function" - this example is very simplified just to demonstrate the problem (so the function is intentionally slow). In our case it's a big view containing tens of subqueries. The view would contain many records but we only need 50 (=one page of results) so our query was designed to evaluate all the subqueries only for the 50 records. Now if we upgrade to 9.1.11... you see the problem. In our case there's a workaround (split the query into two separate queries), but it just feels wrong. I'm also surprised that other users haven't encountered this issue. I'd appreciate if you reconsider this to be a bug in the query optimizer. Thanks for your help. Martin. On 11/01/14 14:32, Tom Lane wrote: > martin.junek@tracmap.co.nz writes: >> after upgrading from 9.1.9 to 9.1.11 one of our queries started to run >> terribly slow (went from few miliseconds to hours). The problem is better >> explained in the following SQL snippet (which is a very simplified version >> of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few >> miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't >> have the patience to wait for it). > I believe this is a result of this 9.1.11 change: > > * Avoid flattening a subquery whose SELECT list contains a volatile function wrapped inside a sub-SELECT (Tom Lane) > > This avoids unexpected results due to extra evaluations of the volatile function. > > full details of which can be found here: > http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=af38d140c71c21eda422fedc838525525d155cac > > That's an intentional change that is not going to get undone. If you > don't like the results, I'd suggest fixing the "slow function" to get it > marked as stable or immutable as appropriate. (The given example would be > best marked stable, but I suppose it's just an example and not your real > problem function.) When it's marked volatile, as this is by default, > that discourages the planner from rearranging the query in ways that would > change the number of function executions from what a naive implementation > would suggest. 9.1.11 is a bit more discouraged than previous releases, > but it's also less likely to produce surprising results when the function > is genuinely volatile. > > regards, tom lane
Hi Tom, I just noticed that the function in the example wasn't explicitly defined as 'stable'. If I change the function definition to STABLE, the problem goes away. And the same problem was in our code. Thank you for pointing me in the right direction, that was very helpful. Martin. On 11/01/14 14:32, Tom Lane wrote: > martin.junek@tracmap.co.nz writes: >> after upgrading from 9.1.9 to 9.1.11 one of our queries started to run >> terribly slow (went from few miliseconds to hours). The problem is better >> explained in the following SQL snippet (which is a very simplified version >> of the problem). If you run it on 9.1.9, all the SELECTs evaluate in few >> miliseconds, if you run it on 9.1.11, it will take probably hours (I didn't >> have the patience to wait for it). > I believe this is a result of this 9.1.11 change: > > * Avoid flattening a subquery whose SELECT list contains a volatile function wrapped inside a sub-SELECT (Tom Lane) > > This avoids unexpected results due to extra evaluations of the volatile function. > > full details of which can be found here: > http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=af38d140c71c21eda422fedc838525525d155cac > > That's an intentional change that is not going to get undone. If you > don't like the results, I'd suggest fixing the "slow function" to get it > marked as stable or immutable as appropriate. (The given example would be > best marked stable, but I suppose it's just an example and not your real > problem function.) When it's marked volatile, as this is by default, > that discourages the planner from rearranging the query in ways that would > change the number of function executions from what a naive implementation > would suggest. 9.1.11 is a bit more discouraged than previous releases, > but it's also less likely to produce surprising results when the function > is genuinely volatile. > > regards, tom lane