Thread: ARRAY(subquery) volatility
Why does the first query below return the same value for each row while the second query returns random values? Planner optimization? test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5); ?column? ---------------------{0.269273371561092}{0.269273371561092}{0.269273371561092}{0.269273371561092}{0.269273371561092} (5 rows) test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x); ?column? ---------------------{0.826863945846848}{0.42534113182935}{0.36419924318986}{0.258920902972538}{0.843205466327819} (5 rows) -- Michael Fuhr
Michael Fuhr wrote: > Why does the first query below return the same value for each row > while the second query returns random values? Planner optimization? I assume it is due to some kind of flattening in the planner, but it is totally unrelated to ARRAY(subquery): regression=# SELECT (SELECT random()) as f FROM generate_series(1, 5); f ------------------- 0.752416231088534 0.752416231088534 0.752416231088534 0.752416231088534 0.752416231088534 (5 rows) regression=# SELECT (SELECT random() + 0 * f) as f FROM generate_series(1, 5) as t(f); f ------------------- 0.176055165555354 0.608546747178094 0.55303416240636 0.127355110425202 0.21671894063089 (5 rows) Here's another example: regression=# create table t1(f text); CREATE TABLE regression=# insert into t1 values('0'); INSERT 17366 1 regression=# insert into t1 values('0'); INSERT 17367 1 regression=# insert into t1 values('0'); INSERT 17368 1 regression=# SELECT (SELECT timeofday()::timestamp + f::interval) as f FROM t1; f ---------------------------- 2005-08-16 16:38:12.738215 2005-08-16 16:38:12.738292 2005-08-16 16:38:12.738315 (3 rows) regression=# SELECT (SELECT timeofday()::timestamp) as f FROM t1; f ---------------------------- 2005-08-16 16:35:33.100791 2005-08-16 16:35:33.100791 2005-08-16 16:35:33.100791 (3 rows) This shows that it isn't related to use of an SRF either. Joe
Michael Fuhr <mike@fuhr.org> writes: > Why does the first query below return the same value for each row > while the second query returns random values? Planner optimization? > test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5); > test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x); The sub-SELECT in the first one is considered an uncorrelated subquery, so you get a plan that evaluates the subquery just once: Function Scan on generate_series (cost=0.01..12.51 rows=1000 width=0) InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) In the second case, x is an outer parameter to the subquery, so it has to be re-evaluated for each row of the outer query: Function Scan on generate_series g (cost=0.00..32.50 rows=1000 width=4) SubPlan -> Result (cost=0.00..0.02 rows=1width=0) Note the "InitPlan" vs "SubPlan" labels --- they look similar, but the evaluation rules are totally different. The fact that there's a volatile function in the subquery isn't considered while making this decision. I'm not sure if it should be. regards, tom lane