A while back, I posted a pathological minimal-case query where, in order
to select one row from a users table, Postgres needed to scan the whole
users table, because the restriction was not visible to the GROUP BY.
At the time, Tom wrote:
> Don't hold your breath waiting for that to change. To do what you're> wishing for, we'd have to treat the GROUP BY
subqueryas if it were an> inner indexscan, and push a join condition into it. That's not even> possible today. It
mightbe possible after I get done with the> parameterized-path stuff I've been speculating about for a couple of> years
now;but I suspect that even if it is possible, we won't do it> for subqueries because of the planner-performance hit
we'dtake from> repeatedly replanning the same subquery.
http://archives.postgresql.org/pgsql-performance/2011-11/msg00131.php
Given all the work that's been done on parameterized paths and LATERAL,
is this something to revisit yet? In 9.3, it's at least possible to
manually tweak the SRF, thanks to LATERAL. But it'd be great to allow
set-returning functions to remain blissfully unaware of their
surroundings. Modular code, Single Responsibility Principle, all that.
I guess a more general question is: Are there cases where the planner
can *use* LATERAL functionality to push down restrictions like this? (Do
LATERAL and pushdown conceptually overlap? I think maybe they do.)
Example code below - and before you say "but you could just use 'where
exists'", trust me that the original queries were much more involved :)
-----
drop schema if exists jaytest cascade; create schema jaytest; set search_path to jaytest;
create table questions ( id int not null primary key, user_id int not null ); insert into questions
select generate_series(1,1100), (random()*2000000)::int;
create table users ( id int not null primary key ); insert into users select generate_series(1,
2000000);
vacuum freeze analyze;
create function srf() returns table (user_id int) as $$ select u.id from users as u group by u.id
$$language sql stable;
-- Option 1: Use the set-returning function
explain analyze select questions.id from questions join srf() on srf.user_id = questions.user_id
wherequestions.id = 1;
-- Option 2: Use the equivalent of the set-returning function -- (remove any doubts about function call
overhead)
explain analyze select questions.id from questions join ( select u.id as user_id from users as u
group by u.id ) as srf1 on srf1.user_id = questions.user_id where questions.id = 1;
-- Option 3: Now that we have LATERAL, manually -- push the join down into the set-returning function
create function srf_lateral(questions_user_id int) returns table
(user_id int) as $$ select u.id from users as u where u.id = questions_user_id group by u.id $$
languagesql stable;
explain analyze select questions.id from questions, lateral srf_lateral(questions.user_id) where
questions.id= 1;
drop schema jaytest cascade;
-----
On my machine, options 1 and 2 usually have runtimes of 100-400 ms, and
option 3 is about 0.04ms.