Pushing restrictions down into GROUP BYs? - Mailing list pgsql-hackers

From Jay Levitt
Subject Pushing restrictions down into GROUP BYs?
Date
Msg-id 505DFD34.9020206@gmail.com
Whole thread Raw
List pgsql-hackers
A while back, I posted a pathological minimal-case query where, in order 
to select one row from a users table, Postgres needed to scan the whole 
users table, because the restriction was not visible to the GROUP BY.

At the time, Tom wrote:
> Don't hold your breath waiting for that to change.  To do what you're> wishing for, we'd have to treat the GROUP BY
subqueryas if it were an> inner indexscan, and push a join condition into it.  That's not even> possible today.  It
mightbe possible after I get done with the> parameterized-path stuff I've been speculating about for a couple of> years
now;but I suspect that even if it is possible, we won't do it> for subqueries because of the planner-performance hit
we'dtake from> repeatedly replanning the same subquery.
 

http://archives.postgresql.org/pgsql-performance/2011-11/msg00131.php

Given all the work that's been done on parameterized paths and LATERAL, 
is this something to revisit yet?  In 9.3, it's at least possible to 
manually tweak the SRF, thanks to LATERAL. But it'd be great to allow 
set-returning functions to remain blissfully unaware of their 
surroundings. Modular code, Single Responsibility Principle, all that.

I guess a more general question is: Are there cases where the planner 
can *use* LATERAL functionality to push down restrictions like this? (Do 
LATERAL and pushdown conceptually overlap? I think maybe they do.)

Example code below - and before you say "but you could just use 'where 
exists'", trust me that the original queries were much more involved :)

-----

    drop schema if exists jaytest cascade;    create schema jaytest;    set search_path to jaytest;
    create table questions (      id int not null primary key,      user_id int not null    );    insert into questions
    select generate_series(1,1100), (random()*2000000)::int;
 
    create table users (      id int not null primary key    );    insert into users select generate_series(1,
2000000);
    vacuum freeze analyze;
    create function srf() returns table (user_id int) as $$      select u.id      from users as u      group by u.id
$$language sql stable;
 
    -- Option 1: Use the set-returning function
    explain analyze    select questions.id    from questions    join srf()    on srf.user_id = questions.user_id
wherequestions.id = 1;
 
    -- Option 2: Use the equivalent of the set-returning function    -- (remove any doubts about function call
overhead)
    explain analyze    select questions.id    from questions    join (      select u.id as user_id      from users as u
    group by u.id    ) as srf1    on srf1.user_id = questions.user_id    where questions.id = 1;
 
    -- Option 3: Now that we have LATERAL, manually    -- push the join down into the set-returning function
    create function srf_lateral(questions_user_id int) returns table 
(user_id int) as $$      select u.id      from users as u      where u.id = questions_user_id      group by u.id    $$
languagesql stable;
 
    explain analyze    select questions.id    from questions, lateral srf_lateral(questions.user_id)    where
questions.id= 1;
 
    drop schema jaytest cascade;

-----

On my machine, options 1 and 2 usually have runtimes of 100-400 ms, and 
option 3 is about 0.04ms.



pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: Re: Caught up
Next
From: Tom Lane
Date:
Subject: Re: [PATCH] lock_timeout and common SIGALRM framework