Re: Subquery in a JOIN not getting restricted? - Mailing list pgsql-performance

From Jay Levitt
Subject Re: Subquery in a JOIN not getting restricted?
Date
Msg-id 4EBAE4F2.1090403@gmail.com
Whole thread Raw
In response to Re: Subquery in a JOIN not getting restricted?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Subquery in a JOIN not getting restricted?
List pgsql-performance
Kevin Grittner wrote:
> Jay Levitt<jay.levitt@gmail.com>  wrote:
>
>> I don't get why the GROUP BY in this subquery forces it to scan
>> the entire users table (seq scan here, index scan on a larger
>> table) when there's only one row in users that can match:

> Are you sure there's a plan significantly faster than 1.3 ms?

Yep!  Watch this:

drop schema if exists jaytest cascade;
create schema jaytest;
set search_path to jaytest;

create table questions (
   id int not null primary key,
   user_id int not null
);
insert into questions
   select generate_series(1,1100), (random()*2000000)::int;

create table users (
   id int not null primary key
);
insert into users select generate_series(1, 2000000);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
   select u.id
   from users as u
   group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
  Merge Join  (cost=8.28..90833.02 rows=1818 width=4) (actual
time=888.787..888.790 rows=1 loops=1)
    Merge Cond: (u.id = questions.user_id)
    ->  Group  (cost=0.00..65797.47 rows=2000000 width=4) (actual
time=0.017..735.509 rows=1747305 loops=1)
          ->  Index Scan using users_pkey on users u  (cost=0.00..60797.47
rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
    ->  Materialize  (cost=8.28..8.29 rows=1 width=8) (actual
time=0.013..0.015 rows=1 loops=1)
          ->  Sort  (cost=8.28..8.28 rows=1 width=8) (actual
time=0.012..0.013 rows=1 loops=1)
                Sort Key: questions.user_id
                Sort Method:  quicksort  Memory: 25kB
                ->  Index Scan using questions_pkey on questions
(cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                      Index Cond: (id = 1)
  Total runtime: 888.832 ms
(11 rows)

explain analyze
select questions.id
from questions
join (
   select u.id
   from users as u
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
  Nested Loop  (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021
rows=1 loops=1)
    ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27
rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
          Index Cond: (id = 1)
    ->  Index Scan using users_pkey on users u  (cost=0.00..8.49 rows=1
width=4) (actual time=0.007..0.007 rows=1 loops=1)
          Index Cond: (u.id = questions.user_id)
  Total runtime: 0.045 ms
(6 rows)

> That said, there might be some room for an optimization which pushes
> that test into the query with the "group by" clause.  I don't know
> if there's a problem with that which I'm missing, the construct was
> judged to be too rare to be worth the cost of testing for it, or
> it's just that nobody has yet gotten to it.

Anyone have more insights on whether this is hard to optimize or simply
not-yet-optimized?  And if the latter, where might I start looking? (Not
that you -really- want me to submit a patch; my C has regressed to the "try
an ampersand.  OK, try an asterisk." level...)

Jay

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Subquery in a JOIN not getting restricted?
Next
From: Sorin Dudui
Date:
Subject: IMMUTABLE STABLE functions, daily updates