Re: Subquery in a JOIN not getting restricted? - Mailing list pgsql-performance
From | Gavin Flower
---|---
Subject | Re: Subquery in a JOIN not getting restricted?
Date |
Msg-id | 4EBE3C4D.9010208@archidevsys.co.nz
In response to | Re: Subquery in a JOIN not getting restricted? (Jay Levitt <jay.levitt@gmail.com>)
List | pgsql-performance
On 10/11/11 09:39, Jay Levitt wrote:
> Kevin Grittner wrote:
>> Jay Levitt<jay.levitt@gmail.com> wrote:
>>
>>> I don't get why the GROUP BY in this subquery forces it to scan
>>> the entire users table (seq scan here, index scan on a larger
>>> table) when there's only one row in users that can match:
>
>> Are you sure there's a plan significantly faster than 1.3 ms?
>
> Yep! Watch this:
>
> drop schema if exists jaytest cascade;
> create schema jaytest;
> set search_path to jaytest;
>
> create table questions (
>   id int not null primary key,
>   user_id int not null
> );
> insert into questions
>   select generate_series(1,1100), (random()*2000000)::int;
>
> create table users (
>   id int not null primary key
> );
> insert into users select generate_series(1, 2000000);
>
> vacuum freeze analyze;
>
> explain analyze
> select questions.id
> from questions
> join (
>   select u.id
>   from users as u
>   group by u.id
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> -----------------------
>  Merge Join  (cost=8.28..90833.02 rows=1818 width=4) (actual time=888.787..888.790 rows=1 loops=1)
>    Merge Cond: (u.id = questions.user_id)
>    ->  Group  (cost=0.00..65797.47 rows=2000000 width=4) (actual time=0.017..735.509 rows=1747305 loops=1)
>          ->  Index Scan using users_pkey on users u  (cost=0.00..60797.47 rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
>    ->  Materialize  (cost=8.28..8.29 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=1)
>          ->  Sort  (cost=8.28..8.28 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
>                Sort Key: questions.user_id
>                Sort Method: quicksort  Memory: 25kB
>                ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
>                      Index Cond: (id = 1)
>  Total runtime: 888.832 ms
> (11 rows)
>
> explain analyze
> select questions.id
> from questions
> join (
>   select u.id
>   from users as u
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> -----------------------
>  Nested Loop  (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021 rows=1 loops=1)
>    ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
>          Index Cond: (id = 1)
>    ->  Index Scan using users_pkey on users u  (cost=0.00..8.49 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
>          Index Cond: (u.id = questions.user_id)
>  Total runtime: 0.045 ms
> (6 rows)
>
>> That said, there might be some room for an optimization which pushes
>> that test into the query with the "group by" clause.  I don't know
>> if there's a problem with that which I'm missing, the construct was
>> judged to be too rare to be worth the cost of testing for it, or
>> it's just that nobody has yet gotten to it.
>
> Anyone have more insights on whether this is hard to optimize or
> simply not-yet-optimized?  And if the latter, where might I start
> looking?  (Not that you -really- want me to submit a patch; my C has
> regressed to the "try an ampersand. OK, try an asterisk." level...)
>
> Jay
>
Minor note: 'PRIMARY KEY' gives you a 'NOT NULL' constraint for free.
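[Editor's aside, not part of the original thread: since the GROUP BY in the subquery is on users.id, which is the primary key, the grouping removes no rows, so one workaround sketch is to express the derived table as an EXISTS semi-join instead. This uses only the tables from Jay's jaytest test case above; the claimed plan behavior is an expectation, not a measured result.]

```sql
-- Workaround sketch (assumes Jay's jaytest schema above): GROUP BY on a
-- primary key deduplicates nothing, so the grouped derived table can be
-- replaced by an EXISTS semi-join.  The planner can then parameterize
-- the users lookup into a single probe of users_pkey instead of
-- grouping all 2,000,000 ids.
select questions.id
from questions
where questions.id = 1
  and exists (
      select 1
      from users as u
      where u.id = questions.user_id
  );
```

Note this rewrite is only equivalent to the original query because the grouped column is unique; if the subquery grouped on a non-unique column, the join could change the row count and the two forms would differ.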