Re: Subquery in a JOIN not getting restricted? - Mailing list pgsql-performance

From Gavin Flower
Subject Re: Subquery in a JOIN not getting restricted?
Msg-id 4EBE3C4D.9010208@archidevsys.co.nz
In response to Re: Subquery in a JOIN not getting restricted?  (Jay Levitt <jay.levitt@gmail.com>)
List pgsql-performance
On 10/11/11 09:39, Jay Levitt wrote:
> Kevin Grittner wrote:
>> Jay Levitt<jay.levitt@gmail.com>  wrote:
>>
>>> I don't get why the GROUP BY in this subquery forces it to scan
>>> the entire users table (seq scan here, index scan on a larger
>>> table) when there's only one row in users that can match:
>
>> Are you sure there's a plan significantly faster than 1.3 ms?
>
> Yep!  Watch this:
>
> drop schema if exists jaytest cascade;
> create schema jaytest;
> set search_path to jaytest;
>
> create table questions (
>   id int not null primary key,
>   user_id int not null
> );
> insert into questions
>   select generate_series(1,1100), (random()*2000000)::int;
>
> create table users (
>   id int not null primary key
> );
> insert into users select generate_series(1, 2000000);
>
> vacuum freeze analyze;
>
> explain analyze
> select questions.id
> from questions
> join (
>   select u.id
>   from users as u
>   group by u.id
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> -----------------------
>  Merge Join  (cost=8.28..90833.02 rows=1818 width=4) (actual
> time=888.787..888.790 rows=1 loops=1)
>    Merge Cond: (u.id = questions.user_id)
>    ->  Group  (cost=0.00..65797.47 rows=2000000 width=4) (actual
> time=0.017..735.509 rows=1747305 loops=1)
>          ->  Index Scan using users_pkey on users u
> (cost=0.00..60797.47 rows=2000000 width=4) (actual time=0.015..331.990
> rows=1747305 loops=1)
>    ->  Materialize  (cost=8.28..8.29 rows=1 width=8) (actual
> time=0.013..0.015 rows=1 loops=1)
>          ->  Sort  (cost=8.28..8.28 rows=1 width=8) (actual
> time=0.012..0.013 rows=1 loops=1)
>                Sort Key: questions.user_id
>                Sort Method:  quicksort  Memory: 25kB
>                ->  Index Scan using questions_pkey on questions
> (cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1
> loops=1)
>                      Index Cond: (id = 1)
>  Total runtime: 888.832 ms
> (11 rows)
>
> explain analyze
> select questions.id
> from questions
> join (
>   select u.id
>   from users as u
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> -----------------------
>  Nested Loop  (cost=0.00..16.77 rows=1 width=4) (actual
> time=0.019..0.021 rows=1 loops=1)
>    ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27
> rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
>          Index Cond: (id = 1)
>    ->  Index Scan using users_pkey on users u  (cost=0.00..8.49 rows=1
> width=4) (actual time=0.007..0.007 rows=1 loops=1)
>          Index Cond: (u.id = questions.user_id)
>  Total runtime: 0.045 ms
> (6 rows)
>
>> That said, there might be some room for an optimization which pushes
>> that test into the query with the "group by" clause.  I don't know
>> if there's a problem with that which I'm missing, the construct was
>> judged to be too rare to be worth the cost of testing for it, or
>> it's just that nobody has yet gotten to it.
>
> Anyone have more insights on whether this is hard to optimize or
> simply not-yet-optimized?  And if the latter, where might I start
> looking? (Not that you -really- want me to submit a patch; my C has
> regressed to the "try an ampersand.  OK, try an asterisk." level...)
>
> Jay
>
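Until the planner learns to do this itself, one manual workaround (a sketch, not benchmarked here) is to duplicate the restriction inside the grouped subquery, so no qual pushdown is needed:

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
  where u.id in (select user_id from questions where id = 1)
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

With the qual repeated inside, the subquery only has to group the handful of matching ids (via users_pkey) rather than all two million rows.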
Minor note:

'PRIMARY KEY' gives you a 'NOT NULL' constraint for free.
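For example, in the test schema above this shorter declaration is equivalent:

create table users (
  id int primary key  -- PRIMARY KEY already implies NOT NULL
);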
