Re: any hope for my big query? - Mailing list pgsql-performance

From Ben
Subject Re: any hope for my big query?
Date
Msg-id Pine.LNX.4.64.0610041423340.21293@GRD.cube42.tai.silentmedia.com
Whole thread Raw
In response to Re: any hope for my big query?  ("Jim C. Nasby" <jim@nasby.net>)
Responses Re: any hope for my big query?
List pgsql-performance
On Fri, 29 Sep 2006, Jim C. Nasby wrote:

> There's no join criteria for umdb.node... is that really what you want?
>

Unfortunately, yes, it is.

I've taken in all of everybody's helpful advice (thanks!) and reworked
things a little, and now I'm left with this expensive nugget:

select aj.album from
(select seconds-1 as a,seconds+1 as b from node where node.dir = 6223) n
join public.track t
on (t.length between n.a*1000 and n.b*1000)
join public.albumjoin aj
on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 3 and 7) lam
on (lam.id = aj.album)
group by aj.album having count(*) >= 4;

...which comes out to be:

  HashAggregate  (cost=904444.69..904909.99 rows=31020 width=4)
    Filter: (count(*) >= 4)
    ->  Nested Loop  (cost=428434.81..897905.17 rows=1307904 width=4)
          Join Filter: (("inner".length >= (("outer".seconds - 1) * 1000)) AND ("inner".length <= (("outer".seconds +
1)* 1000))) 
          ->  Index Scan using node_dir on node  (cost=0.00..3.46 rows=17 width=4)
                Index Cond: (dir = 6223)
          ->  Materialize  (cost=428434.81..438740.01 rows=692420 width=8)
                ->  Hash Join  (cost=210370.58..424361.39 rows=692420 width=8)
                      Hash Cond: ("outer".id = "inner".track)
                      ->  Seq Scan on track t  (cost=0.00..128028.41 rows=5123841 width=8)
                      ->  Hash  (cost=205258.53..205258.53 rows=692420 width=8)
                            ->  Hash Join  (cost=6939.10..205258.53 rows=692420 width=8)
                                  Hash Cond: ("outer".album = "inner".id)
                                  ->  Seq Scan on albumjoin aj  (cost=0.00..88918.41 rows=5123841 width=8)
                                  ->  Hash  (cost=6794.51..6794.51 rows=57834 width=4)
                                        ->  Bitmap Heap Scan on albummeta am  (cost=557.00..6794.51 rows=57834 width=4)
                                              Recheck Cond: ((tracks >= 3) AND (tracks <= 7))
                                              ->  Bitmap Index Scan on albummeta_tracks_index  (cost=0.00..557.00
rows=57834width=0) 
                                                    Index Cond: ((tracks >= 3) AND (tracks <= 7))
(19 rows)


I'm surprised (though probably just because I'm ignorant) that it would
have so much sequential scanning in there. For instance, because n is
going to have at most a couple dozen rows, it seems that instead of
scanning all of public.track, it should be able to convert my "t.length
between a and b" clause to some between statements or'd together. Or at
least, it would be nice if the planner could do that. :)


pgsql-performance by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: Performance Optimization for Dummies 2 - the SQL
Next
From: "Carlo Stonebanks"
Date:
Subject: Re: Performance Optimization for Dummies 2 - the SQL