Re: any hope for my big query?
From: Edoardo Ceccarelli
Subject: Re: any hope for my big query?
Date:
Msg-id: 451CE517.3040903@axa.it
In response to: any hope for my big query? (Ben <bench@silentmedia.com>)
List: pgsql-performance
You have two seqscans on the albumjoin table. First you make a simple join:

    ...and albumjoin.album = public.album.id...

which generates the first one:

    -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=8)

Then you group values from the same table, counting them, with:

    ...(select album from albumjoin group by album having count(*) between 15 and 25)...

which generates the second one:

    -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=4)

Given the complexity of the query, maybe you could create an intermediate table with only one seqscan and use that one in the final query (a rough sketch follows the quoted message below), but I don't know if that's possible with the db structure you have.

Can I ask what exactly the albumjoin table is? Is it an n-n relation?

> explain select
>     distinct public.album.id
> from
>     public.album,public.albumjoin,public.track,umdb.node
> where
>     node.dir=2811
>     and albumjoin.album = public.album.id
>     and public.albumjoin.track = public.track.id
>     and levenshtein(substring(public.track.name for 75),
>         substring(node.file for 75)) <= 10
>     and public.album.id in
>         (select album from albumjoin group by album having count(*)
>         between 15 and 25)
> group by public.album.id
> having count(*) >= 5;
>
> Unique (cost=991430.53..1013711.74 rows=425772 width=4)
>   -> GroupAggregate (cost=991430.53..1012647.31 rows=425772 width=4)
>        Filter: (count(*) >= 5)
>        -> Sort (cost=991430.53..996373.93 rows=1977360 width=4)
>             Sort Key: album.id
>             -> Nested Loop (cost=513549.06..737866.68 rows=1977360 width=4)
>                  Join Filter: (levenshtein("substring"(("inner".name)::text, 1, 75), "substring"("outer".file, 1, 75)) <= 10)
>                  -> Index Scan using node_dir on node (cost=0.00..3.22 rows=16 width=40)
>                       Index Cond: (dir = 2811)
>                  -> Materialize (cost=513549.06..520153.61 rows=370755 width=25)
>                       -> Hash Join (cost=271464.72..510281.31 rows=370755 width=25)
>                            Hash Cond: ("outer".id = "inner".track)
>                            -> Seq Scan on track (cost=0.00..127872.69 rows=5111469 width=25)
>                            -> Hash (cost=268726.83..268726.83 rows=370755 width=8)
>                                 -> Hash Join (cost=150840.51..268726.83 rows=370755 width=8)
>                                      Hash Cond: ("outer".album = "inner".id)
>                                      -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=8)
>                                      -> Hash (cost=150763.24..150763.24 rows=30908 width=8)
>                                           -> Hash Join (cost=127951.57..150763.24 rows=30908 width=8)
>                                                Hash Cond: ("outer".id = "inner".album)
>                                                -> Seq Scan on album (cost=0.00..12922.72 rows=425772 width=4)
>                                                -> Hash (cost=127874.30..127874.30 rows=30908 width=4)
>                                                     -> HashAggregate (cost=126947.06..127565.22 rows=30908 width=4)
>                                                          Filter: ((count(*) >= 15) AND (count(*) <= 25))
>                                                          -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=4)
>
> I've tried adding a length(public.track.name) index and filtering
> public.track to those rows where length(name) is within a few
> characters of node.file, but that actually makes the plan more expensive.
>
> Is there any hope to make things much cheaper? Unfortunately, I can't
> filter out anything from the album or albumjoin tables.
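Something along these lines might work for the intermediate table. This is an untested sketch: the temp table name filtered_albums is made up for illustration, and it assumes you can create temp tables and run ANALYZE in the same session:

    -- Materialize the "15 to 25 tracks" albums once, so the HAVING
    -- subselect's seqscan on albumjoin runs a single time up front.
    -- (filtered_albums is a hypothetical name.)
    CREATE TEMP TABLE filtered_albums AS
        SELECT album
        FROM albumjoin
        GROUP BY album
        HAVING count(*) BETWEEN 15 AND 25;

    -- Give the planner real row counts for the temp table.
    ANALYZE filtered_albums;

    -- Same query as before, with the IN (subselect) replaced by a join
    -- against the small temp table. The DISTINCT is dropped because the
    -- GROUP BY already makes public.album.id unique.
    SELECT public.album.id
    FROM public.album, public.albumjoin, public.track, umdb.node,
         filtered_albums
    WHERE node.dir = 2811
      AND public.album.id = filtered_albums.album
      AND albumjoin.album = public.album.id
      AND public.albumjoin.track = public.track.id
      AND levenshtein(substring(public.track.name for 75),
                      substring(node.file for 75)) <= 10
    GROUP BY public.album.id
    HAVING count(*) >= 5;

Note this only removes one of the two albumjoin seqscans from the final query; whether the remaining join can use an index on albumjoin(album) instead of the other seqscan depends on your indexes and statistics.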