Re: any hope for my big query?
From: Edoardo Ceccarelli
Subject: Re: any hope for my big query?
Date:
Msg-id: 451CE517.3040903@axa.it
In response to: any hope for my big query? (Ben <bench@silentmedia.com>)
List: pgsql-performance
You have two seqscans on the albumjoin table. First you make a simple join:

    ...and albumjoin.album = public.album.id...

which generates the first one:

    -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=8)

Then you group values from the same table, counting them, with:

    ...(select album from albumjoin group by album having count(*) between 15 and 25)...

which generates the second one:

    -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=4)

Given the complexity of the query, maybe you could create an intermediate table with only one seqscan and use that one in the final query (a rough sketch follows the quoted message below), but I don't know if that's possible with the db structure you have.

Can I ask what exactly the albumjoin table is? Is it an n-n relation?

> explain select
>     distinct public.album.id
> from
>     public.album,public.albumjoin,public.track,umdb.node
> where
>     node.dir=2811
>     and albumjoin.album = public.album.id
>     and public.albumjoin.track = public.track.id
>     and levenshtein(substring(public.track.name for 75),
>         substring(node.file for 75)) <= 10
>     and public.album.id in
>         (select album from albumjoin group by album having count(*)
>         between 15 and 25)
> group by public.album.id
> having count(*) >= 5;
>
> Unique (cost=991430.53..1013711.74 rows=425772 width=4)
>   -> GroupAggregate (cost=991430.53..1012647.31 rows=425772 width=4)
>        Filter: (count(*) >= 5)
>        -> Sort (cost=991430.53..996373.93 rows=1977360 width=4)
>             Sort Key: album.id
>             -> Nested Loop (cost=513549.06..737866.68 rows=1977360 width=4)
>                  Join Filter: (levenshtein("substring"(("inner".name)::text, 1, 75), "substring"("outer".file, 1, 75)) <= 10)
>                  -> Index Scan using node_dir on node (cost=0.00..3.22 rows=16 width=40)
>                       Index Cond: (dir = 2811)
>                  -> Materialize (cost=513549.06..520153.61 rows=370755 width=25)
>                       -> Hash Join (cost=271464.72..510281.31 rows=370755 width=25)
>                            Hash Cond: ("outer".id = "inner".track)
>                            -> Seq Scan on track (cost=0.00..127872.69 rows=5111469 width=25)
>                            -> Hash (cost=268726.83..268726.83 rows=370755 width=8)
>                                 -> Hash Join (cost=150840.51..268726.83 rows=370755 width=8)
>                                      Hash Cond: ("outer".album = "inner".id)
>                                      -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=8)
>                                      -> Hash (cost=150763.24..150763.24 rows=30908 width=8)
>                                           -> Hash Join (cost=127951.57..150763.24 rows=30908 width=8)
>                                                Hash Cond: ("outer".id = "inner".album)
>                                                -> Seq Scan on album (cost=0.00..12922.72 rows=425772 width=4)
>                                                -> Hash (cost=127874.30..127874.30 rows=30908 width=4)
>                                                     -> HashAggregate (cost=126947.06..127565.22 rows=30908 width=4)
>                                                          Filter: ((count(*) >= 15) AND (count(*) <= 25))
>                                                          -> Seq Scan on albumjoin (cost=0.00..88642.18 rows=5107318 width=4)
>
> I've tried adding a length(public.track.name) index and filtering
> public.track to those rows where length(name) is within a few
> characters of node.file, but that actually makes the plan more expensive.
>
> Is there any hope to make things much cheaper? Unfortunately, I can't
> filter out anything from the album or albumjoin tables.
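Something along these lines might work for the intermediate table. This is an untested sketch: the temp table name filtered_albums is made up for illustration, and it assumes you can create temp tables and run ANALYZE in the same session:

    -- Materialize the "15 to 25 tracks" albums once, so the HAVING
    -- subselect's seqscan on albumjoin runs a single time up front.
    -- (filtered_albums is a hypothetical name.)
    CREATE TEMP TABLE filtered_albums AS
        SELECT album
        FROM albumjoin
        GROUP BY album
        HAVING count(*) BETWEEN 15 AND 25;

    -- Give the planner real row counts for the temp table.
    ANALYZE filtered_albums;

    -- Same query as before, with the IN (subselect) replaced by a join
    -- against the small temp table. The DISTINCT is dropped because the
    -- GROUP BY already makes public.album.id unique.
    SELECT public.album.id
    FROM public.album, public.albumjoin, public.track, umdb.node,
         filtered_albums
    WHERE node.dir = 2811
      AND public.album.id = filtered_albums.album
      AND albumjoin.album = public.album.id
      AND public.albumjoin.track = public.track.id
      AND levenshtein(substring(public.track.name for 75),
                      substring(node.file for 75)) <= 10
    GROUP BY public.album.id
    HAVING count(*) >= 5;

Note this only removes one of the two albumjoin seqscans from the final query; whether the remaining join can use an index on albumjoin(album) instead of the other seqscan depends on your indexes and statistics.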