Home > mailing lists

Re: any hope for my big query? - Mailing list pgsql-performance

From	Shaun Thomas
Subject	Re: any hope for my big query?
Date	October 2, 2006 15:54:15
Msg-id	200610021032.36181.sthomas@leapfrogonline.com Whole thread Raw
In response to	any hope for my big query? (Ben <bench@silentmedia.com>)
List	pgsql-performance

Tree view

On Thursday 28 September 2006 17:18, Ben wrote:

> explain select
>      distinct public.album.id
> from
>      public.album,public.albumjoin,public.track,umdb.node
> where
>      node.dir=2811
>      and albumjoin.album = public.album.id
>      and public.albumjoin.track = public.track.id
>      and levenshtein(substring(public.track.name for 75),
>          substring(node.file for 75)) <= 10
>      and public.album.id in
>          (select album from albumjoin group by album having count(*) between
15 and 25)
> group by public.album.id
> having count(*) >= 5;

If I'm reading this right, you want all the albums with 15-25 entries in
album join having 5 or more tracks that are (soundex type) similar to
other nodes.  Knowing that, you can also try something like this:

select a.album
  from (select album,track from albumjoin group by album having count(1)
between 15 and 25) a
  join public.track t on (a.track = t.id)
  join umdb.node n on (levenshtein(substring(t.name for 75),
substring(n.file for 75)) < 9)
 where n.dir = 2811
 group by a.album
 having count(1) > 4;

This removes two of your tables, since you were only interested in
albums with 15-25 albumjoins, and weren't actually using any album data
other than the ID, which albumjoin supplies.  Your subselect is now an
integral part of the whole query, being treated like a temp table that
only supplies album IDs with 15-25 albumjoins.  From there, add on the
track information, and use that to restrict the matching nodes.  Your
explain should be better with the above.

Just remember with the levenshtein in there, you're forcing a sequence
scan on the node table.  Depending on how big that table is, you may
not be able to truly optimize this.

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com

pgsql-performance by date:

From: "Marcelo Costa"
Date: 02 October 2006, 15:50:14
Subject: How much memory in 32 bits Architecture to Shared Buffers is Possible

From: Simon Riggs
Date: 02 October 2006, 16:08:36
Subject: Re: Optimizing queries

Re: any hope for my big query? - Mailing list pgsql-performance

Previous

Next