Re: 2 simple SQL questions: optimizing aggegate query - Mailing list pgsql-sql

From Bruno Wolff III
Subject Re: 2 simple SQL questions: optimizing aggegate query
Date
Msg-id 20030506012818.GA8384@wolff.to
Whole thread Raw
In response to Re: 2 simple SQL questions: optimizing aggegate query  (Alex Rice <alrice@ARCplanning.com>)
Responses Cast type int2vector/array_field to character  ("Mr.F" <fhf_it@csahome.com>)
Re: 2 simple SQL questions: optimizing aggegate query  (Alex Rice <alrice@ARCplanning.com>)
List pgsql-sql
On Mon, May 05, 2003 at 14:03:18 -0600, Alex Rice <alrice@ARCplanning.com> wrote:
> 
> On Monday, May 5, 2003, at 01:45  PM, Bruno Wolff III wrote:
> 
> >>SELECT url.rec_id, url, title, content_type, txt,
> >>sum(
> >>  case
> >>   when dict.word = 'wordx' then 1
> >>   when dict.word = 'wordx' then 1
> >>   when dict.word = 'wordy' then 1
> >>   else 0
> >>  end
> >>) as rank
> >>FROM dict, url
> >>WHERE url.rec_id = dict.url_id
> >>GROUP BY rec_id, url, title, content_type, txt
> >>ORDER BY rank DESC
> >
> >Is there some reason you can't put the word list in the where clause?
> >(You can use coalesce to set NULL sums to 0.)
> 
> Can you give an example of what you mean use coalesce and how to use it 
> to produce a count of the matched words, the rank?

You don't really need coalesce. I mentioned that because if you take
the sum of zero rows (the case where no words match for one of the returned
rows) then you get NULL instead of 0 (which may or may not be a problem
for you).

> I am converting some existing query examples that use the form
> SUM( dict.word IN ( word list...))
> which doesn't work with pgsql.

You want to do this outside the sum, not inside (and change sum to
count). For example:
select count(dict.word) from dict where word in (word list ...);
(If word list is a select rather than a list of constants, you currently
want to use exists instead for performance reasons. In 7.4 that won't
be a problem.)

By putting things in the where close you don't have to examine all of
the rows for each word - url pair. This may cut out a lot of rows that
need to be examined if there is an index on dict.word and the number of
words in a url is large relative to the number being searched on.



pgsql-sql by date:

Previous
From: Jeff Eckermann
Date:
Subject: Re: Replication for a large database
Next
From: Rajesh Kumar Mallah
Date:
Subject: Re: mnogosearch examples