On Monday, May 5, 2003, at 07:28 PM, Bruno Wolff III wrote:
>> I am converting some existing query examples that use the form
>> SUM( dict.word IN ( word list...))
>> which doesn't work with pgsql.
>
> You want to do this outside the sum, not inside (and change sum to
> count). For example:
> select count(dict.word) from dict where word in (word list ...);
> (If word list is a select rather than a list of constants, you
> currently
> want to use exists instead for performance reasons. In 7.4 that won't
> be a problem.)
>
> By putting things in the where close you don't have to examine all of
> the rows for each word - url pair. This may cut out a lot of rows that
> need to be examined if there is an index on dict.word and the number of
> words in a url is large relative to the number being searched on.
>
Bruno: Thanks! Using your advice, my queries are now < 1 sec! Using
this form
SELECT url_id, url.url, url.txt, url.content_type,url.docsize, count(dict.word) AS rank
FROM url,dict WHERE word IN ('word1', 'word2', ...)
AND url.rec_id = dict.url_id
GROUP BY url_id, url, txt, content_type, docsize
ORDER BY rank DESC;
I have also upgraded to pgsql 7.3.2, and the query I posted yesterday
"sum( case ... ) " is a lot faster as well (about 2 seconds now). So I
think maybe there was also something wrong with my indices yesterday-
in addition to there being something wrong with my SQL. :-)
Thanks for the SQL advice.
Alex Rice, Software Developer
Architectural Research Consultants, Inc.
alrice@ARCplanning.com
alrice@swcp.com