Thread: 2 simple SQL questions: optimizing aggegate query

2 simple SQL questions: optimizing aggegate query

From

Alex Rice

Date:

05 May 2003, 14:59:53

Hello, I am using Mnogosearch with a PostgreSQL backend. I am writing 
my own queries because I can't use the existing Perl or PHP front ends 
to Mnogosearch.

1) Is there a way to optimize this query? It takes ~6 seconds on my 
workstation :-( I would like to get it under 2 seconds.

SELECT url.rec_id, url, title, content_type, txt,
sum(  case   when dict.word = 'wordx' then 1   when dict.word = 'wordx' then 1   when dict.word = 'wordy' then 1   else
0 end
 
) as rank
FROM dict, url
WHERE url.rec_id = dict.url_id
GROUP BY rec_id, url, title, content_type, txt
ORDER BY rank DESC

2) In the above query, why can't I write "HAVING rank > 0"? instead of 
repeating the whole entire sum() expression "HAVING sum(...)"

Thanks in advance,

Alex Rice, Software Developer
Architectural Research Consultants, Inc.
alrice@ARCplanning.com
alrice@swcp.com

Re: 2 simple SQL questions: optimizing aggegate query

From

Bruno Wolff III

Date:

05 May 2003, 15:44:18

On Mon, May 05, 2003 at 12:59:47 -0600, Alex Rice <alrice@ARCplanning.com> wrote:
> Hello, I am using Mnogosearch with a PostgreSQL backend. I am writing 
> my own queries because I can't use the existing Perl or PHP front ends 
> to Mnogosearch.
> 
> 1) Is there a way to optimize this query? It takes ~6 seconds on my 
> workstation :-( I would like to get it under 2 seconds.
> 
> SELECT url.rec_id, url, title, content_type, txt,
> sum(
>   case
>    when dict.word = 'wordx' then 1
>    when dict.word = 'wordx' then 1
>    when dict.word = 'wordy' then 1
>    else 0
>   end
> ) as rank
> FROM dict, url
> WHERE url.rec_id = dict.url_id
> GROUP BY rec_id, url, title, content_type, txt
> ORDER BY rank DESC

Is there some reason you can't put the word list in the where clause?
(You can use coalesce to set NULL sums to 0.)

Re: 2 simple SQL questions: optimizing aggegate query

From

Alex Rice

Date:

05 May 2003, 16:03:25

On Monday, May 5, 2003, at 01:45  PM, Bruno Wolff III wrote:

>> SELECT url.rec_id, url, title, content_type, txt,
>> sum(
>>   case
>>    when dict.word = 'wordx' then 1
>>    when dict.word = 'wordx' then 1
>>    when dict.word = 'wordy' then 1
>>    else 0
>>   end
>> ) as rank
>> FROM dict, url
>> WHERE url.rec_id = dict.url_id
>> GROUP BY rec_id, url, title, content_type, txt
>> ORDER BY rank DESC
>
> Is there some reason you can't put the word list in the where clause?
> (You can use coalesce to set NULL sums to 0.)

Can you give an example of what you mean use coalesce and how to use it 
to produce a count of the matched words, the rank?

I am converting some existing query examples that use the form
SUM( dict.word IN ( word list...))
which doesn't work with pgsql.

I got the idea for the case statement from this archive message
http://archives.postgresql.org/pgsql-sql/1999-10/msg00136.php

Thanks much,

Alex Rice, Software Developer
Architectural Research Consultants, Inc.
alrice@ARCplanning.com
alrice@swcp.com

Re: 2 simple SQL questions: optimizing aggegate query

From

Bruno Wolff III

Date:

05 May 2003, 21:27:09

On Mon, May 05, 2003 at 14:03:18 -0600, Alex Rice <alrice@ARCplanning.com> wrote:
> 
> On Monday, May 5, 2003, at 01:45  PM, Bruno Wolff III wrote:
> 
> >>SELECT url.rec_id, url, title, content_type, txt,
> >>sum(
> >>  case
> >>   when dict.word = 'wordx' then 1
> >>   when dict.word = 'wordx' then 1
> >>   when dict.word = 'wordy' then 1
> >>   else 0
> >>  end
> >>) as rank
> >>FROM dict, url
> >>WHERE url.rec_id = dict.url_id
> >>GROUP BY rec_id, url, title, content_type, txt
> >>ORDER BY rank DESC
> >
> >Is there some reason you can't put the word list in the where clause?
> >(You can use coalesce to set NULL sums to 0.)
> 
> Can you give an example of what you mean use coalesce and how to use it 
> to produce a count of the matched words, the rank?

You don't really need coalesce. I mentioned that because if you take
the sum of zero rows (the case where no words match for one of the returned
rows) then you get NULL instead of 0 (which may or may not be a problem
for you).

> I am converting some existing query examples that use the form
> SUM( dict.word IN ( word list...))
> which doesn't work with pgsql.

You want to do this outside the sum, not inside (and change sum to
count). For example:
select count(dict.word) from dict where word in (word list ...);
(If word list is a select rather than a list of constants, you currently
want to use exists instead for performance reasons. In 7.4 that won't
be a problem.)

By putting things in the where close you don't have to examine all of
the rows for each word - url pair. This may cut out a lot of rows that
need to be examined if there is an index on dict.word and the number of
words in a url is large relative to the number being searched on.

Cast type int2vector/array_field to character

From

"Mr.F"

Date:

06 May 2003, 04:01:49

Forgiveness if have been asked previously.

Simple question:
1. How can I cast type int2vector/array_field into character?
2. My case like this: I have 2 table,
first table: 

Field1 Field2
====== ======
1      aaaaa
2      bbbbb
3      ccccc
4      ddddd
5      eeeee

second table: (only one field)

Array_Integer
=============
1 2 4

Wishing to be yielded from relationship both of that:

Result
======
aaaaa
bbbbb
ddddd

How command of SQL to get that result?

exist in my mind now: "select * from table1 where field1 in (select
Array_Integer from table2". :)

Thanks in advance,
Fadjar Hamidi

Re: Cast type int2vector/array_field to character

From

Achilleus Mantzios

Date:

06 May 2003, 04:54:57

On Tue, 6 May 2003, Mr.F wrote:

> Forgiveness if have been asked previously.
> 
> Simple question:
> 1. How can I cast type int2vector/array_field into character?
> 2. My case like this: I have 2 table,
> first table: 
> 
> Field1 Field2
> ====== ======
> 1      aaaaa
> 2      bbbbb
> 3      ccccc
> 4      ddddd
> 5      eeeee
> 
> second table: (only one field)
> 
> Array_Integer
> =============
> 1 2 4
> 
> Wishing to be yielded from relationship both of that:
> 
> Result
> ======
> aaaaa
> bbbbb
> ddddd
> 
> How command of SQL to get that result?
> 
> exist in my mind now: "select * from table1 where field1 in (select
> Array_Integer from table2". :)

look at contrib/intarray package.

> 
> Thanks in advance,
> Fadjar Hamidi
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

-- 
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-210-8981112
fax:    +30-210-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr

Re: 2 simple SQL questions: optimizing aggegate query

From

Alex Rice

Date:

06 May 2003, 15:28:11

On Monday, May 5, 2003, at 07:28  PM, Bruno Wolff III wrote:

>> I am converting some existing query examples that use the form
>> SUM( dict.word IN ( word list...))
>> which doesn't work with pgsql.
>
> You want to do this outside the sum, not inside (and change sum to
> count). For example:
> select count(dict.word) from dict where word in (word list ...);
> (If word list is a select rather than a list of constants, you 
> currently
> want to use exists instead for performance reasons. In 7.4 that won't
> be a problem.)
>
> By putting things in the where close you don't have to examine all of
> the rows for each word - url pair. This may cut out a lot of rows that
> need to be examined if there is an index on dict.word and the number of
> words in a url is large relative to the number being searched on.
>

Bruno: Thanks! Using your advice, my queries are now < 1 sec! Using 
this form

SELECT url_id, url.url, url.txt, url.content_type,url.docsize, count(dict.word) AS rank
FROM url,dict WHERE word IN ('word1', 'word2', ...)
AND url.rec_id = dict.url_id
GROUP BY url_id, url, txt, content_type, docsize
ORDER BY rank DESC;

I have also upgraded to pgsql 7.3.2, and the query I posted yesterday 
"sum( case ... ) " is a lot faster as well (about 2 seconds now). So I 
think maybe there was also something wrong with my indices yesterday- 
in addition to there being something wrong with my SQL. :-)

Thanks for the SQL advice.

Alex Rice, Software Developer
Architectural Research Consultants, Inc.
alrice@ARCplanning.com
alrice@swcp.com