Thread: 2 simple SQL questions: optimizing aggegate query
Hello, I am using Mnogosearch with a PostgreSQL backend. I am writing my own queries because I can't use the existing Perl or PHP front ends to Mnogosearch. 1) Is there a way to optimize this query? It takes ~6 seconds on my workstation :-( I would like to get it under 2 seconds. SELECT url.rec_id, url, title, content_type, txt, sum( case when dict.word = 'wordx' then 1 when dict.word = 'wordx' then 1 when dict.word = 'wordy' then 1 else 0 end ) as rank FROM dict, url WHERE url.rec_id = dict.url_id GROUP BY rec_id, url, title, content_type, txt ORDER BY rank DESC 2) In the above query, why can't I write "HAVING rank > 0"? instead of repeating the whole entire sum() expression "HAVING sum(...)" Thanks in advance, Alex Rice, Software Developer Architectural Research Consultants, Inc. alrice@ARCplanning.com alrice@swcp.com
On Mon, May 05, 2003 at 12:59:47 -0600, Alex Rice <alrice@ARCplanning.com> wrote: > Hello, I am using Mnogosearch with a PostgreSQL backend. I am writing > my own queries because I can't use the existing Perl or PHP front ends > to Mnogosearch. > > 1) Is there a way to optimize this query? It takes ~6 seconds on my > workstation :-( I would like to get it under 2 seconds. > > SELECT url.rec_id, url, title, content_type, txt, > sum( > case > when dict.word = 'wordx' then 1 > when dict.word = 'wordx' then 1 > when dict.word = 'wordy' then 1 > else 0 > end > ) as rank > FROM dict, url > WHERE url.rec_id = dict.url_id > GROUP BY rec_id, url, title, content_type, txt > ORDER BY rank DESC Is there some reason you can't put the word list in the where clause? (You can use coalesce to set NULL sums to 0.)
On Monday, May 5, 2003, at 01:45 PM, Bruno Wolff III wrote: >> SELECT url.rec_id, url, title, content_type, txt, >> sum( >> case >> when dict.word = 'wordx' then 1 >> when dict.word = 'wordx' then 1 >> when dict.word = 'wordy' then 1 >> else 0 >> end >> ) as rank >> FROM dict, url >> WHERE url.rec_id = dict.url_id >> GROUP BY rec_id, url, title, content_type, txt >> ORDER BY rank DESC > > Is there some reason you can't put the word list in the where clause? > (You can use coalesce to set NULL sums to 0.) Can you give an example of what you mean use coalesce and how to use it to produce a count of the matched words, the rank? I am converting some existing query examples that use the form SUM( dict.word IN ( word list...)) which doesn't work with pgsql. I got the idea for the case statement from this archive message http://archives.postgresql.org/pgsql-sql/1999-10/msg00136.php Thanks much, Alex Rice, Software Developer Architectural Research Consultants, Inc. alrice@ARCplanning.com alrice@swcp.com
On Mon, May 05, 2003 at 14:03:18 -0600, Alex Rice <alrice@ARCplanning.com> wrote: > > On Monday, May 5, 2003, at 01:45 PM, Bruno Wolff III wrote: > > >>SELECT url.rec_id, url, title, content_type, txt, > >>sum( > >> case > >> when dict.word = 'wordx' then 1 > >> when dict.word = 'wordx' then 1 > >> when dict.word = 'wordy' then 1 > >> else 0 > >> end > >>) as rank > >>FROM dict, url > >>WHERE url.rec_id = dict.url_id > >>GROUP BY rec_id, url, title, content_type, txt > >>ORDER BY rank DESC > > > >Is there some reason you can't put the word list in the where clause? > >(You can use coalesce to set NULL sums to 0.) > > Can you give an example of what you mean use coalesce and how to use it > to produce a count of the matched words, the rank? You don't really need coalesce. I mentioned that because if you take the sum of zero rows (the case where no words match for one of the returned rows) then you get NULL instead of 0 (which may or may not be a problem for you). > I am converting some existing query examples that use the form > SUM( dict.word IN ( word list...)) > which doesn't work with pgsql. You want to do this outside the sum, not inside (and change sum to count). For example: select count(dict.word) from dict where word in (word list ...); (If word list is a select rather than a list of constants, you currently want to use exists instead for performance reasons. In 7.4 that won't be a problem.) By putting things in the where close you don't have to examine all of the rows for each word - url pair. This may cut out a lot of rows that need to be examined if there is an index on dict.word and the number of words in a url is large relative to the number being searched on.
Forgiveness if have been asked previously. Simple question: 1. How can I cast type int2vector/array_field into character? 2. My case like this: I have 2 table, first table: Field1 Field2 ====== ====== 1 aaaaa 2 bbbbb 3 ccccc 4 ddddd 5 eeeee second table: (only one field) Array_Integer ============= 1 2 4 Wishing to be yielded from relationship both of that: Result ====== aaaaa bbbbb ddddd How command of SQL to get that result? exist in my mind now: "select * from table1 where field1 in (select Array_Integer from table2". :) Thanks in advance, Fadjar Hamidi
On Tue, 6 May 2003, Mr.F wrote: > Forgiveness if have been asked previously. > > Simple question: > 1. How can I cast type int2vector/array_field into character? > 2. My case like this: I have 2 table, > first table: > > Field1 Field2 > ====== ====== > 1 aaaaa > 2 bbbbb > 3 ccccc > 4 ddddd > 5 eeeee > > second table: (only one field) > > Array_Integer > ============= > 1 2 4 > > Wishing to be yielded from relationship both of that: > > Result > ====== > aaaaa > bbbbb > ddddd > > How command of SQL to get that result? > > exist in my mind now: "select * from table1 where field1 in (select > Array_Integer from table2". :) look at contrib/intarray package. > > Thanks in advance, > Fadjar Hamidi > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-210-8981112 fax: +30-210-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
On Monday, May 5, 2003, at 07:28 PM, Bruno Wolff III wrote: >> I am converting some existing query examples that use the form >> SUM( dict.word IN ( word list...)) >> which doesn't work with pgsql. > > You want to do this outside the sum, not inside (and change sum to > count). For example: > select count(dict.word) from dict where word in (word list ...); > (If word list is a select rather than a list of constants, you > currently > want to use exists instead for performance reasons. In 7.4 that won't > be a problem.) > > By putting things in the where close you don't have to examine all of > the rows for each word - url pair. This may cut out a lot of rows that > need to be examined if there is an index on dict.word and the number of > words in a url is large relative to the number being searched on. > Bruno: Thanks! Using your advice, my queries are now < 1 sec! Using this form SELECT url_id, url.url, url.txt, url.content_type,url.docsize, count(dict.word) AS rank FROM url,dict WHERE word IN ('word1', 'word2', ...) AND url.rec_id = dict.url_id GROUP BY url_id, url, txt, content_type, docsize ORDER BY rank DESC; I have also upgraded to pgsql 7.3.2, and the query I posted yesterday "sum( case ... ) " is a lot faster as well (about 2 seconds now). So I think maybe there was also something wrong with my indices yesterday- in addition to there being something wrong with my SQL. :-) Thanks for the SQL advice. Alex Rice, Software Developer Architectural Research Consultants, Inc. alrice@ARCplanning.com alrice@swcp.com