Thread: SubQuery speed

SubQuery speed

From
andreas.fredriksson@pacer.se
Date:
Hi, I'm writing a indexed web search, stuffing some data into a
few pgsql tables.

The query involved can look something like:

BEGIN;
DECLARE CURSOR curs FOR
SELECT url,title FROM idx_files WHERE id IN (
 SELECT file_id FROM idx_ref WHERE word_id IN (
  SELECT id FROM idx_words WHERE word = 'search term 1' OR .. ) );
FETCH FORWARD 10 IN curs;
END;

The idx_files table holds about 1.000 rows, the idx_words table holds
about 50.000 words, and the idx_ref is a massive 250.000+
row cross-reference (file_id, word_id) which links the words and
pages together.

A query like the above took several hours on my Pentium 200 at work before
I killed the stuck daemon. This is kind of strange since the innermost
subquery is only suppose to return a single id. :-(

Are there any alternate methods of solving this? I don't think it's
possible to write this kind of query using a multi-JOIN?

Thanks for your input,
---
Andreas `dep' Fredriksson           andreas.fredriksson@pacer.se
Programmerare, Pacer Communication    www.pacer.se, 08-665 34 10

Linux: Because rebooting is for adding new hardware.


Re: [SQL] SubQuery speed

From
Herouth Maoz
Date:
At 16:45 +0200 on 8/9/98, andreas.fredriksson@pacer.se wrote:


> Are there any alternate methods of solving this? I don't think it's
> possible to write this kind of query using a multi-JOIN?

Why not? So long as the query is "in", and not "not in"?

Should be something like:

SELECT DISTINCT url,title
FROM idx_files, idx_ref, idx_words
WHERE idx_files.id = idx_ref.file_id
  AND idx_ref.word_id = idx_words.id
  AND (idx_words.word = 'search term 1' OR .. );

In any case, are there indices on the appropriate fields?

If the innermost query is supposed to return only one tuple, you should use
"=" rather than IN, although I don't know how much that adds. The optimizer
probably has an easier time with "=" than with IN.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



Re: [SQL] SubQuery speed

From
Vadim Mikheev
Date:
andreas.fredriksson@pacer.se wrote:
>
> BEGIN;
> DECLARE CURSOR curs FOR
> SELECT url,title FROM idx_files WHERE id IN (
>  SELECT file_id FROM idx_ref WHERE word_id IN (
>   SELECT id FROM idx_words WHERE word = 'search term 1' OR .. ) );
> FETCH FORWARD 10 IN curs;
> END;
>
> The idx_files table holds about 1.000 rows, the idx_words table holds
> about 50.000 words, and the idx_ref is a massive 250.000+
> row cross-reference (file_id, word_id) which links the words and
> pages together.
>
> A query like the above took several hours on my Pentium 200 at work before
> I killed the stuck daemon. This is kind of strange since the innermost
> subquery is only suppose to return a single id. :-(
>
> Are there any alternate methods of solving this? I don't think it's
> possible to write this kind of query using a multi-JOIN?

Try to rewrite query using EXISTS instead of IN and create
indices.

Vadim