Thread: Query optimization
I am struggling with the following query, which fetches a random subset of 200 questions that match certain tags within certain languages. However, the query takes forever to evaluate, even though I have a "LIMIT 200" appended. Any ideas on how to optimize it?

QUERY: ================================================

SELECT DISTINCT q.question_id
FROM question_tags qt, questions q
WHERE q.question_id = qt.question_id
  AND q.status = 1
  AND NOT q.author_id = 105
  AND ((qt.language_id = 5 AND qt.tag_id IN (1101,917,734,550,367,183))
    OR (qt.language_id = 4 AND qt.tag_id IN (856,428))
    OR (qt.language_id = 3 AND qt.tag_id IN (1156,1028,899,771,642,514,385,257,128))
    OR (qt.language_id = 2 AND qt.tag_id IN (1193,1101,1009,917,826,734,642,550,458,367,275,183,91)))
  AND q.question_id NOT IN (413)
LIMIT 200

EXPLAIN ANALYZE: =========================================

Limit  (cost=1.50..1267.27 rows=200 width=4) (actual time=278.169..880.934 rows=200 loops=1)
  ->  Unique  (cost=1.50..317614.50 rows=50185 width=4) (actual time=278.165..880.843 rows=200 loops=1)
        ->  Merge Join  (cost=1.50..317489.04 rows=50185 width=4) (actual time=278.162..880.579 rows=441 loops=1)
              Merge Cond: (qt.question_id = q.question_id)
              ->  Index Scan using question_tags_question_id on question_tags qt  (cost=0.00..301256.96 rows=82051 width=4) (actual time=24.171..146.811 rows=6067 loops=1)
                    Filter: (((language_id = 5) AND (tag_id = ANY ('{1101,917,734,550,367,183}'::integer[]))) OR ((language_id = 4) AND (tag_id = ANY ('{856,428}'::integer[]))) OR ((language_id = 3) AND (tag_id = ANY ('{1156,1028,899,771,642,514,385,257,128}'::integer[]))) OR ((language_id = 2) AND (tag_id = ANY ('{1193,1101,1009,917,826,734,642,550,458,367,275,183,91}'::integer[]))))
              ->  Index Scan using questions_pkey on questions q  (cost=0.00..15464.12 rows=83488 width=4) (actual time=222.956..731.737 rows=1000 loops=1)
                    Filter: ((q.author_id <> 105) AND (q.question_id <> 413) AND (q.status = 1))
Total runtime: 881.152 ms
(9 rows)
On Sunday 30 November 2008 19:45:11, tmp wrote:
> I am struggling with the following query, which fetches a random subset
> of 200 questions that match certain tags within certain languages.
> However, the query takes forever to evaluate, even though I have a
> "LIMIT 200" appended. Any ideas on how to optimize it?
> [...]

First, because of the DISTINCT, the LIMIT 200 won't reduce the work much: everything still has to be sorted for the Unique step, and only once that sort is done are the first 200 records taken. In any case there are only 441 rows before the DISTINCT, so, at least for this query, it won't change the timings much.

Then, since a lot of rows are being filtered out after the index scans, you may try creating composite indexes (that is, if you want the query to be faster). Maybe (q.author_id, q.status). For qt you may try (qt.language_id, qt.tag_id)...

Hope it helps
Cheers
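The composite indexes suggested above could be created along these lines (the index names are my own, and whether the planner uses them depends on your data distribution):

```sql
-- Sketch of the suggested composite indexes (names assumed):
CREATE INDEX questions_author_status_idx
    ON questions (author_id, status);

CREATE INDEX question_tags_lang_tag_idx
    ON question_tags (language_id, tag_id);

-- Refresh statistics, then re-run EXPLAIN ANALYZE on the original
-- query to see whether the new indexes are picked up:
ANALYZE questions;
ANALYZE question_tags;
```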
> I am struggling with the following query, which fetches a random subset
> of 200 questions that match certain tags within certain languages.
> However, the query takes forever to evaluate, even though I have a
> "LIMIT 200" appended. Any ideas on how to optimize it?
> [...]

An index on (language_id, tag_id) should be the first thing to try. Or perhaps even (status, language_id, tag_id) or (language_id, tag_id, status) (but that depends on the stats on the "status" column).

An index on author_id will probably not be useful for this particular query, because your condition is "author_id != constant".

Also CLUSTER question_tags on (language_id, tag_id).

What is the database size versus RAM? You must have a hell of a lot of questions to make this slow... (or bloat?)
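For reference, the index and CLUSTER advice above might look like this (index name assumed; the CLUSTER ... USING form is PostgreSQL 8.3 syntax, older releases use "CLUSTER indexname ON tablename"):

```sql
-- Composite index matching the filter on question_tags (name assumed):
CREATE INDEX question_tags_lang_tag_idx
    ON question_tags (language_id, tag_id);

-- Physically reorder the table along that index so matching rows sit
-- together on disk. Note: CLUSTER takes an exclusive lock, and the
-- ordering degrades as the table is updated, so it must be repeated.
CLUSTER question_tags USING question_tags_lang_tag_idx;

-- Refresh planner statistics after the rewrite:
ANALYZE question_tags;
```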