Re: Query optimization - Mailing list pgsql-performance

From PFC
Subject Re: Query optimization
Date
Msg-id op.ulhjjchgcigqcu@soyouz
Whole thread Raw
In response to Query optimization  (tmp <skrald@amossen.dk>)
List pgsql-performance
> I am struggeling with the following query which fetches a random subset
> of 200 questions that matches certain tags within certain languages.
> However, the query takes forever to evaluate, even though I have a
> "limit 200" appended. Any ideas on how to optimize it?
>
> QUERY: ================================================
>
> SELECT distinct q.question_id
>                FROM question_tags qt, questions q
>                WHERE q.question_id = qt.question_id
>                      AND q.STATUS = 1
>                      AND not q.author_id = 105
>                      AND ((qt.language_id = 5 and qt.tag_id in
> (1101,917,734,550,367,183)) or (qt.language_id = 4 and qt.tag_id in
> (856,428)) or (qt.language_id =
>   3 and qt.tag_id in (1156,1028,899,771,642,514,385,257,128)) or
> (qt.language_id = 2 and qt.tag_id in
> (1193,1101,1009,917,826,734,642,550,458,367,275,183,91)))
>                       and q.question_id not in (413)
>                LIMIT 200
>
> EXPLAIN ANALYZE: =========================================
>
>   Limit  (cost=1.50..1267.27 rows=200 width=4) (actual
> time=278.169..880.934 rows=200 loops=1)
>     ->  Unique  (cost=1.50..317614.50 rows=50185 width=4) (actual
> time=278.165..880.843 rows=200 loops=1)
>           ->  Merge Join  (cost=1.50..317489.04 rows=50185 width=4)
> (actual time=278.162..880.579 rows=441 loops=1)
>                 Merge Cond: (qt.question_id = q.question_id)
>                 ->  Index Scan using question_tags_question_id on
> question_tags qt  (cost=0.00..301256.96 rows=82051 width=4) (actual
> time=24.171..146.811 rows=6067 loops=1)
>                       Filter: (((language_id = 5) AND (tag_id = ANY
> ('{1101,917,734,550,367,183}'::integer[]))) OR ((language_id = 4) AND
> (tag_id = ANY ('{856,428}'::integer[]))) OR ((language_id = 3) AND
> (tag_id = ANY ('{1156,1028,899,771,642,514,385,257,128}'::integer[])))
> OR ((language_id = 2) AND (tag_id = ANY
> ('{1193,1101,1009,917,826,734,642,550,458,367,275,183,91}'::integer[]))))
>                 ->  Index Scan using questions_pkey on questions q
> (cost=0.00..15464.12 rows=83488 width=4) (actual time=222.956..731.737
> rows=1000 loops=1)
>                       Filter: ((q.author_id <> 105) AND (q.question_id
> <> 413) AND (q.status = 1))
>   Total runtime: 881.152 ms
> (9 rows)

    An index on (language_id,tag_id) should be the first thing to try.
    Or perhaps even (status,language_id,tag_id) or (language_id, tad_id,
status) (but that depends on the stats on "status" column).

    An index on author_id will probably not be useful for this particular
query because your condition is "author_id != constant".

    Also CLUSTER question_tags on (language_id, tad_id).

    What is the database size versus RAM ? You must have a hell of a lot of
questions to make this slow... (or bloat ?)



pgsql-performance by date:

Previous
From: "Andrus"
Date:
Subject: Seq scan over 3.3 millions of rows instead of using date and pattern indexes
Next
From: Craig James
Date:
Subject: Sort causes system to freeze