> I am struggeling with the following query which fetches a random subset
> of 200 questions that matches certain tags within certain languages.
> However, the query takes forever to evaluate, even though I have a
> "limit 200" appended. Any ideas on how to optimize it?
>
> QUERY: ================================================
>
> SELECT distinct q.question_id
> FROM question_tags qt, questions q
> WHERE q.question_id = qt.question_id
> AND q.STATUS = 1
> AND not q.author_id = 105
> AND ((qt.language_id = 5 and qt.tag_id in
> (1101,917,734,550,367,183)) or (qt.language_id = 4 and qt.tag_id in
> (856,428)) or (qt.language_id =
> 3 and qt.tag_id in (1156,1028,899,771,642,514,385,257,128)) or
> (qt.language_id = 2 and qt.tag_id in
> (1193,1101,1009,917,826,734,642,550,458,367,275,183,91)))
> and q.question_id not in (413)
> LIMIT 200
>
> EXPLAIN ANALYZE: =========================================
>
> Limit (cost=1.50..1267.27 rows=200 width=4) (actual
> time=278.169..880.934 rows=200 loops=1)
> -> Unique (cost=1.50..317614.50 rows=50185 width=4) (actual
> time=278.165..880.843 rows=200 loops=1)
> -> Merge Join (cost=1.50..317489.04 rows=50185 width=4)
> (actual time=278.162..880.579 rows=441 loops=1)
> Merge Cond: (qt.question_id = q.question_id)
> -> Index Scan using question_tags_question_id on
> question_tags qt (cost=0.00..301256.96 rows=82051 width=4) (actual
> time=24.171..146.811 rows=6067 loops=1)
> Filter: (((language_id = 5) AND (tag_id = ANY
> ('{1101,917,734,550,367,183}'::integer[]))) OR ((language_id = 4) AND
> (tag_id = ANY ('{856,428}'::integer[]))) OR ((language_id = 3) AND
> (tag_id = ANY ('{1156,1028,899,771,642,514,385,257,128}'::integer[])))
> OR ((language_id = 2) AND (tag_id = ANY
> ('{1193,1101,1009,917,826,734,642,550,458,367,275,183,91}'::integer[]))))
> -> Index Scan using questions_pkey on questions q
> (cost=0.00..15464.12 rows=83488 width=4) (actual time=222.956..731.737
> rows=1000 loops=1)
> Filter: ((q.author_id <> 105) AND (q.question_id
> <> 413) AND (q.status = 1))
> Total runtime: 881.152 ms
> (9 rows)
An index on (language_id,tag_id) should be the first thing to try.
Or perhaps even (status,language_id,tag_id) or (language_id, tad_id,
status) (but that depends on the stats on "status" column).
An index on author_id will probably not be useful for this particular
query because your condition is "author_id != constant".
Also CLUSTER question_tags on (language_id, tad_id).
What is the database size versus RAM ? You must have a hell of a lot of
questions to make this slow... (or bloat ?)