Bryce Nesbitt <bryce2@obviously.com> writes:
> The query plans are now attached (sorry I did not start there: many
> lists reject attachments). Or you can click on "text" at the query
> planner analysis site http://explain.depesz.com/s/qYq
At least some of the problem is the terrible quality of the rowcount
estimates in the IN subquery, as you extracted here:
> Nested Loop (cost=0.00..23393.15 rows=23 width=4) (actual time=0.077..15.637 rows=4003 loops=1)
> -> Index Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual time=0.049..0.051 rows=1
loops=1)
> Index Cond: ((word)::text = 'insider'::text)
> -> Index Scan using article_words_wc on article_words (cost=0.00..23234.38 rows=12264 width=8) (actual
time=0.020..7.237rows=4003 loops=1)
> Index Cond: (article_words.word_key = words.word_key)
> Total runtime: 19.776 ms
Given that it estimated 1 row out of "words" (quite correctly) and 12264
rows out of each scan on article_words, you'd think that the join size
estimate would be 12264, which would be off by "only" a factor of 3 from
the true result. Instead it's 23, off by a factor of 200 :-(.
Running a roughly similar test case here, I see that 8.4 gives
significantly saner estimates, which I think is because of this patch:
http://archives.postgresql.org/pgsql-committers/2008-10/msg00191.php
At the time I didn't want to risk back-patching it, because there
were a lot of other changes in the same general area in 8.4. But
it would be interesting to see what happens with your example if
you patch 8.3 similarly. (Note: I think only the first diff hunk
is relevant to 8.3.)
regards, tom lane