"John D. Burger" <john@mitre.org> writes:
> I have two queries for looking up related words which I think should
> be equivalent, but 7.4.8 comes up with very different plans.
They're not at all equivalent:
> explain analyze select w2.word from allwords w1 join allwordrelations
> as r on (w1.wordid = r.word1id) join allwords w2 on (w2.wordid =
> r.word2id) where w1.word = 'dogging';
> explain analyze select w2.word from allwords w1, allwords w2 where
> (w1.wordid, w2.wordid) in (select word1id, word2id from
> allwordrelations ) and w1.word = 'dogging';
If there are duplicate word1id,word2id entries in allwordrelations, the
first query will produce duplicate outputs; the second will not.
If there were a unique constraint on (word1id, word2id), in theory
the planner could prove that the IN form could be simplified to a plain
join, but there is no such logic in HEAD let alone 7.4, and in any case
you've not got such a constraint.
The plan that gets chosen is to forcibly unique-ify the (word1id,
word2id) data (via a "sort | uniq"-like pipeline) and then do a normal
join with that. Which is expensive because allwordrelations is big.
But the alternative is probably even worse: without that
allwordrelations has to be joined to w1 and w2 simultaneously, meaning
that the unconstrained cartesian product of w1 and w2 has to be formed
first.
regards, tom lane