Tom Lane replied:
>> I have two queries for looking up related words which I think should
>> be equivalent, but 7.4.8 comes up with very different plans.
>
> They're not at all equivalent:
> If there are duplicate word1id,word2id entries in allwordrelations,
> the
> first query will produce duplicate outputs; the second will not.
Ah, that should have been my second guess - whenever I fail to get
stuff like this, it's usually to do with either duplicates or NULLs.
> If there were a unique constraint on (word1id, word2id), in theory
> the planner could prove that the IN form could be simplified to a
> plain
> join, but there is no such logic in HEAD let alone 7.4, and in any
> case
> you've not got such a constraint.
But such would reflect the reality of my data, so it should be there.
> The plan that gets chosen is to forcibly unique-ify the (word1id,
> word2id) data (via a "sort | uniq"-like pipeline) and then do a normal
> join with that. Which is expensive because allwordrelations is big.
> But the alternative is probably even worse: without that
> allwordrelations has to be joined to w1 and w2 simultaneously, meaning
> that the unconstrained cartesian product of w1 and w2 has to be formed
> first.
Hmm, but wouldn't it at least filter one side per my where clause:
w1.word = 'dogging'? Anyway, thanks, the incremental enlightenment
continues.
- John Burger
MITRE