On Fri, 22 Feb 2008, Kynn Jones wrote:
> Hi. I'm trying to optimize...
>
> (Q1) SELECT a1.word, a2.word
> FROM T a1 JOIN T a2 USING ( zipk )
> WHERE a1.type = <int1>
> AND a2.type = <int2>;
Okay, try this:
Create an index on T(type, zipk), and then CLUSTER on that index. That
will effectively group all the data for one type together and sort it by
zipk, making a merge join very quick indeed. I'm not sure whether Postgres
will notice that, but it's worth a try.
> More specifically, how can I go about building table T and the views
> V<int?>'s to maximize the performance of (Q1)? For example, I'm thinking
> that if T had an additional id column and were built in such a way that all
> the records belonging to each V<int?> were physically contiguous, and (say)
> had contiguous values in the id column, then I could define each view like
> this
The above index and CLUSTER will effectively do this - you don't need to
introduce another field.
Alternatively, you could go *really evil* and pre-join the table.
Something like this:
CREATE TABLE evilJoin AS SELECT a1.type AS type1, a2.type AS type2,
a1.zipk, a1.word AS word1, a2.word AS word2
FROM T AS a1, T AS a2
WHERE a1.zipk = a2.zipk
ORDER BY a1.type, a2.type, a1.zipk;
CREATE INDEX evilIndex1 ON evilJoin(type1, type2, zipk);
Then your query becomes:
SELECT word1, word2
FROM evilJoin
WHERE type1 = <int1>
AND type2 = <int2>
which should run quick. However, your cache usefulness will be reduced
because of the extra volume of data.
Matthew
--
[About NP-completeness] These are the problems that make efficient use of
the Fairy Godmother. -- Computer Science Lecturer