> I could of course post the updated query plan if anybody is interested; let
> me know. (The data is still available if anybody needs it as well, of
> course.)
I've taken a look and managed to cut out quite a bit of used time.
You'll need to confirm it's the same results though (I didn't -- it is
the same number of results (query below)
First off, "DROP INDEX prodid_index;". It doesn't help anything since
the primary key is just as usable, but it does take enough space that it
causes thrashing in the buffer_cache. Any queries based on prodid will
use the index for the PRIMARY KEY instead.
Secondly, I had no luck getting the hashjoin but this probably doesn't
matter. I've assumed that the number of users will climb faster than the
product set offered, and generated additional data via the below command
run 4 times:
INSERT INTO opinions SELECT prodid, uid + (SELECT max(uid) FROM
opinions), opinion FROM opinions;
I found that by this point, the hashjoin and mergejoin have essentially
the same performance -- in otherwords, as you grow you'll want the
mergejoin eventually so I wouldn't worry about it too much.
New Query cuts about 1/3rd the time, forcing hashjoin gets another 1/3rd
but see the above note:
SELECT o3.prodid
, SUM(o3.opinion*o12.correlation) AS total_correlation
FROM opinions o3
-- Plain join okay since o12.correlation <> 0
-- eliminates any NULLs anyway.
-- Was RIGHT JOIN
JOIN (SELECT o2.uid
, SUM(o1.opinion*o2.opinion)/SQRT(count(*)::numeric)
AS correlation
FROM opinions AS o1
JOIN opinions AS o2 USING (prodid)
WHERE o1.uid = 1355
GROUP BY o2.uid
) AS o12 USING (uid)
-- Was old Left join
WHERE o3.prodid NOT IN (SELECT prodid
FROM opinions AS o4
WHERE uid = 1355)
AND o3.opinion <> 0
AND o12.correlation <> 0
GROUP BY o3.prodid
ORDER BY total_correlation desc;