Re: Odd sorting behaviour - Mailing list pgsql-performance

From Rod Taylor
Subject Re: Odd sorting behaviour
Date
Msg-id 1090376297.12565.9.camel@jester
Whole thread Raw
In response to Re: Odd sorting behaviour  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Responses Re: Odd sorting behaviour
List pgsql-performance
> I could of course post the updated query plan if anybody is interested; let
> me know. (The data is still available if anybody needs it as well, of
> course.)

I've taken a look and managed to cut out quite a bit of used time.
You'll need to confirm it's the same results though (I didn't -- it is
the same number of results (query below)

First off, "DROP INDEX prodid_index;". It doesn't help anything since
the primary key is just as usable, but it does take enough space that it
causes thrashing in the buffer_cache. Any queries based on prodid will
use the index for the PRIMARY KEY instead.

Secondly, I had no luck getting the hashjoin but this probably doesn't
matter. I've assumed that the number of users will climb faster than the
product set offered, and generated additional data via the below command
run 4 times:

        INSERT INTO opinions SELECT prodid, uid + (SELECT max(uid) FROM
        opinions), opinion FROM opinions;

I found that by this point, the hashjoin and mergejoin have essentially
the same performance -- in otherwords, as you grow you'll want the
mergejoin eventually so I wouldn't worry about it too much.


New Query cuts about 1/3rd the time, forcing hashjoin gets another 1/3rd
but see the above note:

  SELECT o3.prodid
       , SUM(o3.opinion*o12.correlation) AS total_correlation
    FROM opinions o3

         -- Plain join okay since o12.correlation <> 0
         -- eliminates any NULLs anyway.
         -- Was RIGHT JOIN
    JOIN (SELECT o2.uid
               , SUM(o1.opinion*o2.opinion)/SQRT(count(*)::numeric)
                 AS correlation
            FROM opinions AS o1
            JOIN opinions AS o2 USING (prodid)
           WHERE o1.uid = 1355
        GROUP BY o2.uid
         ) AS o12 USING (uid)

         -- Was old Left join
   WHERE o3.prodid NOT IN (SELECT prodid
                             FROM opinions AS o4
                            WHERE uid = 1355)
     AND o3.opinion <> 0
     AND o12.correlation <> 0
GROUP BY o3.prodid
ORDER BY total_correlation desc;



pgsql-performance by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: Unbearably slow cascading deletes
Next
From: "Steinar H. Gunderson"
Date:
Subject: Re: Odd sorting behaviour